Eugene Kirpichov created BEAM-1309:
--------------------------------------

             Summary: FileIOChannelFactory.match() traverses entire parent directory recursively
                 Key: BEAM-1309
                 URL: https://issues.apache.org/jira/browse/BEAM-1309
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: Eugene Kirpichov
            Assignee: Pei He


I was running a pipeline that reads a single file from my local home directory.

The pipeline got stuck, and a stack dump showed that it was blocked in 
FileIOChannelFactory.match().

The code currently works by recursively traversing the whole parent directory 
of the requested filepattern and checking which files match the filepattern. 
In my case, that means traversing everything in my home directory, which is 
*a lot* (and includes remotely mounted directories).

https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/FileIOChannelFactory.java#L109

This is very wasteful (matching a single file should not require walking an 
entire directory tree) and should be fixed.
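For illustration, here is a minimal sketch of how matching could avoid walking the whole parent directory. It assumes POSIX-style paths, java.nio "glob:" syntax, and brace groups that never contain a path separator. The class BoundedGlobMatch and its match() method are hypothetical: they are not Beam's API and not necessarily how this issue was eventually fixed. The sketch skips traversal entirely when the spec has no wildcards. Otherwise it starts at the longest wildcard-free directory prefix and limits the walk depth to the number of remaining path components, unless the pattern contains "**".

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not part of Beam): glob matching that walks only as far
 * as the pattern requires. Assumes POSIX-style paths and no '/' inside {...}.
 */
final class BoundedGlobMatch {

  // Characters with special meaning in java.nio "glob:" syntax.
  private static boolean hasWildcard(String s) {
    for (char c : s.toCharArray()) {
      if ("*?[]{}".indexOf(c) >= 0) {
        return true;
      }
    }
    return false;
  }

  static List<Path> match(String spec) throws IOException {
    Path pattern = Paths.get(spec).toAbsolutePath().normalize();

    // Case 1: no wildcards, so one existence check replaces any traversal.
    if (!hasWildcard(pattern.toString())) {
      return Files.isRegularFile(pattern)
          ? Collections.singletonList(pattern)
          : Collections.<Path>emptyList();
    }

    // Case 2: start at the longest wildcard-free directory prefix
    // instead of the parent directory of the whole pattern.
    Path base = pattern.getRoot();
    int i = 0;
    while (i < pattern.getNameCount() && !hasWildcard(pattern.getName(i).toString())) {
      base = base.resolve(pattern.getName(i));
      i++;
    }
    if (!Files.isDirectory(base)) {
      return Collections.<Path>emptyList();
    }

    // Each remaining component is one directory level, unless "**"
    // (which may cross directory boundaries) forces an unbounded walk.
    boolean recursive = false;
    for (int j = i; j < pattern.getNameCount(); j++) {
      if (pattern.getName(j).toString().contains("**")) {
        recursive = true;
      }
    }
    int maxDepth = recursive ? Integer.MAX_VALUE : pattern.getNameCount() - i;

    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
    // Files.walk returns a lazily populated stream that must be closed.
    try (Stream<Path> paths = Files.walk(base, maxDepth)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(matcher::matches)
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```

With a spec such as /home/user/data.txt, this sketch performs a single Files.isRegularFile call instead of visiting every file under /home/user, and a pattern like /home/user/*.txt looks only at the immediate children of /home/user.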



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
