What about using the split() method? https://github.com/apache/beam/blob/db60d37266c2ad6c4e2b5681221cc055d5c02eab/sdks/java/io/hadoop-input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L435
Note: it's probably a good idea to read the Javadoc for BoundedSource: https://github.com/apache/beam/blob/db60d37266c2ad6c4e2b5681221cc055d5c02eab/sdks/java/core/src/main/java/org/apache/beam/sdk/io/BoundedSource.java#L32

On Tue, Jan 30, 2018 at 7:07 PM, JangHo Seo <[email protected]> wrote:

> Hello Beam dev,
>
> I'm working on a distributed data processing engine that supports Beam
> dataflow program,
> and investigating how to take split location into consideration when
> scheduling 'read' task for HDFS source.
>
> Is there any way to get split location information from
> HadoopInputFormatBoundedSource,
> without using Java reflection? Since 'inputSplit' field in
> 'HadoopInputFormatBoundedSource' class is
> private one, I can see no way to access Hadoop split information other
> than using reflection.
>
> Thanks.
>
