What about using the split() method?
https://github.com/apache/beam/blob/db60d37266c2ad6c4e2b5681221cc055d5c02eab/sdks/java/io/hadoop-input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L435
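
As a rough illustration of the pattern (a stand-in sketch, not Beam's actual classes): a runner calls split() on a source and gets back one sub-source per split, which it can then schedule as separate read tasks. In Beam itself, BoundedSource.split takes (long desiredBundleSizeBytes, PipelineOptions options). The SplittableSource interface and RangeSource class below are hypothetical names invented for this example, and the sketch says nothing about how split locations would be exposed.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
  /** Hypothetical stand-in for the splitting part of a bounded source; not a Beam type. */
  interface SplittableSource<T> {
    List<? extends SplittableSource<T>> split(long desiredBundleSize) throws Exception;
  }

  /** Toy source over the half-open range [start, end); assumed for illustration only. */
  static final class RangeSource implements SplittableSource<Long> {
    final long start;
    final long end;

    RangeSource(long start, long end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public List<RangeSource> split(long desiredBundleSize) {
      if (desiredBundleSize <= 0) {
        throw new IllegalArgumentException("desiredBundleSize must be positive");
      }
      // Cut the range into consecutive chunks of at most desiredBundleSize elements.
      List<RangeSource> parts = new ArrayList<>();
      for (long s = start; s < end; s += desiredBundleSize) {
        parts.add(new RangeSource(s, Math.min(s + desiredBundleSize, end)));
      }
      return parts;
    }

    @Override
    public String toString() {
      return "[" + start + ", " + end + ")";
    }
  }

  public static void main(String[] args) throws Exception {
    SplittableSource<Long> source = new RangeSource(0, 10);
    // Each returned sub-source would become one 'read' task in the engine.
    for (SplittableSource<Long> part : source.split(4)) {
      System.out.println("schedule read task for " + part);
    }
  }
}
```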

Note that it's probably a good idea to read the Javadoc for BoundedSource:
https://github.com/apache/beam/blob/db60d37266c2ad6c4e2b5681221cc055d5c02eab/sdks/java/core/src/main/java/org/apache/beam/sdk/io/BoundedSource.java#L32

On Tue, Jan 30, 2018 at 7:07 PM, JangHo Seo <[email protected]> wrote:

> Hello Beam dev,
>
> I'm working on a distributed data processing engine that supports Beam
> dataflow programs, and I'm investigating how to take split locations into
> account when scheduling 'read' tasks for an HDFS source.
>
> Is there any way to get split location information from
> HadoopInputFormatBoundedSource without using Java reflection? Since the
> 'inputSplit' field in the 'HadoopInputFormatBoundedSource' class is
> private, I can see no way to access Hadoop split information other than
> through reflection.
>
> Thanks.
>
