Re: How to get split location from HadoopInputFormatBoundedSource

Lukasz Cwik Wed, 31 Jan 2018 11:45:07 -0800

What about using the split() method:
https://github.com/apache/beam/blob/db60d37266c2ad6c4e2b5681221cc055d5c02eab/sdks/java/io/hadoop-input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L435


Note, it's probably a good idea to read the Javadoc for BoundedSource:
https://github.com/apache/beam/blob/db60d37266c2ad6c4e2b5681221cc055d5c02eab/sdks/java/core/src/main/java/org/apache/beam/sdk/io/BoundedSource.java#L32

On Tue, Jan 30, 2018 at 7:07 PM, JangHo Seo wrote:

> Hello Beam dev,
>
> I'm working on a distributed data processing engine that supports Beam
> dataflow programs, and I'm investigating how to take split locations into
> account when scheduling 'read' tasks for an HDFS source.
>
> Is there any way to get split location information from
> HadoopInputFormatBoundedSource without using Java reflection? Since the
> 'inputSplit' field in the 'HadoopInputFormatBoundedSource' class is
> private, I can see no way to access Hadoop split information other than
> by using reflection.
>
> Thanks.
>
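For the scheduling half of the question, here is a minimal sketch of how a runner could use split locations once it has them, assuming it can obtain a host list per split (Hadoop's InputSplit.getLocations() reports such a list; getting it out of HadoopInputFormatBoundedSource without reflection is the open question in this thread). LocatedSplit and LocalityScheduler are hypothetical names for illustration only, not part of Beam or Hadoop, and the policy shown (prefer a data-local worker, else the least-loaded one) is just one simple choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical descriptor for one input split and the hosts that store its data. */
final class LocatedSplit {
  final String id;
  final List<String> hosts;

  LocatedSplit(String id, List<String> hosts) {
    this.id = id;
    this.hosts = hosts;
  }
}

/** Hypothetical locality-preferring assignment of read tasks to workers. */
final class LocalityScheduler {
  /**
   * Returns, for each split in order, the worker chosen to read it. Among the
   * split's hosts that are also workers, the least-loaded one is preferred;
   * if none is a worker, the least-loaded worker overall is used. Ties go to
   * the earlier-listed host or worker.
   */
  static List<String> assign(List<LocatedSplit> splits, List<String> workers) {
    if (workers.isEmpty()) {
      throw new IllegalArgumentException("need at least one worker");
    }
    Map<String, Integer> load = new HashMap<>();
    for (String w : workers) {
      load.put(w, 0);
    }
    List<String> assignment = new ArrayList<>();
    for (LocatedSplit split : splits) {
      String chosen = null;
      // Data-local candidates: split hosts that are also known workers.
      for (String host : split.hosts) {
        if (load.containsKey(host)
            && (chosen == null || load.get(host) < load.get(chosen))) {
          chosen = host;
        }
      }
      // No data-local worker: fall back to the least-loaded worker overall.
      if (chosen == null) {
        for (String w : workers) {
          if (chosen == null || load.get(w) < load.get(chosen)) {
            chosen = w;
          }
        }
      }
      load.merge(chosen, 1, Integer::sum);
      assignment.add(chosen);
    }
    return assignment;
  }
}
```

With splits s0 and s1 on h1, s2 on a non-worker host h3, and s3 on h2 and h1, and workers h1 and h2, this assigns h1, h1, h2, h2. Strict locality can pile tasks onto one hot host (h1 takes both s0 and s1), so a real scheduler would likely cap per-worker load or wait briefly for a local slot before falling back.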
