+1 to add this module

On Wed, Feb 17, 2016 at 9:21 AM, Priyanka Gugale <[email protected]>
wrote:

> We need partitions for parallel reads, but how would a reader partition know
> which offset of the file to read from? Normally FileSplitter creates this
> metadata (let's call each piece a reader task) and forwards it to the next
> operator, the block reader. Each block reader receives one of the tasks
> and reads from the specified offset in the file. If FileSplitter is absent,
> one reader partition has to consume one file entirely, which means we
> can't read a single file in parallel. I hope this answers your
> question.
>
> The advantage of this module is a reusable component made up of
> operators that are frequently used together for file reading.
>
> -Priyanka
>
> On Wed, Feb 17, 2016 at 11:31 AM, Yogi Devendra <[email protected]>
> wrote:
>
> > Let me rephrase Ram's question to make it clear:
> >
> > For an application developer using Malhar:
> > What are the advantages / disadvantages of using the proposed HDFS File
> > input Module as compared to directly using FileSplitter, BlockReader
> > Operators available in Malhar?
> >
> > ~ Yogi
> >
> > On 16 February 2016 at 21:56, Munagala Ramanath <[email protected]>
> > wrote:
> >
> > > Can parallel read not be achieved by partitioning?
> > >
> > > Ram
> > >
> > > On Tue, Feb 16, 2016 at 1:01 AM, Priyanka Gugale <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > It is a common use case to read big files on HDFS in parallel, i.e.
> > > > many reader threads are used to read the file at the same time. We can
> > > > achieve this on top of Apex using the following Malhar operators:
> > > >
> > > > 1. AbstractFileSplitter
> > > > 2. AbstractBlockReader
> > > >
> > > > where FileSplitter, based on file metadata, creates small reader tasks
> > > > (each reading part of the file). Those reader tasks are run by
> > > > BlockReaders in parallel to read the file.
> > > >
> > > > As these operators are generally used together to read files,
> > > > I propose we create a module called HDFSFileReader for this.
> > > >
> > > > Please share your suggestions on this.
> > > >
> > > > -Priyanka
> > > >
> > >
> >
>
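
To make the splitter/reader division of labor discussed in this thread concrete, here is a minimal sketch using only the JDK. It is not the Malhar API: the names SplitReadSketch, ReaderTask, split, readBlock and readInParallel are hypothetical, a local file stands in for HDFS, and a thread pool stands in for block reader partitions. The splitter turns file metadata (the file size) into offset/length tasks, and each reader handles exactly one task.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitReadSketch {

    /** Hypothetical "reader task": which file to read, from where, and how much. */
    static final class ReaderTask {
        final Path file;
        final long offset;
        final int length;

        ReaderTask(Path file, long offset, int length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Splitter role: turn file metadata (its size) into fixed-size reader tasks. */
    static List<ReaderTask> split(Path file, int blockSize) throws IOException {
        long size = Files.size(file);
        List<ReaderTask> tasks = new ArrayList<>();
        for (long offset = 0; offset < size; offset += blockSize) {
            int length = (int) Math.min(blockSize, size - offset);
            tasks.add(new ReaderTask(file, offset, length));
        }
        return tasks;
    }

    /** Reader role: read exactly the byte range named by one task. */
    static byte[] readBlock(ReaderTask task) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(task.file.toFile(), "r")) {
            byte[] buffer = new byte[task.length];
            raf.seek(task.offset);
            raf.readFully(buffer);
            return buffer;
        }
    }

    /** Run all tasks on a thread pool and reassemble the blocks in task order. */
    static byte[] readInParallel(Path file, int blockSize, int readers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            List<Future<byte[]>> results = new ArrayList<>();
            for (ReaderTask task : split(file, blockSize)) {
                results.add(pool.submit(() -> readBlock(task)));
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (Future<byte[]> result : results) {
                out.write(result.get());
            }
            return out.toByteArray();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("split-read", ".txt");
        Files.write(file, "0123456789abcdefghij".getBytes(StandardCharsets.US_ASCII));
        byte[] content = readInParallel(file, 8, 3);
        System.out.println(new String(content, StandardCharsets.US_ASCII));
        Files.delete(file);
    }
}
```

Without the split step, a reader has no offset to start from and must consume a whole file, which is the limitation described above for plain partitioning.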
