Let me rephrase Ram's question to make it clear:

For an application developer using Malhar:
What are the advantages and disadvantages of using the proposed HDFS file
input module, compared to directly using the FileSplitter and BlockReader
operators already available in Malhar?

~ Yogi

On 16 February 2016 at 21:56, Munagala Ramanath <[email protected]> wrote:

> Can parallel reads not be achieved by partitioning?
>
> Ram
>
> On Tue, Feb 16, 2016 at 1:01 AM, Priyanka Gugale <[email protected]> wrote:
>
> > Hi,
> >
> > It is a common use case to read big files on HDFS in parallel, i.e.
> > many reader threads are used to read the file at the same time. We can
> > achieve this on top of Apex using the following Malhar operators:
> >
> > 1. AbstractFileSplitter
> > 2. AbstractBlockReader
> >
> > where FileSplitter, based on the file metadata, creates small reader
> > tasks (to read the file in parts). Those reader tasks are run by
> > BlockReaders in parallel to read the file.
> >
> > As these operators are generally used together to read files, I
> > propose we create a module called HDFSFileReader for this.
> >
> > Please share your suggestions on this.
> >
> > -Priyanka
> >
>
