Any suggestions or comments on this?

~ Yogi

On 3 March 2016 at 17:44, Yogi Devendra <[email protected]> wrote:

> Hi,
>
> Currently, for writing to an HDFS file, we have AbstractFileOutputOperator in
> the Malhar library.
>
> It has the following abstract methods:
> 1. protected abstract String getFileName(INPUT tuple)
> 2. protected abstract byte[] getBytesForTuple(INPUT tuple)
>
> These methods are kept generic to give flexibility to app developers.
> But someone who is new to Apex would look for a ready-made implementation
> instead of extending the abstract implementation.
>
> Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to
> Malhar. The aim of this operator is to serve as a ready-to-use
> operator for the most frequent use cases.
>
> Here are my key observations on the most frequent use cases:
> ------------------------------------------------------------
>
> 1. Writing tuples of type byte[] or String.
> 2. All tuples on a particular stream end up in the same output file.
> 3. The app developer may want to add a custom tuple separator (e.g. a
> newline character) between tuples.
>
> Please share your comments regarding:
> --------------------------------------------------------
>
> 1. Will it be useful to have such a concrete operator?
>
> 2. Can you think of any data type other than byte[] and String that
> should be supported out of the box by this concrete operator?
> Currently, I am planning to accept byte[], String, and any other type with
> a valid toString() as input tuples.
>
> 3. Do you think the tuple separator should be configurable?
>
> 4. Any other feedback?
>
>
> Proposed design:
> ----------------------
>
> 1. This concrete implementation will extend
> AbstractFileOutputOperator with default implementations for the abstract
> methods mentioned above.
>
> 2. The file name and tuple separator will be exposed as operator properties.
>
> 3. All incoming tuples will be written to the same file, named by the
> file name property.
>
> 4. This operator will be added to the Malhar library under the package
> com.datatorrent.lib.io.fs, where AbstractFileOutputOperator resides.
>
> ~ Yogi
>
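
To make the proposed design easier to discuss, here is a rough sketch of how it
could look. It is only an illustration under stated assumptions, not the actual
Malhar code: FileOutputSketch is a hypothetical in-memory stand-in for
AbstractFileOutputOperator that keeps only the two abstract methods quoted
above, and HdfsOutputOperatorSketch, processTuple, contentsOf, and the
fileName / tupleSeparator property names are likewise made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for AbstractFileOutputOperator: same two abstract
// methods, but it buffers bytes in memory instead of writing to HDFS.
abstract class FileOutputSketch<INPUT> {
    private final Map<String, ByteArrayOutputStream> files = new HashMap<>();

    protected abstract String getFileName(INPUT tuple);

    protected abstract byte[] getBytesForTuple(INPUT tuple);

    // Appends the tuple's bytes to the buffer of the file chosen for it.
    public void processTuple(INPUT tuple) {
        byte[] bytes = getBytesForTuple(tuple);
        files.computeIfAbsent(getFileName(tuple), k -> new ByteArrayOutputStream())
             .write(bytes, 0, bytes.length);
    }

    // Returns everything written so far to the given file (empty if none).
    public byte[] contentsOf(String fileName) {
        ByteArrayOutputStream out = files.get(fileName);
        return out == null ? new byte[0] : out.toByteArray();
    }
}

// Sketch of the proposed concrete operator: one output file, configurable
// separator, accepts byte[], String, or anything with a usable toString().
class HdfsOutputOperatorSketch<INPUT> extends FileOutputSketch<INPUT> {
    private String fileName = "output.txt"; // assumed default, for illustration
    private String tupleSeparator = "\n";   // assumed default, for illustration

    public void setFileName(String fileName) { this.fileName = fileName; }

    public void setTupleSeparator(String tupleSeparator) { this.tupleSeparator = tupleSeparator; }

    @Override
    protected String getFileName(INPUT tuple) {
        return fileName; // every tuple goes to the same file
    }

    @Override
    protected byte[] getBytesForTuple(INPUT tuple) {
        // byte[] tuples are written as-is; everything else goes through toString().
        byte[] payload = (tuple instanceof byte[])
            ? (byte[]) tuple
            : tuple.toString().getBytes(StandardCharsets.UTF_8);
        byte[] separator = tupleSeparator.getBytes(StandardCharsets.UTF_8);
        byte[] result = new byte[payload.length + separator.length];
        System.arraycopy(payload, 0, result, 0, payload.length);
        System.arraycopy(separator, 0, result, payload.length, separator.length);
        return result;
    }
}

class HdfsOutputSketchDemo {
    public static void main(String[] args) {
        HdfsOutputOperatorSketch<Object> op = new HdfsOutputOperatorSketch<>();
        op.setFileName("events.txt");
        op.processTuple("first");
        op.processTuple("second".getBytes(StandardCharsets.UTF_8));
        System.out.print(new String(op.contentsOf("events.txt"), StandardCharsets.UTF_8));
    }
}
```

In this sketch the separator is appended after every tuple, including the last
one. Whether the real operator should do that, or only put it between tuples,
is one of the details I would like feedback on.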
