Any suggestions or comments on this?

~ Yogi

On 3 March 2016 at 17:44, Yogi Devendra <[email protected]> wrote:

> Hi,
>
> Currently, for writing to an HDFS file, we have AbstractFileOutputOperator
> in the Malhar library.
>
> It has the following abstract methods:
> 1. protected abstract String getFileName(INPUT tuple)
> 2. protected abstract byte[] getBytesForTuple(INPUT tuple)
>
> These methods are kept generic to give flexibility to app developers.
> But someone who is new to Apex would look for a ready-made implementation
> instead of extending the abstract implementation.
>
> Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to
> Malhar. The aim of this operator is to serve as a ready-to-use operator
> for the most frequent use cases.
>
> Here are my key observations on the most frequent use cases:
> ------------------------------------------------------------
>
> 1. Writing tuples of type byte[] or String.
> 2. All tuples on a particular stream end up in the same output file.
> 3. The app developer may want to add a custom tuple separator (e.g. a
>    newline character) between tuples.
>
> Please mention your comments regarding:
> ---------------------------------------
>
> 1. Will it be useful to have such a concrete operator?
>
> 2. Can you think of any datatype other than byte[] and String that
>    should be supported out of the box by this concrete operator?
>    Currently, I am planning to accept byte[], String, and any other type
>    with a valid toString() as input tuples.
>
> 3. Do you think the tuple separator should be configurable?
>
> 4. Any other feedback?
>
>
> Proposed design:
> ----------------
>
> 1. This concrete implementation will extend AbstractFileOutputOperator
>    with default implementations of the abstract methods mentioned above.
>
> 2. The file name and the tuple separator will be exposed as operator
>    properties.
>
> 3. All incoming tuples will be written to the same file, as named in the
>    file name property.
>
> 4. This operator will be added to the Malhar library under the package
>    com.datatorrent.lib.io.fs, where AbstractFileOutputOperator resides.
>
> ~ Yogi
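
To make the proposed design easier to comment on, here is a rough sketch of how the two abstract methods could be given default implementations. It is only an illustration, not the actual patch: FileOutputSketch is a hypothetical stand-in for AbstractFileOutputOperator (reduced to the two methods quoted above so the snippet compiles without Malhar on the classpath), and the class name HdfsOutputSketch, the property names fileName and tupleSeparator, their default values, and the UTF-8 encoding are all assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Malhar's AbstractFileOutputOperator, reduced to
// the two abstract methods quoted in the proposal.
abstract class FileOutputSketch<INPUT> {
  protected abstract String getFileName(INPUT tuple);

  protected abstract byte[] getBytesForTuple(INPUT tuple);
}

// Sketch of the proposed concrete operator: one output file for all tuples,
// plus a configurable separator appended after every tuple.
class HdfsOutputSketch extends FileOutputSketch<Object> {
  // Assumed property names and defaults; the real operator may differ.
  private String fileName = "output.txt";
  private String tupleSeparator = "\n";

  public void setFileName(String fileName) {
    this.fileName = fileName;
  }

  public void setTupleSeparator(String tupleSeparator) {
    this.tupleSeparator = tupleSeparator;
  }

  // Design point 3: every tuple goes to the same configured file.
  @Override
  protected String getFileName(Object tuple) {
    return fileName;
  }

  // byte[] tuples are written as-is; anything else goes through toString().
  // UTF-8 is an assumption made for this sketch.
  @Override
  protected byte[] getBytesForTuple(Object tuple) {
    byte[] body = (tuple instanceof byte[])
        ? (byte[]) tuple
        : tuple.toString().getBytes(StandardCharsets.UTF_8);
    byte[] separator = tupleSeparator.getBytes(StandardCharsets.UTF_8);

    byte[] out = new byte[body.length + separator.length];
    System.arraycopy(body, 0, out, 0, body.length);
    System.arraycopy(separator, 0, out, body.length, separator.length);
    return out;
  }
}
```

With this shape, an app developer would only set the two properties and connect the input stream. Whether the separator should default to a newline or to an empty string is one of the points still open for feedback.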
