Will this give users the impression that, for a batch use case, they must use only batch-aware operators? If so, is that what we intend? I am not familiar with how we implement batch scenarios, so this may be a basic question.
-Priyanka

On Mon, Jan 16, 2017 at 12:02 PM, Bhupesh Chawda <bhup...@datatorrent.com> wrote:

> Hi All,
>
> While design / implementation for custom control tuples is ongoing, I
> thought it would be a good idea to consider its usefulness in one of the
> use cases - batch applications.
>
> This is a proposal to adapt / extend existing operators in the Apache Apex
> Malhar library so that it is easy to use them in batch use cases.
> Naturally, this would be applicable for only a subset of operators like
> File, JDBC and NoSQL databases.
> For example, for a file based store, (say HDFS store), we could have
> FileBatchInput and FileBatchOutput operators which allow easy integration
> into a batch application. These operators would be extended from their
> existing implementations and would be "Batch Aware", in that they may
> understand the meaning of some specific control tuples that flow through
> the DAG. Start batch and end batch seem to be the obvious candidates that
> come to mind. On receipt of such control tuples, they may try to modify the
> behavior of the operator - to reinitialize some metrics or finalize an
> output file for example.
>
> We can discuss the potential control tuples and actions in detail, but
> first I would like to understand the views of the community for this
> proposal.
>
> ~ Bhupesh
>