+1 for the idea. I would love to have this feature available for operators writing to external systems.
- Tushar.

On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar <[email protected]> wrote:

> Hi Ashwin,
>
> This is indeed a much-needed functionality.
>
> I think this is applicable to all three types of operators: Input (e.g.
> HDFSInput, JDBCInput, etc.), Process (e.g. Enrichment), Output (JDBCOutput,
> HDFSOutput, etc.)
>
> I tried to do something similar:
>
> https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java
>
> But this had limitations related to extending another class other than this
> abstract class etc....
>
> From what you have mentioned above, I see the functionality basically
> provides asynchronous processing of tuples.
>
> Considering this is a common functionality, can this feature instead be
> added in the platform (apex-core) itself?
>
> Thanks,
> Chinmay.
>
> On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani <[email protected]>
> wrote:
>
> > Dear Ashwin,
> >
> > The approach sounds good. I am assuming that this will be done for all the
> > output data stores and not limited to JDBC.
> >
> > +1
> >
> > Regards,
> > Mohit
> >
> > On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta <
> > [email protected]> wrote:
> >
> > > There are many use cases in which we are writing tuples to an external
> > > system using JDBC etc. There are instances when the external system might
> > > be slow or down for some time. In those cases, the current implementation
> > > of the JDBC output operators fails and restarts until the external system
> > > is up again. Meanwhile, the DAG is slowed down by this operator. To deal
> > > with such scenarios, we should write the output in a reconciled fashion
> > > where the reconciler thread is writing at the pace of the external system.
> > > We should also provide an ability to spool the data to disk when the
> > > external system is down or the output operator's queue is full.
> > >
> > > Here are the proposed features for the output operator.
> > >
> > > 1. Write to external system in a separate reconciler thread.
> > > 2. Queue the tuples in memory for reconciler thread to consume.
> > > 3. Spool the incoming tuples to HDFS using a WAL when the queue is full.
> > > 4. Read from WAL and write to queue as queue is being consumed.
> > > 5. When external system is able to consume as fast as incoming throughput,
> > > WAL is not written. The queue will just buffer the tuples before writing to
> > > external system.
> > >
> > > Here is the JIRA: https://issues.apache.org/jira/browse/APEXMALHAR-2037
> > >
> > > Please let me know if you have any feedback on the design.
> > >
> > > --
> > >
> > > Regards,
> > > Ashwin.
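
To make the five proposed features concrete, here is a rough sketch of how the queue, the spool, and the reconciler thread could fit together. It is only an illustration under stated assumptions, not the Malhar implementation: the class name `SpoolingOutputSketch` and its methods are hypothetical, the spool is an in-memory deque standing in for the HDFS WAL, and retrying when the external system is down is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of the proposed design; all names are illustrative. */
public class SpoolingOutputSketch<T> {
  private final BlockingQueue<T> queue;               // feature 2: bounded in-memory queue
  private final Deque<T> spool = new ArrayDeque<>();  // feature 3: stand-in for an HDFS WAL
  private final Consumer<T> externalWriter;           // e.g. a JDBC insert (assumption)
  private final Thread reconciler;                    // feature 1: separate writer thread
  private volatile boolean running = true;

  public SpoolingOutputSketch(int queueCapacity, Consumer<T> externalWriter) {
    this.queue = new ArrayBlockingQueue<>(queueCapacity);
    this.externalWriter = externalWriter;
    this.reconciler = new Thread(this::reconcileLoop, "reconciler");
  }

  public void start() {
    reconciler.start();
  }

  /** Called on the operator thread for every incoming tuple; never blocks on the sink. */
  public void process(T tuple) {
    synchronized (spool) {
      // Once anything is spooled, later tuples must follow it to preserve order.
      // Feature 5: if the queue has room and the spool is empty, the spool is untouched.
      if (!spool.isEmpty() || !queue.offer(tuple)) {
        spool.addLast(tuple);
      }
    }
  }

  /** Runs on the reconciler thread, writing at the pace of the external system. */
  private void reconcileLoop() {
    try {
      while (running || !queue.isEmpty() || spooledCount() > 0) {
        T tuple = queue.poll(10, TimeUnit.MILLISECONDS);
        if (tuple != null) {
          externalWriter.accept(tuple);
        }
        refillFromSpool(); // feature 4: move spooled tuples back as the queue drains
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  private void refillFromSpool() {
    synchronized (spool) {
      while (!spool.isEmpty() && queue.offer(spool.peekFirst())) {
        spool.removeFirst();
      }
    }
  }

  public int spooledCount() {
    synchronized (spool) {
      return spool.size();
    }
  }

  /** Lets the reconciler drain everything that is buffered, then waits for it to exit. */
  public void stop() {
    running = false;
    try {
      reconciler.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

With a queue capacity of 2 and five tuples arriving before the reconciler starts, two tuples sit in the queue and three go to the spool; after `start()` and `stop()` all five reach the writer in arrival order. A real operator would also have to tie the spool to checkpointing and recovery, which this sketch does not attempt.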
