Ashwin, abstract class isn't suitable. The basic ingredient is WAL, from which the operator consumes as appropriate. We already have the WAL implementation.
Incoming tuples are added to the WAL, consumption is async. WAL offset is checkpointed. -- sent from mobile On Mar 29, 2016 2:03 PM, "Ashwin Chandra Putta" <[email protected]> wrote: > @Chinmay I went through your code and related comments on pull request. I > was originally thinking of using AbstractReconciler but seems like > AbstractAsyncProcessor will be better because 1. it has multithreaded > output 2. it does not wait for committed window to process queued tuples. > One question though, is it fault tolerant? > > @Thomas Current design creates an abstract implementation, so it needs to > be extended to create specific output writers. Need to think more about how > to make this pluggable. > > @Others I am planning to get the common functionality out to make it > reusable for outputs to any external system. > > Regards, > Ashwin. > > On Tue, Mar 29, 2016 at 11:29 AM, Thomas Weise <[email protected]> > wrote: > > > Can this be solved through a pluggable component of operator, similar to > > window data manager? > > > > > > On Tue, Mar 29, 2016 at 11:18 AM, Siyuan Hua <[email protected]> > > wrote: > > > > > +1 on making it a feature for all output operators that depend on > > external > > > system > > > > > > On Mon, Mar 28, 2016 at 11:55 PM, Sandeep Deshmukh < > > > [email protected]> > > > wrote: > > > > > > > +1 on making it apex feature. > > > > On 29-Mar-2016 12:22 pm, "Tushar Gosavi" <[email protected]> > > wrote: > > > > > > > > > +1 for the idea, love to have this feature available for operators > > > > writing > > > > > to external systems. > > > > > > > > > > - Tushar. > > > > > > > > > > > > > > > On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar < > > > [email protected]> > > > > > wrote: > > > > > > > > > > > Hi Ashwin, > > > > > > > > > > > > This is indeed a very required functionality. > > > > > > > > > > > > I think this is applicable for all three types of operators: > Input > > > > (e.g. > > > > > > HDFSInput, JDBCInput, etc), Process (e.g. Enrichment), Output > > > > > (JDBCOutput, > > > > > > HDFSOutput, etc) > > > > > > > > > > > > I tried to do something similar: > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java > > > > > > > > > > > > But this had limitation related extending another class other > than > > > this > > > > > > abstract class etc.... > > > > > > > > > > > > From what you have mentioned above, I see the functionality > > basically > > > > > > provides asynchronous processing of tuples. > > > > > > > > > > > > Considering this is a common functionality, Can this feature > > instead > > > be > > > > > > added in platform (apex-core) itself? > > > > > > > > > > > > Thanks, > > > > > > Chinmay. > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani < > > > [email protected] > > > > > > > > > > > wrote: > > > > > > > > > > > > > Dear Ashwin, > > > > > > > > > > > > > > The approach sounds good. I am assuming that this will be done > > for > > > > all > > > > > > the > > > > > > > output data stores and not limited to JDBC. > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > Regards, > > > > > > > Mohit > > > > > > > > > > > > > > On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta < > > > > > > > [email protected]> wrote: > > > > > > > > > > > > > > > There are many use cases in which we are writing tuples to > > > external > > > > > > > system > > > > > > > > using JDBC etc. There are instances when the external system > > > might > > > > be > > > > > > > slow > > > > > > > > and down for some time. In those cases, the current > > > implementation > > > > of > > > > > > > jdbc > > > > > > > > output operators fail and restart until the external system > is > > up > > > > > > again. > > > > > > > > Meanwhile, the DAG is slowed down by this operator. To deal > > with > > > > such > > > > > > > > scenarios, we should write the output in a reconciled fashion > > > where > > > > > the > > > > > > > > reconciler thread is writing at the pace of external system. > We > > > > > should > > > > > > > also > > > > > > > > provide an ability to spool the data to disk when the > external > > > > system > > > > > > is > > > > > > > > down or the output operators queue is full. > > > > > > > > > > > > > > > > Here are the proposed features for the output operator. > > > > > > > > > > > > > > > > 1. Write to external system in a separate reconciler thread. > > > > > > > > 2. Queue the tuples in memory for reconciler thread to > consume. > > > > > > > > 3. Spool the incoming tuples to hdfs using a WAL when the > queue > > > is > > > > > > full. > > > > > > > > 4. Read from WAL and write to queue as queue is being > consumed. > > > > > > > > 5. When external system is able to consume as fast as > incoming > > > > > > > throughput, > > > > > > > > WAL is not written. The queue will just buffer the tuples > > before > > > > > > writing > > > > > > > to > > > > > > > > external system. > > > > > > > > > > > > > > > > Here is the JIRA: > > > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2037 > > > > > > > > > > > > > > > > Please let me know if you have any feedback on the design. > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > Regards, > > > > > > > > Ashwin. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Regards, > Ashwin. >
