@Ashwin yes, It's fault tolarant. The variable waitingTuple will get checkpointed and will hold the tuples that needs to be processed. In case of failures it re-enqueues any tuples still left in this variable.
On Wed, Mar 30, 2016 at 2:33 AM, Ashwin Chandra Putta < [email protected]> wrote: > @Chinmay I went through your code and related comments on pull request. I > was originally thinking of using AbstractReconciler but seems like > AbstractAsyncProcessor will be better because 1. it has multithreaded > output 2. it does not wait for committed window to process queued tuples. > One question though, is it fault tolerant? > > @Thomas Current design creates an abstract implementation, so it needs to > be extended to create specific output writers. Need to think more about how > to make this pluggable. > > @Others I am planning to get the common functionality out to make it > reusable for outputs to any external system. > > Regards, > Ashwin. > > On Tue, Mar 29, 2016 at 11:29 AM, Thomas Weise <[email protected]> > wrote: > > > Can this be solved through a pluggable component of operator, similar to > > window data manager? > > > > > > On Tue, Mar 29, 2016 at 11:18 AM, Siyuan Hua <[email protected]> > > wrote: > > > > > +1 on making it a feature for all output operators that depend on > > external > > > system > > > > > > On Mon, Mar 28, 2016 at 11:55 PM, Sandeep Deshmukh < > > > [email protected]> > > > wrote: > > > > > > > +1 on making it apex feature. > > > > On 29-Mar-2016 12:22 pm, "Tushar Gosavi" <[email protected]> > > wrote: > > > > > > > > > +1 for the idea, love to have this feature available for operators > > > > writing > > > > > to external systems. > > > > > > > > > > - Tushar. > > > > > > > > > > > > > > > On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar < > > > [email protected]> > > > > > wrote: > > > > > > > > > > > Hi Ashwin, > > > > > > > > > > > > This is indeed a very required functionality. > > > > > > > > > > > > I think this is applicable for all three types of operators: > Input > > > > (e.g. > > > > > > HDFSInput, JDBCInput, etc), Process (e.g. Enrichment), Output > > > > > (JDBCOutput, > > > > > > HDFSOutput, etc) > > > > > > > > > > > > I tried to do something similar: > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java > > > > > > > > > > > > But this had limitation related extending another class other > than > > > this > > > > > > abstract class etc.... > > > > > > > > > > > > From what you have mentioned above, I see the functionality > > basically > > > > > > provides asynchronous processing of tuples. > > > > > > > > > > > > Considering this is a common functionality, Can this feature > > instead > > > be > > > > > > added in platform (apex-core) itself? > > > > > > > > > > > > Thanks, > > > > > > Chinmay. > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani < > > > [email protected] > > > > > > > > > > > wrote: > > > > > > > > > > > > > Dear Ashwin, > > > > > > > > > > > > > > The approach sounds good. I am assuming that this will be done > > for > > > > all > > > > > > the > > > > > > > output data stores and not limited to JDBC. > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > Regards, > > > > > > > Mohit > > > > > > > > > > > > > > On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta < > > > > > > > [email protected]> wrote: > > > > > > > > > > > > > > > There are many use cases in which we are writing tuples to > > > external > > > > > > > system > > > > > > > > using JDBC etc. There are instances when the external system > > > might > > > > be > > > > > > > slow > > > > > > > > and down for some time. In those cases, the current > > > implementation > > > > of > > > > > > > jdbc > > > > > > > > output operators fail and restart until the external system > is > > up > > > > > > again. > > > > > > > > Meanwhile, the DAG is slowed down by this operator. To deal > > with > > > > such > > > > > > > > scenarios, we should write the output in a reconciled fashion > > > where > > > > > the > > > > > > > > reconciler thread is writing at the pace of external system. > We > > > > > should > > > > > > > also > > > > > > > > provide an ability to spool the data to disk when the > external > > > > system > > > > > > is > > > > > > > > down or the output operators queue is full. > > > > > > > > > > > > > > > > Here are the proposed features for the output operator. > > > > > > > > > > > > > > > > 1. Write to external system in a separate reconciler thread. > > > > > > > > 2. Queue the tuples in memory for reconciler thread to > consume. > > > > > > > > 3. Spool the incoming tuples to hdfs using a WAL when the > queue > > > is > > > > > > full. > > > > > > > > 4. Read from WAL and write to queue as queue is being > consumed. > > > > > > > > 5. When external system is able to consume as fast as > incoming > > > > > > > throughput, > > > > > > > > WAL is not written. The queue will just buffer the tuples > > before > > > > > > writing > > > > > > > to > > > > > > > > external system. > > > > > > > > > > > > > > > > Here is the JIRA: > > > > > > https://issues.apache.org/jira/browse/APEXMALHAR-2037 > > > > > > > > > > > > > > > > Please let me know if you have any feedback on the design. > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > Regards, > > > > > > > > Ashwin. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > Regards, > Ashwin. >
