Re: proposal for reconciled JDBC output operator

Sandeep Deshmukh Mon, 28 Mar 2016 23:56:26 -0700

+1 on making it apex feature.
On 29-Mar-2016 12:22 pm, "Tushar Gosavi" <[email protected]> wrote:


> +1 for the idea, love to have this feature available for operators writing
> to external systems.
>
> - Tushar.
>
>
> On Tue, Mar 29, 2016 at 11:51 AM, Chinmay Kolhatkar <[email protected]>
> wrote:
>
> > Hi Ashwin,
> >
> > This is indeed a very required functionality.
> >
> > I think this is applicable for all three types of operators: Input (e.g.
> > HDFSInput, JDBCInput, etc), Process (e.g. Enrichment), Output
> (JDBCOutput,
> > HDFSOutput, etc)
> >
> > I tried to do something similar:
> >
> >
> https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java
> >
> > But this had limitation related extending another class other than this
> > abstract class etc....
> >
> > From what you have mentioned above, I see the functionality basically
> > provides asynchronous processing of tuples.
> >
> > Considering this is a common functionality, Can this feature instead be
> > added in platform (apex-core) itself?
> >
> > Thanks,
> > Chinmay.
> >
> >
> >
> > On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani <[email protected]>
> > wrote:
> >
> > > Dear Ashwin,
> > >
> > > The approach sounds good. I am assuming that this will be done for all
> > the
> > > output data stores and not limited to JDBC.
> > >
> > > +1
> > >
> > > Regards,
> > > Mohit
> > >
> > > On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta <
> > > [email protected]> wrote:
> > >
> > > > There are many use cases in which we are writing tuples to external
> > > system
> > > > using JDBC etc. There are instances when the external system might be
> > > slow
> > > > and down for some time. In those cases, the current implementation of
> > > jdbc
> > > > output operators fail and restart until the external system is up
> > again.
> > > > Meanwhile, the DAG is slowed down by this operator. To deal with such
> > > > scenarios, we should write the output in a reconciled fashion where
> the
> > > > reconciler thread is writing at the pace of external system. We
> should
> > > also
> > > > provide an ability to spool the data to disk when the external system
> > is
> > > > down or the output operators queue is full.
> > > >
> > > > Here are the proposed features for the output operator.
> > > >
> > > > 1. Write to external system in a separate reconciler thread.
> > > > 2. Queue the tuples in memory for reconciler thread to consume.
> > > > 3. Spool the incoming tuples to hdfs using a WAL when the queue is
> > full.
> > > > 4. Read from WAL and write to queue as queue is being consumed.
> > > > 5. When external system is able to consume as fast as incoming
> > > throughput,
> > > > WAL is not written. The queue will just buffer the tuples before
> > writing
> > > to
> > > > external system.
> > > >
> > > > Here is the JIRA:
> > https://issues.apache.org/jira/browse/APEXMALHAR-2037
> > > >
> > > > Please let me know if you have any feedback on the design.
> > > >
> > > > --
> > > >
> > > > Regards,
> > > > Ashwin.
> > > >
> > >
> >
>

Re: proposal for reconciled JDBC output operator

Reply via email to