Hi Ashwin,

This is indeed a very required functionality.

I think this is applicable for all three types of operators: Input (e.g.
HDFSInput, JDBCInput, etc), Process (e.g. Enrichment), Output (JDBCOutput,
HDFSOutput, etc)

I tried to do something similar:
https://github.com/chinmaykolhatkar/incubator-apex-malhar/blob/APEXMALHAR-1963_AsyncProcessor/library/src/main/java/com/datatorrent/lib/async/AbstractAsyncProcessor.java

But this had limitation related extending another class other than this
abstract class etc....

>From what you have mentioned above, I see the functionality basically
provides asynchronous processing of tuples.

Considering this is a common functionality, Can this feature instead be
added in platform (apex-core) itself?

Thanks,
Chinmay.



On Tue, Mar 29, 2016 at 10:20 AM, Mohit Jotwani <[email protected]>
wrote:

> Dear Ashwin,
>
> The approach sounds good. I am assuming that this will be done for all the
> output data stores and not limited to JDBC.
>
> +1
>
> Regards,
> Mohit
>
> On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta <
> [email protected]> wrote:
>
> > There are many use cases in which we are writing tuples to external
> system
> > using JDBC etc. There are instances when the external system might be
> slow
> > and down for some time. In those cases, the current implementation of
> jdbc
> > output operators fail and restart until the external system is up again.
> > Meanwhile, the DAG is slowed down by this operator. To deal with such
> > scenarios, we should write the output in a reconciled fashion where the
> > reconciler thread is writing at the pace of external system. We should
> also
> > provide an ability to spool the data to disk when the external system is
> > down or the output operators queue is full.
> >
> > Here are the proposed features for the output operator.
> >
> > 1. Write to external system in a separate reconciler thread.
> > 2. Queue the tuples in memory for reconciler thread to consume.
> > 3. Spool the incoming tuples to hdfs using a WAL when the queue is full.
> > 4. Read from WAL and write to queue as queue is being consumed.
> > 5. When external system is able to consume as fast as incoming
> throughput,
> > WAL is not written. The queue will just buffer the tuples before writing
> to
> > external system.
> >
> > Here is the JIRA: https://issues.apache.org/jira/browse/APEXMALHAR-2037
> >
> > Please let me know if you have any feedback on the design.
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>

Reply via email to