In many use cases we write tuples to an external system over JDBC or a
similar interface. The external system may be slow, or down, for some
time. When that happens, the current JDBC output operators fail and
restart repeatedly until the external system is up again, and the DAG is
slowed down by this operator in the meantime. To deal with such scenarios,
we should write the output in a reconciled fashion, where a reconciler
thread writes at the pace of the external system. We should also provide
the ability to spool data to disk when the external system is down or the
output operator's queue is full.

Here are the proposed features for the output operator.

1. Write to the external system in a separate reconciler thread.
2. Queue the tuples in memory for the reconciler thread to consume.
3. Spool incoming tuples to HDFS using a write-ahead log (WAL) when the
   queue is full.
4. Read from the WAL and write to the queue as the queue is consumed.
5. When the external system can consume as fast as the incoming
   throughput, nothing is written to the WAL. The queue just buffers the
   tuples before they are written to the external system.
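
To make the flow concrete, here is a rough sketch of how the five points
could fit together. It is only an illustration under some assumptions, not
the proposed Malhar code: the class name SpoolingWriter, the Consumer used
as the sink, and the retry delay are all hypothetical, and an in-memory
deque stands in for the WAL on HDFS so that the example stays
self-contained.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch, not an existing Malhar class.
class SpoolingWriter<T> {
  private final BlockingQueue<T> queue;               // point 2: in-memory queue
  private final Queue<T> spool = new ArrayDeque<>();  // stand-in for the WAL on HDFS
  private final Consumer<T> sink;                     // stand-in for the JDBC write
  private final Thread reconciler;
  private volatile boolean running = true;
  private long spooledCount;

  SpoolingWriter(int queueCapacity, Consumer<T> sink) {
    this.queue = new ArrayBlockingQueue<>(queueCapacity);
    this.sink = sink;
    this.reconciler = new Thread(this::drainLoop, "reconciler");
    this.reconciler.setDaemon(true);
    this.reconciler.start();                          // point 1: separate thread
  }

  // Called from the operator thread; never blocks on the external system.
  void write(T tuple) {
    synchronized (spool) {
      // Point 5: while the sink keeps up, tuples go straight to the queue.
      if (spool.isEmpty() && queue.offer(tuple)) {
        return;
      }
      // Point 3: queue is full (or older tuples are already spooled), so
      // spool this one as well to preserve arrival order.
      spool.add(tuple);
      spooledCount++;
    }
  }

  private void drainLoop() {
    try {
      while (running || !drained()) {
        T tuple = queue.poll(10, TimeUnit.MILLISECONDS);
        if (tuple != null) {
          writeWithRetry(tuple);
        }
        refillFromSpool();                            // point 4
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Retries the same tuple until the sink accepts it (assumed policy).
  private void writeWithRetry(T tuple) throws InterruptedException {
    while (true) {
      try {
        sink.accept(tuple);
        return;
      } catch (RuntimeException e) {
        Thread.sleep(50);                             // external system is down
      }
    }
  }

  private void refillFromSpool() {
    synchronized (spool) {
      while (!spool.isEmpty() && queue.offer(spool.peek())) {
        spool.remove();
      }
    }
  }

  private boolean drained() {
    synchronized (spool) {
      return spool.isEmpty() && queue.isEmpty();
    }
  }

  long spooledCount() {
    synchronized (spool) {
      return spooledCount;
    }
  }

  // Stops accepting work and waits until queued and spooled tuples are written.
  void close() {
    running = false;
    try {
      reconciler.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

In this sketch, new tuples are appended to the spool whenever it is
non-empty, even if the queue has room, so that tuples reach the external
system in arrival order. A real implementation would also need the WAL to
be durable and tied to checkpointing, which the sketch does not attempt.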

Here is the JIRA: https://issues.apache.org/jira/browse/APEXMALHAR-2037

Please let me know if you have any feedback on the design.

-- 

Regards,
Ashwin.
