Dear Ashwin, The approach sounds good. I am assuming that this will be done for all the output data stores and not limited to JDBC.
+1 Regards, Mohit On Tue, Mar 29, 2016 at 5:17 AM, Ashwin Chandra Putta < [email protected]> wrote: > There are many use cases in which we are writing tuples to external system > using JDBC etc. There are instances when the external system might be slow > and down for some time. In those cases, the current implementation of jdbc > output operators fail and restart until the external system is up again. > Meanwhile, the DAG is slowed down by this operator. To deal with such > scenarios, we should write the output in a reconciled fashion where the > reconciler thread is writing at the pace of external system. We should also > provide an ability to spool the data to disk when the external system is > down or the output operators queue is full. > > Here are the proposed features for the output operator. > > 1. Write to external system in a separate reconciler thread. > 2. Queue the tuples in memory for reconciler thread to consume. > 3. Spool the incoming tuples to hdfs using a WAL when the queue is full. > 4. Read from WAL and write to queue as queue is being consumed. > 5. When external system is able to consume as fast as incoming throughput, > WAL is not written. The queue will just buffer the tuples before writing to > external system. > > Here is the JIRA: https://issues.apache.org/jira/browse/APEXMALHAR-2037 > > Please let me know if you have any feedback on the design. > > -- > > Regards, > Ashwin. >
