[ https://issues.apache.org/jira/browse/APEXMALHAR-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashwin Chandra Putta updated APEXMALHAR-2037:
---------------------------------------------
    Description: 
There are many use cases in which we write tuples to an external system
using JDBC etc. There are times when the external system is slow or down
for a while. In those cases, the current JDBC output operators fail and
restart until the external system is up again. Meanwhile, the DAG is slowed
down by this operator. To deal with such scenarios, we should write the
output in a reconciled fashion, where a reconciler thread writes at the pace
of the external system. We should also provide the ability to spool the data
to disk when the external system is down or the output operator's queue is
full.

Here are the proposed features for the output operator.

1. Write to the external system in a separate reconciler thread.
2. Queue the tuples in memory for the reconciler thread to consume.
3. Spool the incoming tuples to HDFS using a write-ahead log (WAL) when the
queue is full.
4. Read from the WAL and write to the queue as the queue is being consumed.
5. When the external system is able to consume as fast as the incoming
throughput, the WAL is not written. The queue just buffers the tuples before
they are written to the external system.

This can be done on the output operator as a pluggable component that queues
the incoming tuples and provides a callback to dequeue the tuples and write
them to the external system. The component will use the WAL to back up the
tuples when the queue is full.
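
To make the proposal concrete, here is a rough Java sketch of what such a
component could look like. It is not the Malhar implementation. Every name in
it (Spool, InMemorySpool, SpoolingQueue, startReconciler) is hypothetical, and
the WAL is replaced by an in-memory stand-in so the sketch runs on its own; a
real version would append to files on HDFS. Retry and failure handling for the
external write are left out.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical stand-in for the WAL; a real one would append to HDFS files. */
interface Spool<T> {
  void append(T tuple);
  T poll();      // oldest spooled tuple, or null when the spool is empty
  int size();
}

/** In-memory spool, used only to keep this sketch self-contained. */
class InMemorySpool<T> implements Spool<T> {
  private final Queue<T> log = new ArrayDeque<>();
  public void append(T tuple) { log.add(tuple); }
  public T poll() { return log.poll(); }
  public int size() { return log.size(); }
}

/** Bounded in-memory queue that spills to the spool when it is full. */
class SpoolingQueue<T> {
  private final Queue<T> memory = new ArrayDeque<>();
  private final int capacity;
  private final Spool<T> spool;

  SpoolingQueue(int capacity, Spool<T> spool) {
    this.capacity = capacity;
    this.spool = spool;
  }

  /** Called by the operator thread for each incoming tuple; never blocks. */
  synchronized void enqueue(T tuple) {
    // Once spooling has started, keep spooling so tuple order is preserved.
    if (spool.size() > 0 || memory.size() >= capacity) {
      spool.append(tuple);
    } else {
      memory.add(tuple);
    }
  }

  /** Called by the reconciler thread; returns null when nothing is queued. */
  synchronized T dequeue() {
    T head = memory.poll();
    if (head != null) {
      // A slot was freed, so move the oldest spooled tuple back into memory.
      T spooled = spool.poll();
      if (spooled != null) {
        memory.add(spooled);
      }
    }
    return head;
  }

  synchronized int spooledCount() { return spool.size(); }

  /** Starts a reconciler thread that hands each tuple to the writer callback. */
  Thread startReconciler(Consumer<T> writer) {
    Thread t = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        T tuple = dequeue();
        if (tuple != null) {
          writer.accept(tuple);   // e.g. a JDBC insert; no retry in this sketch
        } else {
          try {
            Thread.sleep(10);     // queue is empty, back off briefly
          } catch (InterruptedException e) {
            return;
          }
        }
      }
    }, "reconciler");
    t.setDaemon(true);
    t.start();
    return t;
  }
}
```

With a capacity of 2, enqueueing five tuples leaves two in memory and three in
the spool, and dequeue() still returns them in arrival order. When the writer
keeps up, the spool is never touched, which matches feature 5.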

  was:
There are many use cases in which we are writing tuples to external system 
using JDBC etc. There are instances when the external system might be slow and 
down for some time. In those cases, the current implementation of jdbc output 
operators fail and restart until the external system is up again. Meanwhile, 
the DAG is slowed down by this operator. To deal with such scenarios, we should 
write the output in a reconciled fashion where the reconciler thread is writing 
at the pace of external system. We should also provide an ability to spool the 
data to disk when the external system is down or the output operators queue is 
full.

Here are the proposed features for the output operator.

1. Write to external system in a separate reconciler thread.
2. Queue the tuples in memory for reconciler thread to consume. 
3. Spool the incoming tuples to hdfs using a WAL when the queue is full.
4. Read from WAL and write to queue as queue is being consumed.
5. When external system is able to consume as fast as incoming throughput, WAL 
is not written. The queue will just buffer the tuples before writing to 
external system.


> Pluggable component to queue tuples with ability to spool to disk
> -----------------------------------------------------------------
>
>                 Key: APEXMALHAR-2037
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2037
>             Project: Apache Apex Malhar
>          Issue Type: New Feature
>            Reporter: Ashwin Chandra Putta
>            Assignee: Ashwin Chandra Putta
>
> There are many use cases in which we write tuples to an external system
> using JDBC etc. There are times when the external system is slow or down
> for a while. In those cases, the current JDBC output operators fail and
> restart until the external system is up again. Meanwhile, the DAG is slowed
> down by this operator. To deal with such scenarios, we should write the
> output in a reconciled fashion, where a reconciler thread writes at the pace
> of the external system. We should also provide the ability to spool the data
> to disk when the external system is down or the output operator's queue is
> full.
>
> Here are the proposed features for the output operator.
>
> 1. Write to the external system in a separate reconciler thread.
> 2. Queue the tuples in memory for the reconciler thread to consume.
> 3. Spool the incoming tuples to HDFS using a write-ahead log (WAL) when the
> queue is full.
> 4. Read from the WAL and write to the queue as the queue is being consumed.
> 5. When the external system is able to consume as fast as the incoming
> throughput, the WAL is not written. The queue just buffers the tuples before
> they are written to the external system.
>
> This can be done on the output operator as a pluggable component that queues
> the incoming tuples and provides a callback to dequeue the tuples and write
> them to the external system. The component will use the WAL to back up the
> tuples when the queue is full.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
