Jagadish created SAMZA-1799:
-------------------------------

             Summary: Support reading from DynamoDB Streams natively
                 Key: SAMZA-1799
                 URL: https://issues.apache.org/jira/browse/SAMZA-1799
             Project: Samza
          Issue Type: Bug
            Reporter: Jagadish


Currently, Samza has built-in support to consume from AWS Kinesis, Amazon's 
messaging service. There have been requests to offer native support for 
"DynamoDB Streams", which is Amazon's change-capture technology for DynamoDB.

*What is DynamoDB Streams?*
DynamoDB Streams captures a time-ordered sequence of updates to a DynamoDB 
table and stores this information in a log for up to 24 hours. Use cases 
include propagating table updates, change capture, and database replication.

*How does DynamoDB Streams differ from Kinesis?*
While Kinesis is a general-purpose messaging service, DynamoDB Streams is 
specifically for capturing updates from DynamoDB. 

*What does it take to make Samza consume from a DynamoDB change-capture stream?*

It should be possible to support change-capture from DynamoDB with minimal 
effort. 

As a refresher, the KinesisSystemConsumer in Samza currently creates multiple 
KinesisWorkers, with each worker processing a single partition in the stream. 
By default, a Worker internally uses a "KinesisProxy" to consume data from 
Kinesis. We can configure it to read from DynamoDB Streams by simply pointing 
it to a different proxy, i.e., by using the _DynamoDBProxy_ instead of the 
default _KinesisProxy_ when a worker is instantiated. 

{code}
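// Build the Worker through the DynamoDB Streams adapter factory so that it
// reads from DynamoDB Streams instead of Kinesis.
final Worker worker = StreamsWorkerFactory
    .createDynamoDbStreamsWorker(
        recordProcessorFactory,
        workerConfig,
        adapterClient,
        amazonDynamoDB,
        amazonCloudWatchClient);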
{code}
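
One way the proxy choice could be made configurable is sketched below. This is only an illustration of the selection logic, not existing Samza code: the config key {{systems.<name>.aws.source}}, its values, and the {{RecordProxy}} stand-in types are all hypothetical, and a real implementation would build the worker through the factory call above rather than return these placeholder objects.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: pick the proxy a worker should use from a per-system config value. */
class ProxySelection {

    /** Stand-in for whatever a worker reads records through; not a Samza or AWS type. */
    interface RecordProxy {
        String describe();
    }

    static final class KinesisProxy implements RecordProxy {
        public String describe() { return "KinesisProxy"; }
    }

    static final class DynamoDBProxy implements RecordProxy {
        public String describe() { return "DynamoDBProxy"; }
    }

    // Assumed config key; Samza does not define this key today.
    static final String SOURCE_KEY_FORMAT = "systems.%s.aws.source";

    /** Returns the proxy for the given system, defaulting to Kinesis when nothing is configured. */
    static RecordProxy createProxy(Map<String, String> config, String systemName) {
        String key = String.format(SOURCE_KEY_FORMAT, systemName);
        String source = config.getOrDefault(key, "kinesis").toLowerCase(Locale.ROOT);
        switch (source) {
            case "kinesis":
                return new KinesisProxy();
            case "dynamodb-streams":
                return new DynamoDBProxy();
            default:
                throw new IllegalArgumentException("Unknown source '" + source + "' for key " + key);
        }
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("systems.orders.aws.source", "dynamodb-streams");

        // "orders" is configured for DynamoDB Streams; "clicks" falls back to the default.
        System.out.println(createProxy(config, "orders").describe()); // prints DynamoDBProxy
        System.out.println(createProxy(config, "clicks").describe()); // prints KinesisProxy
    }
}
```

Defaulting to Kinesis when the key is absent would keep existing Kinesis jobs working without any config change.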



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
