Jagadish created SAMZA-1799:
-------------------------------
Summary: Support reading from DynamoDB Streams natively
Key: SAMZA-1799
URL: https://issues.apache.org/jira/browse/SAMZA-1799
Project: Samza
Issue Type: Bug
Reporter: Jagadish
Currently, Samza has built-in support to consume from AWS Kinesis, Amazon's
messaging service. There have been requests to offer native support for
"DynamoDB Streams", which is Amazon's change-capture technology for DynamoDB.
*What is DynamoDB Streams?*
DynamoDB Streams captures a time-ordered sequence of updates to a DynamoDB
table, and stores this information in a log for up to 24 hours. Use-cases
include: propagation of table updates, change capture, database replication etc.
*How does DynamoDB Streams differ from Kinesis?*
While Kinesis is a general-purpose messaging service, DynamoDB Streams is
specifically for capturing updates from DynamoDB.
*What it takes to make Samza consume from a DynamoDB change-capture stream?*
It should be possible to support change-capture from DynamoDB with minimal
effort.
As a refresher, the KinesisSystemConsumer in Samza currently creates multiple
KinesisWorkers, with each worker processing a single partition in the stream.
By default, a Worker internally uses a "KinesisProxy" to consume data from
Kinesis. We can configure it read from DynamoDB streams by simply pointing it
to use a different proxy. ie, use the _DynamoDBProxy_ instead of the default
_KinesisProxy_ when a worker is instantiated.
{code}
final Worker worker = StreamsWorkerFactory
.createDynamoDbStreamsWorker(
recordProcessorFactory,
workerConfig,
adapterClient,
amazonDynamoDB,
amazonCloudWatchClient);
{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)