[
https://issues.apache.org/jira/browse/CAMEL-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrea Cosentino resolved CAMEL-16594.
--------------------------------------
Resolution: Fixed
> DynamoDB stream updates are missed when there are more than one active shards
> -----------------------------------------------------------------------------
>
> Key: CAMEL-16594
> URL: https://issues.apache.org/jira/browse/CAMEL-16594
> Project: Camel
> Issue Type: Bug
> Components: camel-aws
> Reporter: Pierre-Yves Bigourdan
> Assignee: Andrea Cosentino
> Priority: Major
> Fix For: 3.12.0
>
> Attachments: shards.json
>
>
> The current Camel ddbstream implementation seems to incorrectly apply the
> concept of {{ShardIteratorType}} to the list of shards forming a DynamoDB
> stream rather than each shard individually.
> According to the [AWS
> documentation|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_GetShardIterator.html#DDB-streams_GetShardIterator-request-ShardIteratorType]:
> {noformat}
> ShardIteratorType determines how the shard iterator is used to start reading
> stream records from the shard.
> {noformat}
> For example, for a given shard, when {{ShardIteratorType}} equal to
> {{LATEST}}, the AWS SDK will read the most recent data in that particular
> shard. However, when {{ShardIteratorType}} equal to {{LATEST}}, Camel will
> additionally use {{ShardIteratorType}} to determine which shard it considers
> amongst all the available ones in the stream:
> https://github.com/apache/camel/blob/6119fdc379db343030bd25b191ab88bbec34d6b6/components/camel-aws/camel-aws2-ddb/src/main/java/org/apache/camel/component/aws2/ddbstream/ShardIteratorHandler.java#L132
> If my understanding is correct, shards in DynamoDB are modelled as a tree,
> with the child leaf nodes being the shards that are still active, i.e. the
> ones where new stream data will appear. These child shards will have a
> {{StartingSequenceNumber}}, but no {{EndingSequenceNumber}}.
> The most common case is to have a single shard, or a single branch of parent
> and child nodes:
> {noformat}
> Shard0
> |
> Shard1
> {noformat}
> In the above case, new data will be added to {{Shard1}}, and the Camel
> implementation which looks only at the last shard when {{ShardIteratorType}}
> is equal to {{LATEST}}, will be correct.
> However, the tree can also look like this (see related example in the
> attached JSON output from the AWS CLI, where the shard number matches the
> index in the JSON list):
> {noformat}
> Shard0
> / \
> Shard1 Shard2
> / \ / \
> Shard3 Shard4 Shard5 Shard6
> {noformat}
> In this case, Camel will only consider Shard6, even though new data may be
> added to any of Shard3, Shard4, Shard5 or Shard6. This leads to updates being
> missed.
> As far as I can tell, DynamoDB will split into multiple shards depending on
> the number of table partitions, which will either grow for a table with huge
> amounts of data, or when an exiting table with provisioned capacity is
> migrated to on-demand provisioning.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)