[ 
https://issues.apache.org/jira/browse/BEAM-13009?focusedWorklogId=701426&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-701426
 ]

ASF GitHub Bot logged work on BEAM-13009:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Dec/21 07:12
            Start Date: 28/Dec/21 07:12
    Worklog Time Spent: 10m 
      Work Description: mosche commented on pull request #16367:
URL: https://github.com/apache/beam/pull/16367#issuecomment-1001905059


   R: @aromanenko-dev 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 701426)
    Time Spent: 4h 10m  (was: 4h)

> DynamoDBIO misses writing items if `withDeduplicateKeys` is not set
> -------------------------------------------------------------------
>
>                 Key: BEAM-13009
>                 URL: https://issues.apache.org/jira/browse/BEAM-13009
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-aws
>    Affects Versions: 2.27.0
>            Reporter: Lei Li
>            Assignee: Moritz Mack
>            Priority: P1
>              Labels: aws, data-loss, dynamodb
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The `withDeduplicateKeys` method was added to DynamoDBIO in 2.27.0. The 
> [doc|https://beam.apache.org/releases/javadoc/2.27.0/index.html?org/apache/beam/sdk/io/aws/dynamodb/DynamoDBIO.html]
>  suggests it is optional, and it is not shown in the examples either. But if 
> no key names are set through it, [the deduplication 
> logic|https://github.com/apache/beam/pull/12583/files#diff-0b5f7a7c1ee0ec890eef82e05e08ef1152421d2c8dcef11fca107f6af0d22e87R479-R492]
>  still takes effect and uses an empty map as the `Map<String, 
> AttributeValue>` part of the deduplication key. As a result, all items 
> share the same key and are deduplicated, so only the last item is written to 
> DynamoDB.
> I think we need to add a check on the deduplicate keys in 
> `extractDeduplicateKeyValues` and skip the deduplication logic if they are 
> empty.
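
To illustrate the failure mode described above, here is a minimal sketch in plain Java. It is not the Beam implementation: `DedupSketch`, `extractKeyValues`, `buggyBatch`, `guardedBatch`, and `item` are hypothetical names, and `Map<String, String>` stands in for `Map<String, AttributeValue>` so the example runs without the AWS SDK. It only shows why an empty list of deduplicate keys collapses every item onto one map key, and how an emptiness check could avoid that.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupSketch {
    // Hypothetical helper: builds a simplified item with an "id" and a "value" attribute.
    static Map<String, String> item(String id, String value) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("id", id);
        m.put("value", value);
        return m;
    }

    // Stand-in for extractDeduplicateKeyValues: projects an item onto the configured key names.
    // With no key names configured, the result is always the same empty map.
    static Map<String, String> extractKeyValues(Map<String, String> item, List<String> deduplicateKeys) {
        Map<String, String> key = new LinkedHashMap<>();
        for (String name : deduplicateKeys) {
            key.put(name, item.get(name));
        }
        return key;
    }

    // Deduplicates unconditionally: later items overwrite earlier ones that share a key.
    static List<Map<String, String>> buggyBatch(List<Map<String, String>> items, List<String> deduplicateKeys) {
        Map<Map<String, String>, Map<String, String>> batch = new LinkedHashMap<>();
        for (Map<String, String> it : items) {
            batch.put(extractKeyValues(it, deduplicateKeys), it);
        }
        return new ArrayList<>(batch.values());
    }

    // Assumed fix along the lines proposed in the issue: skip deduplication when no keys are set.
    static List<Map<String, String>> guardedBatch(List<Map<String, String>> items, List<String> deduplicateKeys) {
        if (deduplicateKeys.isEmpty()) {
            return new ArrayList<>(items);
        }
        return buggyBatch(items, deduplicateKeys);
    }

    public static void main(String[] args) {
        List<Map<String, String>> items = List.of(item("1", "a"), item("2", "b"), item("3", "c"));
        System.out.println(buggyBatch(items, List.of()).size());     // prints 1: only the last item survives
        System.out.println(guardedBatch(items, List.of()).size());   // prints 3: nothing is dropped
        System.out.println(buggyBatch(items, List.of("id")).size()); // prints 3: distinct ids are kept
    }
}
```

In this sketch every empty map is equal to every other empty map, so each `put` replaces the previous entry and only the last item remains, which matches the data loss reported in this issue.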



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
