[ 
https://issues.apache.org/jira/browse/BEAM-11815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303639#comment-17303639
 ] 

Alexey Romanenko commented on BEAM-11815:
-----------------------------------------

Good catch, [~Safari]!

{code} 
If DynamoDB processes the number of items up to the limit while processing the 
results, it stops the operation and returns the matching values up to that 
point, and a key in LastEvaluatedKey to apply in a subsequent operation, so 
that you can pick up where you left off. Also, if the processed dataset size 
exceeds 1 MB before DynamoDB reaches this limit, it stops the operation and 
returns the matching values up to the limit, and a key in LastEvaluatedKey to 
apply in a subsequent operation to continue the operation.
{code}

So, the fix looks pretty simple at first sight: we need to check the value of 
the returned {{LastEvaluatedKey}} in a loop and keep scanning until everything 
has been read. Though, I'm wondering if we need to (and whether we can) split 
the scan further in advance to read in parallel, even though the Read is 
already split by {{SegmentId}}.
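
To make the loop concrete, here is a minimal, self-contained sketch of the 
pagination logic. It does not use the AWS SDK: {{PagedScanSketch}}, 
{{FakeSegment}}, {{Page}} and {{readAll}} are hypothetical in-memory stand-ins 
invented for illustration, and an integer index plays the role of the key. In 
the real {{ReadFn}}, the corresponding step would presumably be passing 
{{ScanResult.getLastEvaluatedKey()}} to {{ScanRequest.setExclusiveStartKey(...)}} 
before the next {{client.scan(...)}} call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-ins for illustration only; these are NOT AWS SDK types.
final class PagedScanSketch {

  /** One page of results plus the key to resume from (null once the scan is complete). */
  static final class Page {
    final List<String> items;
    final Integer lastEvaluatedKey;

    Page(List<String> items, Integer lastEvaluatedKey) {
      this.items = items;
      this.lastEvaluatedKey = lastEvaluatedKey;
    }
  }

  /** Stand-in for one table segment that returns at most pageSize items per scan call. */
  static final class FakeSegment {
    private final List<String> rows;
    private final int pageSize;

    FakeSegment(List<String> rows, int pageSize) {
      if (pageSize <= 0) {
        throw new IllegalArgumentException("pageSize must be positive");
      }
      this.rows = rows;
      this.pageSize = pageSize;
    }

    /**
     * Returns the next page. exclusiveStartKey is the index of the last item
     * already returned, or null for the first call.
     */
    Page scan(Integer exclusiveStartKey) {
      int from = exclusiveStartKey == null ? 0 : exclusiveStartKey + 1;
      int to = Math.min(from + pageSize, rows.size());
      List<String> items = new ArrayList<>(rows.subList(from, to));
      // A non-null key signals that more data remains after this page.
      Integer last = to < rows.size() ? Integer.valueOf(to - 1) : null;
      return new Page(items, last);
    }
  }

  /** Keeps scanning until the returned key is null, collecting every page. */
  static List<String> readAll(FakeSegment segment) {
    List<String> out = new ArrayList<>();
    Integer startKey = null;
    do {
      Page page = segment.scan(startKey);
      out.addAll(page.items);
      startKey = page.lastEvaluatedKey; // resume point for the next call
    } while (startKey != null);
    return out;
  }
}
```

With 25 rows and a page size of 10, {{readAll}} issues three scan calls and 
returns all 25 rows, whereas a single {{scan(null)}} returns only the first 10, 
which mirrors the behavior reported in this issue.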

> fail to read more than 1M of items
> ----------------------------------
>
>                 Key: BEAM-11815
>                 URL: https://issues.apache.org/jira/browse/BEAM-11815
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-aws
>            Reporter: Mohammad
>            Priority: P1
>
> In processElement:
> {code:java}
> private static class ReadFn<T> extends DoFn<Read<T>, T> {
>   @ProcessElement
>   public void processElement(@Element Read<T> spec, OutputReceiver<T> out) {
>     AmazonDynamoDB client = spec.getAwsClientsProvider().createDynamoDB();
>     ScanRequest scanRequest = spec.getScanRequestFn().apply(null);
>     scanRequest.setSegment(spec.getSegmentId());
>     ScanResult scanResult = client.scan(scanRequest);
>     out.output(spec.getScanResultMapperFn().apply(scanResult));
>   }
> }
> {code}
>  
>  
> If the total size of the requested items exceeds 1 MB, DynamoDB sends only up 
> to 1 MB of data and expects the client to read the remaining items through 
> LastEvaluatedKey (see 
> [this|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/model/ScanRequest.html#setLimit-java.lang.Integer-])
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
