[ https://issues.apache.org/jira/browse/BEAM-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15679482#comment-15679482 ]
Joshua Fox commented on BEAM-991:
---------------------------------

The maximum request size is 10 MB; the maximum item size is 1 MB. The implementation _must_ support all legal items.

Solutions:

- Set the maximum batch size to 10. That obviously reduces performance, but guarantees that requests can complete, since 10 items of at most 1 MB each fit within the 10 MB request limit.
- Make users set a constant batch size between 10 and the Datastore API maximum, which is 500. This is problematic, since we do not always know how big our items are, particularly if we are developing generic solutions.
- Start with a batch size of 500. If a _put_ fails with a "too large" error, the implementation recursively cuts the batch size in half and retries until the _put_ succeeds. This new value is then used for a while. On the assumption that entities are grouped into similar sizes, occasionally ramp the batch size back up to see if the entities have become smaller, again reverting to the smaller batch size on failure. Perhaps save the batch size, and ramp it up and down, on a per-Kind basis.
- Measure _getSerializedSize()_ of _all_ items on _every put_, and adjust the batch size accordingly. This may be slow.

> DatastoreIO Write should flush early for large batches
> ------------------------------------------------------
>
>                 Key: BEAM-991
>                 URL: https://issues.apache.org/jira/browse/BEAM-991
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>            Reporter: Vikas Kedigehalli
>            Assignee: Vikas Kedigehalli
>
> If entities are large (avg size > 20KB) then a single batched write (500
> entities) would exceed the Datastore size limit of a single request (10MB)
> from https://cloud.google.com/datastore/docs/concepts/limits.
> First reported in:
> http://stackoverflow.com/questions/40156400/why-does-dataflow-erratically-fail-in-datastore-access

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
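The adaptive "halve on failure, occasionally ramp back up" strategy proposed in the comment above can be sketched roughly as follows. This is a hypothetical, simplified illustration, not the actual Beam DatastoreIO code: the `Committer` interface, the ramp-up interval of 100 successful commits, and the use of `IllegalStateException` to stand in for the Datastore "request too large" error are all assumptions made for the sketch.

```java
import java.util.List;

// Sketch of adaptive batch sizing for Datastore-style writes (hypothetical API).
class AdaptiveBatcher {
    static final int MAX_BATCH = 500; // Datastore API per-request entity limit
    static final int MIN_BATCH = 10;  // floor: 10 items x 1 MB max item size <= 10 MB request limit
    static final int RAMP_UP_AFTER = 100; // assumed interval for probing a larger batch size

    private int batchSize = MAX_BATCH;
    private int successesSinceResize = 0;

    // Stand-in for the RPC layer; throws IllegalStateException on a "too large" request.
    interface Committer {
        void commit(List<byte[]> batch);
    }

    /** Writes all entities, halving the batch size whenever a commit is too large. */
    void writeAll(List<byte[]> entities, Committer committer) {
        int start = 0;
        while (start < entities.size()) {
            int end = Math.min(start + batchSize, entities.size());
            try {
                committer.commit(entities.subList(start, end));
                start = end;
                // Occasionally ramp back up, in case the entities have become smaller.
                if (++successesSinceResize >= RAMP_UP_AFTER && batchSize < MAX_BATCH) {
                    batchSize = Math.min(batchSize * 2, MAX_BATCH);
                    successesSinceResize = 0;
                }
            } catch (IllegalStateException tooLarge) {
                if (batchSize <= MIN_BATCH) {
                    throw tooLarge; // cannot shrink further; a legal item should fit by now
                }
                batchSize = Math.max(batchSize / 2, MIN_BATCH);
                successesSinceResize = 0;
            }
        }
    }

    int currentBatchSize() {
        return batchSize;
    }
}
```

A per-Kind variant would simply keep one such batch-size value per entity Kind instead of a single shared one. The floor of 10 corresponds to the first solution in the comment: since no legal item exceeds 1 MB, a batch of 10 cannot exceed the 10 MB request limit.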