[ https://issues.apache.org/jira/browse/BEAM-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091737#comment-16091737 ]

ASF GitHub Bot commented on BEAM-2439:
--------------------------------------

GitHub user cph6 opened a pull request:

    https://github.com/apache/beam/pull/3585

    [BEAM-2439] Dynamic sizing of Datastore write RPCs.

    This implements the same behaviour recently added to the Java SDK:
    - start at 200 entities per RPC;
    - size subsequent requests based on the observed latency of previous requests;
    - use a MovingSum class to track recent latency;
    - report RPC success and failure counts as metrics (again, as in the Java SDK).
    
    
    R: @vikkyrk
    R: @ssisk


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cph6/beam datastore_batching_py

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3585.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3585
    
----

> Datastore writer can fail to progress if Datastore is slow
> ----------------------------------------------------------
>
>                 Key: BEAM-2439
>                 URL: https://issues.apache.org/jira/browse/BEAM-2439
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>            Reporter: Colin Phipps
>            Assignee: Colin Phipps
>            Priority: Minor
>              Labels: datastore
>             Fix For: 2.1.0
>
>
> When writing to Datastore, Beam groups writes into large batches (usually 
> 500 entities per write, the maximum permitted by the API). If these writes 
> are slow to commit on the serving side, the request may time out before all 
> of the entities are written.
> When this happens, the connector loses any progress that was made on those 
> entities (it uses non-transactional writes, so some entities may already 
> have been written, but partial results are not returned to the connector, 
> so it must assume that all entities need rewriting). It then retries the 
> write with the same set of entities, which may time out in the same way 
> repeatedly. This can be influenced by factors on the Datastore serving 
> side, some of which are transient (hotspots) and some of which are not.
> We (Datastore) are developing a fix for this.
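
To illustrate the failure mode and the direction of the fix: a minimal sketch of a write loop that shrinks its batches when commits run slow. All names, constants, and the commit callback here are hypothetical, not the connector's actual code:

```python
import time


class BatchSizer(object):
    """Pick a commit batch size from the latency of the previous commit.

    Simplified sketch: the real fix tracks a moving window of latencies
    rather than only the most recent commit.
    """
    TARGET_LATENCY_MS = 5000.0
    MIN_SIZE, START_SIZE, MAX_SIZE = 10, 200, 500

    def __init__(self):
        self._size = self.START_SIZE

    def next_size(self):
        return self._size

    def record(self, latency_ms_per_entity):
        # Aim for TARGET_LATENCY_MS per commit; clamp to [MIN, MAX].
        ideal = self.TARGET_LATENCY_MS / max(latency_ms_per_entity, 1e-3)
        self._size = int(max(self.MIN_SIZE, min(self.MAX_SIZE, ideal)))


def write_in_batches(entities, commit, sizer):
    """Commit entities in sizer-chosen batches, feeding latency back in."""
    i = 0
    while i < len(entities):
        batch = entities[i:i + sizer.next_size()]
        start = time.time()
        commit(batch)  # non-transactional; a timeout here retries the batch
        sizer.record((time.time() - start) * 1000.0 / len(batch))
        i += len(batch)
```

With a fixed 500-entity batch, a slow server can make every retry of that batch time out; capping the batch by observed latency keeps each commit small enough to finish, so the loop makes progress.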



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
