[ 
https://issues.apache.org/jira/browse/BEAM-8376?focusedWorklogId=382045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382045
 ]

ASF GitHub Bot logged work on BEAM-8376:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Feb/20 02:52
            Start Date: 05/Feb/20 02:52
    Worklog Time Spent: 10m 
      Work Description: clement commented on issue #10187: [BEAM-8376] Initial 
version of firestore connector JavaSDK
URL: https://github.com/apache/beam/pull/10187#issuecomment-582216769
 
 
   Hi @djelekar, I work on the Firestore backend, and I'm chiming in to second
@fredzqm's point. There are two interlocking issues when using an atomic
WriteBatch for large ingestion (throughput) jobs.
   
   First, as load and data size grow, Firestore splits your dataset across
multiple servers. When writing atomically to multiple documents, this
increases the chance that the write will need to coordinate a two-phase commit
across multiple servers, which increases the latency of the operation.
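To make the first point concrete, here is a toy calculation. It rests on a deliberately simplified assumption that is not how Firestore actually places data: each document in a batch lands independently and uniformly at random on one of S splits. Under that assumption, the chance that a batch of k documents stays on a single split is S * (1/S)^k = (1/S)^(k-1). The chance that the batch needs cross-server coordination is therefore 1 - (1/S)^(k-1). The class and method names are hypothetical and exist only for illustration.

```java
/**
 * Toy model (an assumption, not Firestore's real placement logic): each
 * document in an atomic batch lands independently and uniformly at random
 * on one of `splits` servers.
 */
final class BatchSplitModel {
    private BatchSplitModel() {}

    /**
     * Probability that a batch of `batchSize` documents touches more than one
     * split, i.e. would need cross-server coordination such as a two-phase commit.
     * P(all on one split) = splits * (1/splits)^batchSize = (1/splits)^(batchSize - 1)
     */
    static double probMultiSplit(int splits, int batchSize) {
        if (splits < 1 || batchSize < 1) {
            throw new IllegalArgumentException("splits and batchSize must be >= 1");
        }
        double allOnOneSplit = Math.pow(1.0 / splits, batchSize - 1);
        return 1.0 - allOnOneSplit;
    }

    public static void main(String[] args) {
        int splits = 4;
        for (int batchSize : new int[] {1, 2, 10, 500}) {
            System.out.printf("splits=%d batchSize=%d -> P(multi-split)=%.6f%n",
                    splits, batchSize, probMultiSplit(splits, batchSize));
        }
    }
}
```

Real splits are key ranges, so sequential or clustered keys behave very differently from this uniform model. The sketch only shows the direction of the effect: larger atomic batches are more likely to span servers once the data is split.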
   
   Second, Firestore uses a pessimistic locking model under the hood. If the
WriteBatch takes longer to execute (because of the issue above, or simply
because it is doing more work), it holds its locks longer and disrupts
unrelated read/write traffic on the same documents or index entries.
   
   I can see why the experience may look better with WriteBatch, for
example:
   - when using single writes, they should be issued asynchronously, and they
can (and should) be parallelized more aggressively
   - if the ingestion key range is not split, or not actively accessed by other
processes, there will initially be no contention and WriteBatch will perform
well; however, there is a limit to how much throughput you will get from it
once the ingestion runs longer and ramps up to more parallelism.
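Here is a minimal sketch of the single-write alternative from the first bullet. Each document gets its own independent, non-atomic asynchronous write, and a semaphore caps how many writes are in flight at once. `writeOne` is a hypothetical stand-in for an asynchronous single-document write. In the Firestore Java client this would be something like `DocumentReference.set`, which returns an `ApiFuture`. To keep the example self-contained, the sketch uses `CompletableFuture` instead. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

final class ParallelSingleWrites {
    private ParallelSingleWrites() {}

    /**
     * Issues one independent (non-atomic) write per document, keeping at most
     * maxInFlight writes outstanding. writeOne is a hypothetical stand-in for an
     * asynchronous single-document write. Returns how many writes succeeded.
     */
    static int writeAll(
            Map<String, String> docs,
            int maxInFlight,
            BiFunction<String, String, CompletableFuture<Void>> writeOne) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<Boolean>> pending = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            permits.acquireUninterruptibly(); // wait for a free slot
            CompletableFuture<Boolean> result;
            try {
                result = writeOne.apply(doc.getKey(), doc.getValue())
                        // record success/failure; one failed write does not affect the others
                        .handle((ignored, error) -> error == null);
            } catch (RuntimeException e) {
                // the writer threw before returning a future: count it as a failure
                result = CompletableFuture.completedFuture(false);
            }
            // free the slot once this write has finished, whatever the outcome
            pending.add(result.whenComplete((ok, error) -> permits.release()));
        }
        int succeeded = 0;
        for (CompletableFuture<Boolean> f : pending) {
            if (f.join()) {
                succeeded++;
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        Map<String, String> docs = new LinkedHashMap<>();
        for (int i = 0; i < 100; i++) {
            docs.put("doc-" + i, "value-" + i);
        }
        // Simulated writer: each write sleeps ~1 ms while we track concurrency.
        int succeeded = writeAll(docs, 4, (id, value) -> CompletableFuture.runAsync(() -> {
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            inFlight.decrementAndGet();
        }, pool));
        pool.shutdown();
        System.out.println("succeeded=" + succeeded + ", peak in flight=" + peak.get());
    }
}
```

Because each write is independent, a failure does not roll back the others. A real pipeline would retry failed writes rather than just count them. Raising `maxInFlight` is the knob for parallelizing more aggressively.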
   
   Does that make sense? We are hoping to launch a dedicated feature for
writing batches in a non-atomic fashion, but it is unclear at this point when
it will be generally available, and as @fredzqm pointed out, single writes are
the best option for now.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 382045)
    Time Spent: 2h 40m  (was: 2.5h)

> Add FirestoreIO connector to Java SDK
> -------------------------------------
>
>                 Key: BEAM-8376
>                 URL: https://issues.apache.org/jira/browse/BEAM-8376
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-gcp
>            Reporter: Stefan Djelekar
>            Priority: Major
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Motivation:
> There is no Firestore connector for Java SDK at the moment.
> Having it will enhance the integrations with database options on the Google 
> Cloud Platform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
