[
https://issues.apache.org/jira/browse/BEAM-8376?focusedWorklogId=382045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382045
]
ASF GitHub Bot logged work on BEAM-8376:
----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Feb/20 02:52
Start Date: 05/Feb/20 02:52
Worklog Time Spent: 10m
Work Description: clement commented on issue #10187: [BEAM-8376] Initial
version of firestore connector JavaSDK
URL: https://github.com/apache/beam/pull/10187#issuecomment-582216769
Hi @djelekar, I work on the Firestore backend, and I'm chiming in to second
@fredzqm's point. There are two interlocking issues when using atomic WriteBatch
for large ingestion (throughput) jobs.
First, under load and based on its size, Firestore will split your dataset
across multiple servers. When writing atomically to multiple documents, this
increases the chance that the write will need to coordinate a 2-phase commit
across multiple servers, which increases the latency of the operation.
Second, Firestore uses a pessimistic locking model under the hood. If the
WriteBatch takes longer to execute (because of the issue above, or simply because
it is doing more work), it holds its locks longer and disrupts unrelated
read/write traffic on the same documents or index entries.
I can see reasons why the experience looks better with WriteBatch, for
example:
- when using single writes, those should be asynchronous, and can (and
should) be parallelized more aggressively
- if the ingestion key range is not split, or not actively accessed by other
processes, there will initially be no contention and good performance with
WriteBatch; however, there is a limit to how much throughput you will get from
it once the ingestion runs longer and ramps up to more parallelism.
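To make the first bullet concrete, here is a minimal sketch of issuing independent, asynchronous single-document writes with a cap on how many are in flight at once, instead of grouping them into one atomic batch. Everything here is hypothetical and for illustration only: `AsyncSingleWrites`, `DocumentWriter`, `InMemoryWriter`, and `writeAll` are not part of the Firestore client or of the Beam connector, and the in-memory writer stands in for a real client (whose single-document calls, such as `DocumentReference.set`, also return futures).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncSingleWrites {

  /** Hypothetical client interface: each call writes exactly ONE document. */
  interface DocumentWriter {
    CompletableFuture<Void> write(String docId, Map<String, Object> fields);
  }

  /** Hypothetical in-memory fake; it also records peak concurrent writes. */
  static final class InMemoryWriter implements DocumentWriter {
    final Map<String, Map<String, Object>> store = new ConcurrentHashMap<>();
    final AtomicInteger active = new AtomicInteger();
    final AtomicInteger maxObserved = new AtomicInteger();
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    @Override
    public CompletableFuture<Void> write(String docId, Map<String, Object> fields) {
      return CompletableFuture.runAsync(() -> {
        int now = active.incrementAndGet();
        maxObserved.accumulateAndGet(now, Math::max);
        store.put(docId, fields); // one document, no cross-document transaction
        active.decrementAndGet();
      }, pool);
    }

    void shutdown() throws InterruptedException {
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    }
  }

  /**
   * Sends one independent write per document, keeping at most maxInFlight
   * requests outstanding. Each write commits on its own, so no single request
   * has to lock many documents or coordinate across servers.
   */
  static void writeAll(DocumentWriter writer, Map<String, Map<String, Object>> docs,
      int maxInFlight) throws InterruptedException {
    Semaphore permits = new Semaphore(maxInFlight);
    List<CompletableFuture<Void>> pending = new ArrayList<>();
    for (Map.Entry<String, Map<String, Object>> e : docs.entrySet()) {
      permits.acquire(); // wait while maxInFlight writes are outstanding
      pending.add(writer.write(e.getKey(), e.getValue())
          .whenComplete((ok, err) -> permits.release()));
    }
    // join() throws if any write failed; writes that succeeded stay applied.
    CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
  }

  public static void main(String[] args) throws InterruptedException {
    Map<String, Map<String, Object>> docs = new LinkedHashMap<>();
    for (int i = 0; i < 100; i++) {
      Map<String, Object> fields = new HashMap<>();
      fields.put("value", i);
      docs.put("doc-" + i, fields);
    }
    InMemoryWriter writer = new InMemoryWriter();
    writeAll(writer, docs, 16);
    writer.shutdown();
    System.out.println("written: " + writer.store.size());
  }
}
```

The semaphore is the knob for "parallelized more aggressively": raising `maxInFlight` increases throughput, while keeping it bounded avoids flooding the backend. A real pipeline would also retry failed writes individually, which is easier than retrying a whole atomic batch.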
Does that make sense? We are hoping to launch a dedicated feature for
writing batches in a non-atomic fashion, but it is unclear at this point when
it will be generally available, and as @fredzqm pointed out, single writes are
the best option for now.
Issue Time Tracking
-------------------
Worklog Id: (was: 382045)
Time Spent: 2h 40m (was: 2.5h)
> Add FirestoreIO connector to Java SDK
> -------------------------------------
>
> Key: BEAM-8376
> URL: https://issues.apache.org/jira/browse/BEAM-8376
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Stefan Djelekar
> Priority: Major
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> Motivation:
> There is no Firestore connector for Java SDK at the moment.
> Having it will enhance the integrations with database options on the Google
> Cloud Platform.