[ https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=90878&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90878 ]
ASF GitHub Bot logged work on BEAM-4049:
----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Apr/18 16:40
Start Date: 13/Apr/18 16:40
Worklog Time Spent: 10m
Work Description: adejanovski commented on a change in pull request
#5112: [BEAM-4049] Improve CassandraIO write throughput by performing async
queries
URL: https://github.com/apache/beam/pull/5112#discussion_r181445060
##########
File path:
sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java
##########
@@ -350,19 +353,21 @@ public TokenRange(
* Writer storing an entity into Apache Cassandra database.
*/
protected class WriterImpl<T> implements Writer<T> {
-
+ private static final int CONCURRENT_ASYNC_QUERIES = 100;
Review comment:
Not really: the value doesn't relate to the number of nodes or vnodes. It is
more of a best practice to keep nodes from being overwhelmed. A node's
capacity to handle many concurrent queries depends partly on the number of
threads in Cassandra's read thread pool, which is a server-side setting we
cannot access from the client.
I'm not sure users should have to deal with this, since concurrency can
already be controlled by limiting the number of splits/workers.
We can make this configurable, but it could confuse users and wouldn't
improve throughput IMHO.
I'd be in favor of leaving this as a constant, but if you want it to be
configurable I'll do it.
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 90878)
Time Spent: 1h 10m (was: 1h)
> Improve write throughput of CassandraIO
> ---------------------------------------
>
> Key: BEAM-4049
> URL: https://issues.apache.org/jira/browse/BEAM-4049
> Project: Beam
> Issue Type: Improvement
> Components: io-java-cassandra
> Affects Versions: 2.4.0
> Reporter: Alexander Dejanovski
> Assignee: Jean-Baptiste Onofré
> Priority: Major
> Labels: performance
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> CassandraIO currently uses the mapper to perform writes synchronously.
> This means that writes are serialized, which is a very suboptimal way of
> writing to Cassandra.
> The IO should use the saveAsync() method instead of save() and wait for
> completion each time 100 queries are in flight, to avoid overwhelming
> clusters.
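
For illustration only, the batching described in the issue could be sketched
roughly as below. This is not the code from the pull request. The class name
BoundedAsyncWriter and its methods are hypothetical, and the saveAsync
function is a stand-in for the mapper's saveAsync() call. The sketch uses
java.util.concurrent.CompletableFuture so that it compiles without the
Cassandra driver; the real call may return a different future type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch: issues async writes and waits once maxInFlight are pending. */
class BoundedAsyncWriter<T> {
  private final int maxInFlight;
  // Stand-in for mapper.saveAsync(entity); an assumption, not the CassandraIO API.
  private final Function<T, CompletableFuture<Void>> saveAsync;
  private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

  BoundedAsyncWriter(int maxInFlight, Function<T, CompletableFuture<Void>> saveAsync) {
    if (maxInFlight <= 0) {
      throw new IllegalArgumentException("maxInFlight must be positive");
    }
    this.maxInFlight = maxInFlight;
    this.saveAsync = saveAsync;
  }

  /** Starts an async write; blocks only when the in-flight limit is reached. */
  void write(T entity) {
    inFlight.add(saveAsync.apply(entity));
    if (inFlight.size() >= maxInFlight) {
      flush();
    }
  }

  /** Waits for every pending write; join() rethrows the first failure it meets. */
  void flush() {
    try {
      for (CompletableFuture<Void> future : inFlight) {
        future.join();
      }
    } finally {
      inFlight.clear();
    }
  }

  /** Number of writes started but not yet waited on. */
  int pending() {
    return inFlight.size();
  }
}
```

With maxInFlight set to 100 this matches the limit mentioned in the issue. A
caller would also need to call flush() when closing the writer so that a
final partial batch is not left unacknowledged.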