[
https://issues.apache.org/jira/browse/CASSANDRA-7937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259967#comment-14259967
]
Piotr Kołaczkowski commented on CASSANDRA-7937:
-----------------------------------------------
I like the idea of using parallelism level as a backpressure mechanism. That
would have a nice positive effect of automatically reducing the amount of
memory used for queuing the requests.
However, my biggest concern is that even limiting a single client to one write
at a time (window size = 1) might still be too fast for some fast clients,
provided the row sizes are big enough, particularly when writing big cells of
data, where big = hundreds of kB to single MBs per cell. Cassandra is extremely
efficient at ingesting data into memtables. If ingestion is faster than we're
able to flush, we still have a problem.
So I guess that if, after going down to parallelism level = 1, the client is
still too fast (e.g. flush queue full, last memtable almost full), we could
send the client a message like "please do not send data faster than X MB/s
now", and the client (driver) could introduce an artificial delay before
processing the next request.
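A minimal, illustrative sketch of such a driver-side delay (the class and
method names are invented here, and the advised limit is assumed to arrive via
some out-of-band message from the server):
{code:java}
// Illustrative sketch only: throttles a client to a server-advised byte rate
// by sleeping before each request. Names and the advice channel are assumptions.
public final class WriteThrottle
{
    private volatile double maxBytesPerSec = Double.MAX_VALUE; // no advice received yet

    /** Called when the server advises "please do not send data faster than X MB/s". */
    public void onServerAdvice(double advisedMBPerSec)
    {
        maxBytesPerSec = advisedMBPerSec * 1024 * 1024;
    }

    /** Artificial delay before the next request so the average rate stays below the advised limit. */
    public void beforeRequest(long payloadBytes) throws InterruptedException
    {
        double limit = maxBytesPerSec;
        if (limit == Double.MAX_VALUE)
            return;                                   // no limit advised, send at full speed
        long delayMs = (long) (payloadBytes * 1000.0 / limit);
        if (delayMs > 0)
            Thread.sleep(delayMs);                    // conservative: ignores actual send time
    }
}
{code}
The delay is deliberately conservative: it ignores the time the request itself
takes to send, so the effective rate stays somewhat below the advised limit.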
> Apply backpressure gently when overloaded with writes
> -----------------------------------------------------
>
> Key: CASSANDRA-7937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7937
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Cassandra 2.0
> Reporter: Piotr Kołaczkowski
> Labels: performance
>
> When writing huge amounts of data into a C* cluster from analytic tools like
> Hadoop or Apache Spark, we often see that C* can't keep up with the load.
> This is because analytic tools typically write data "as fast as they can", in
> parallel, from many nodes, and they are not artificially rate-limited, so C*
> is the bottleneck here. Also, increasing the number of nodes doesn't really
> help, because in a co-located setup this also increases the number of
> Hadoop/Spark nodes (writers), and although the achievable write performance
> is higher, the problem still remains.
> We observe the following behaviour:
> 1. data is ingested at an extremely fast pace into memtables and the flush
> queue fills up
> 2. the available memory limit for memtables is reached and writes are no
> longer accepted
> 3. the application gets hit by "write timeout" and retries repeatedly, in
> vain
> 4. after several failed attempts to write, the job gets aborted
> Desired behaviour:
> 1. data is ingested at an extremely fast pace into memtables and the flush
> queue fills up
> 2. after exceeding some memtable "fill threshold", C* applies adaptive rate
> limiting to writes - the more the buffers fill up, the fewer writes/s are
> accepted; however, writes still complete within the write timeout (one
> possible shape of such limiting is sketched after this list).
> 3. thanks to the slowed-down data ingestion, the flush can now finish before
> all the memory gets used
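> One possible shape of such adaptive limiting, purely as an illustrative
> sketch (the linear shape, the threshold and all names are assumptions, not a
> concrete proposal):
> {code:java}
> // Illustrative sketch: the fuller the memtables, the fewer writes/s accepted.
> public final class AdaptiveWriteLimit
> {
>     private final double fillThreshold;   // e.g. 0.5 = start limiting at 50% of memtable memory
>     private final double maxWritesPerSec; // rate accepted while below the threshold
>
>     public AdaptiveWriteLimit(double fillThreshold, double maxWritesPerSec)
>     {
>         this.fillThreshold = fillThreshold;
>         this.maxWritesPerSec = maxWritesPerSec;
>     }
>
>     /** Accepted rate drops linearly from the full rate at the threshold down to zero at 100% fill. */
>     public double allowedWritesPerSec(double memtableFillFraction)
>     {
>         if (memtableFillFraction <= fillThreshold)
>             return maxWritesPerSec;
>         double over = (memtableFillFraction - fillThreshold) / (1.0 - fillThreshold);
>         return maxWritesPerSec * (1.0 - over);
>     }
> }
> {code}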
> Of course, the details of how rate limiting could be done are up for
> discussion. It may also be worth considering putting such logic into the
> driver, not C* core, but then C* needs to expose at least the following
> information to the driver, so that the driver can calculate the desired
> maximum data rate (a rough sketch of one such calculation follows the list):
> 1. the current amount of memory available for writes before they would
> completely block
> 2. the total amount of data queued to be flushed and the flush progress
> (amount of data left to flush for the memtable currently being flushed)
> 3. the average flush write speed
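> A naive way the driver could combine these three numbers (names and the
> exact formula are made up here; the idea is only that, over the time needed
> to drain the current flush backlog, the client should not write more than
> the memory that is still available):
> {code:java}
> // Illustrative sketch: derive a client-side cap on the write rate from the
> // three metrics above. The exact formula would be up for discussion.
> public final class DesiredRateCalculator
> {
>     /**
>      * @param availableWriteMemoryBytes (1) memory still available before writes block
>      * @param queuedFlushBytes          (2) data queued for flushing, minus flush progress
>      * @param avgFlushBytesPerSec       (3) average flush write speed
>      * @return a data rate (bytes/s) the client should try not to exceed
>      */
>     public static double desiredMaxRate(long availableWriteMemoryBytes,
>                                         long queuedFlushBytes,
>                                         double avgFlushBytesPerSec)
>     {
>         if (queuedFlushBytes <= 0)
>             return Double.MAX_VALUE;      // no backlog, no reason to throttle
>         // Never exceed the sustainable flush speed, and throttle further whenever the
>         // flush backlog is larger than the memory still available for new writes.
>         double headroom = (double) availableWriteMemoryBytes / queuedFlushBytes;
>         return avgFlushBytesPerSec * Math.min(1.0, headroom);
>     }
> }
> {code}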
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)