[
https://issues.apache.org/jira/browse/CASSANDRA-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013748#comment-15013748
]
Martin Grotzke edited comment on CASSANDRA-6487 at 11/23/15 7:20 PM:
---------------------------------------------------------------------
[~lyubent] Can you please explain *why* the batch size is relevant in both
scenarios 1) and 2)?
What are the extra costs of a single-partition batch (with multiple
statements/inserts) that justify logging this warning?
How is a single-statement batch (which obviously goes to a single partition)
handled differently from a single statement not sent as a BATCH?
Regarding single-partition batches, my understanding is that they don't cause
any extra costs. This understanding is based, e.g., on CASSANDRA-6737 ("A batch
statements on a single partition should not create a new CF object for each
update") and on
http://christopher-batey.blogspot.de/2015/02/cassandra-anti-pattern-misuse-of.html,
which says (in the section "So when should you use unlogged batches?"):
{quote}Well customer id is the partition key, so this will be no more
coordination work than a single insert and it can be done with a single
operation at the storage layer.{quote}
What's wrong with this understanding? In what way are single-partition batches
more expensive?
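To make the argument concrete, here is a toy Python model (not Cassandra's actual code; `Statement` and `group_by_partition` are hypothetical) of what a coordinator might do with a batch: group the statements by partition key and merge each group into one mutation. Under this model a single-partition batch collapses into a single mutation, the same amount of coordination work as one insert, as the quoted blog post argues, while a multi-partition batch fans out into one mutation per partition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical stand-in for one INSERT/UPDATE inside a batch."""
    partition_key: str
    column: str
    value: object


def group_by_partition(batch):
    """Merge a batch into one mutation (column -> value dict) per partition key.

    Toy model only: a later write to the same column of the same partition
    simply overwrites the earlier one (timestamps are ignored).
    """
    mutations = {}
    for stmt in batch:
        mutations.setdefault(stmt.partition_key, {})[stmt.column] = stmt.value
    return mutations


single_partition = [
    Statement("customer-42", "name", "Alice"),
    Statement("customer-42", "email", "alice@example.com"),
    Statement("customer-42", "city", "Berlin"),
]
multi_partition = [
    Statement("customer-42", "name", "Alice"),
    Statement("customer-7", "name", "Bob"),
    Statement("customer-99", "name", "Carol"),
]

print(len(group_by_partition(single_partition)))  # 1 mutation
print(len(group_by_partition(multi_partition)))   # 3 mutations
```

The model deliberately ignores payload size: the merged single-partition mutation still carries all three values, so it can be large even though it touches only one partition.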
> Log WARN on large batch sizes
> -----------------------------
>
> Key: CASSANDRA-6487
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6487
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Patrick McFadin
> Assignee: Lyuben Todorov
> Priority: Minor
> Fix For: 2.0.8, 2.1 beta2
>
> Attachments: 6487-cassandra-2.0.patch, 6487-cassandra-2.0_v2.patch
>
>
> Large batches on a coordinator can cause a lot of node stress. I propose
> adding a WARN log entry if batch sizes go beyond a configurable size. This
> will give more visibility to operators on something that can happen on the
> developer side.
> New yaml setting with 5k default.
> {{# Log WARN on any batch size exceeding this value. 5k by default.}}
> {{# Caution should be taken on increasing the size of this threshold as it
> can lead to node instability.}}
> {{batch_size_warn_threshold: 5k}}
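A minimal Python sketch of the check the proposal describes, assuming that the threshold string "5k" means 5 × 1024 bytes and that a batch's size is the sum of the serialized sizes of its statements. `parse_size`, `check_batch_size`, and the byte-size inputs are hypothetical illustrations, not Cassandra's implementation.

```python
import logging

logger = logging.getLogger("batch_size_check")


def parse_size(setting: str) -> int:
    """Parse a size such as '5k' or '512' into bytes (assumes k = 1024)."""
    setting = setting.strip().lower()
    if setting.endswith("k"):
        return int(setting[:-1]) * 1024
    return int(setting)


def check_batch_size(statement_sizes, threshold: str = "5k") -> bool:
    """Log a WARN if the batch's total size exceeds the threshold.

    statement_sizes: serialized size in bytes of each statement in the batch.
    Returns True if a warning was logged, False otherwise.
    """
    sizes = list(statement_sizes)
    total = sum(sizes)
    limit = parse_size(threshold)
    if total > limit:
        logger.warning(
            "Batch of %d statements has size %d bytes, exceeding "
            "batch_size_warn_threshold of %s (%d bytes)",
            len(sizes), total, threshold, limit)
        return True
    return False


logging.basicConfig(level=logging.WARNING)
check_batch_size([1200, 1500, 900])         # 3600 bytes: no warning
check_batch_size([2048, 2048, 2048], "5k")  # 6144 bytes: logs a WARN
```

In this sketch the check looks only at total size, so a single-partition batch and a multi-partition batch of the same size trigger it alike, which is the behavior the comment above questions.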
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)