[ 
https://issues.apache.org/jira/browse/CASSANDRA-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013748#comment-15013748
 ] 

Martin Grotzke edited comment on CASSANDRA-6487 at 11/23/15 7:20 PM:
---------------------------------------------------------------------

[~lyubent] Can you please explain *why* the batch size is relevant in both 
scenarios 1) and 2)?

What are the extra costs of a single-partition batch (with multiple 
statements/inserts), so that this warning should be logged?
How is a single-statement batch (which obviously goes to a single partition) 
handled differently from a single statement not sent as a BATCH?

Regarding single-partition batches, my understanding is that they don't cause 
any extra costs. This understanding is based e.g. on CASSANDRA-6737 ("A batch 
statements on a single partition should not create a new CF object for each 
update") and on 
http://christopher-batey.blogspot.de/2015/02/cassandra-anti-pattern-misuse-of.html,
 which says (in the paragraph "So when should you use unlogged batches?") 
{quote}Well customer id is the partition key, so this will be no more 
coordination work than a single insert and it can be done with a single 
operation at the storage layer.{quote}
What's wrong with this understanding? In which way are single-partition batches 
more expensive?
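
The distinction in question can be sketched as follows. This is an illustration only, not Cassandra's implementation: the {{Mutation}} type, the function names and the reading of "5k" as 5 * 1024 bytes are all assumptions made for the example. It contrasts a size-only warning rule, as the issue describes it, with a hypothetical rule that exempts batches whose statements all target one partition.

```python
from collections import namedtuple

# Hypothetical stand-in for one statement in a batch; not a Cassandra type.
Mutation = namedtuple("Mutation", ["partition_key", "size_bytes"])

# Assumption: "5k" in the proposed yaml setting means 5 * 1024 bytes.
WARN_THRESHOLD_BYTES = 5 * 1024


def batch_size(batch):
    """Total payload size of the batch in bytes."""
    return sum(m.size_bytes for m in batch)


def is_single_partition(batch):
    """True if every statement targets the same partition key."""
    return len({m.partition_key for m in batch}) <= 1


def should_warn_size_only(batch):
    """Size-only rule: warn whenever the batch exceeds the threshold."""
    return batch_size(batch) > WARN_THRESHOLD_BYTES


def should_warn_partition_aware(batch):
    """Hypothetical alternative: exempt single-partition batches."""
    return should_warn_size_only(batch) and not is_single_partition(batch)


# Three 2 KiB inserts for one customer (a single partition).
single = [Mutation("customer-1", 2048)] * 3
# Three 2 KiB inserts spread over three customers (three partitions).
multi = [Mutation("customer-%d" % i, 2048) for i in range(3)]
```

Both batches are 6 KiB, so the size-only rule warns for both, while the partition-aware variant warns only for {{multi}}.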


> Log WARN on large batch sizes
> -----------------------------
>
>                 Key: CASSANDRA-6487
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6487
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Patrick McFadin
>            Assignee: Lyuben Todorov
>            Priority: Minor
>             Fix For: 2.0.8, 2.1 beta2
>
>         Attachments: 6487-cassandra-2.0.patch, 6487-cassandra-2.0_v2.patch
>
>
> Large batches on a coordinator can cause a lot of node stress. I propose 
> adding a WARN log entry if batch sizes go beyond a configurable size. This 
> will give more visibility to operators on something that can happen on the 
> developer side. 
> New yaml setting with 5k default.
> {{# Log WARN on any batch size exceeding this value. 5k by default.}}
> {{# Caution should be taken on increasing the size of this threshold as it can lead to node instability.}}
> {{batch_size_warn_threshold: 5k}}



