[ 
https://issues.apache.org/jira/browse/CASSANDRA-19413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szymon Miezal updated CASSANDRA-19413:
--------------------------------------
    Description: 
The computed size of a mutation does not include the size of its primary key. 
For BATCHed mutations, this means that INSERTs, DELETEs, and UPDATEs against 
tables with a simple PRIMARY KEY and no clustering columns are counted as zero 
bytes (or almost zero, depending on the version). Consequently, 
batch_size_fail_threshold_in_kb has no effect for such tables and cannot 
protect the cluster from being overloaded.
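
To make the effect concrete, here is a minimal sketch of the two ways of sizing a batch. It is a toy model, not Cassandra's code: {{BatchSizeSketch}}, {{ToyMutation}}, and the 50 KiB threshold are assumptions made up for illustration, and real mutation sizing involves more than raw key and cell bytes.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSizeSketch {
    /** Toy stand-in for a single-partition mutation (hypothetical, not Cassandra's Mutation). */
    static final class ToyMutation {
        final byte[] partitionKey;
        final List<byte[]> cellValues;

        ToyMutation(byte[] partitionKey, List<byte[]> cellValues) {
            this.partitionKey = partitionKey;
            this.cellValues = cellValues;
        }
    }

    /** Sums only the cell payloads, mirroring a size formula that ignores the key. */
    static long sizeIgnoringKeys(List<ToyMutation> batch) {
        long total = 0;
        for (ToyMutation m : batch)
            for (byte[] value : m.cellValues)
                total += value.length;
        return total;
    }

    /** Same sum, plus the partition key bytes of every mutation. */
    static long sizeIncludingKeys(List<ToyMutation> batch) {
        long total = sizeIgnoringKeys(batch);
        for (ToyMutation m : batch)
            total += m.partitionKey.length;
        return total;
    }

    /** Builds a batch whose mutations carry a key but no cell data. */
    static List<ToyMutation> keyOnlyBatch(int count, int keyBytes) {
        List<ToyMutation> batch = new ArrayList<>();
        for (int i = 0; i < count; i++)
            batch.add(new ToyMutation(new byte[keyBytes], new ArrayList<>()));
        return batch;
    }

    public static void main(String[] args) {
        long failThresholdBytes = 50L * 1024; // illustrative threshold, assumed for this sketch
        List<ToyMutation> batch = keyOnlyBatch(100, 1024);

        long withoutKeys = sizeIgnoringKeys(batch);
        long withKeys = sizeIncludingKeys(batch);

        System.out.println("ignoring keys:  " + withoutKeys + " bytes, over threshold: "
                           + (withoutKeys > failThresholdBytes));
        System.out.println("including keys: " + withKeys + " bytes, over threshold: "
                           + (withKeys > failThresholdBytes));
    }
}
```

In this toy model, a batch of 100 mutations with 1 KiB keys and no cell data measures zero bytes under the key-ignoring formula, so it can never trip the threshold, while the key-aware formula reports 100 KiB.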

A test that reproduces the problem in 3.11: 
[https://github.com/szymon-miezal/cassandra/commit/50b27c1e9030ce5ace6a6486a9876493c4ad41ae#diff-8cb249caec219439da461a4369f20530bb7d6cc0467c7e46f16288e22b574e61R43]

There are a few ways it could be solved:
 * Modify the existing batch_size_fail_threshold_in_kb to take the primary key 
size into account. The disadvantage is that this changes the semantics of the 
guardrail, thus introducing a regression.
 * Add a new guardrail, e.g. batch_size_with_pk_fail_threshold_in_kb, that is 
calculated with the primary key size taken into account.
 * Add a -D switch that defaults to {{false}}. With {{false}}, if the new 
formula (which takes the PK into account) yields a value over the error 
threshold, this is only reported in an additional log message. Setting the 
flag to {{true}} enforces the new formula, and an error is thrown when the 
batch exceeds the threshold.
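
The third option can be sketched as a small decision function. This is only an illustration of the behavior described above, under assumptions: the class name, the {{Outcome}} enum, and the system property name {{cassandra.batch_size_include_pk}} are hypothetical and do not exist in Cassandra.

```java
public class PkAwareBatchGuard {
    /** Hypothetical -D property name, invented for this sketch. */
    static final String ENFORCE_PROPERTY = "cassandra.batch_size_include_pk";

    enum Outcome { ACCEPT, ACCEPT_WITH_LOG, REJECT }

    /**
     * Decides what happens to a batch given both size formulas.
     *
     * @param sizeWithoutKeys    size under the existing formula (keys ignored)
     * @param sizeWithKeys       size under the new formula (keys included)
     * @param failThresholdBytes the fail threshold, in bytes
     * @param enforceNewFormula  value of the switch; false means log-only mode
     */
    static Outcome check(long sizeWithoutKeys, long sizeWithKeys,
                         long failThresholdBytes, boolean enforceNewFormula) {
        if (enforceNewFormula)
            return sizeWithKeys > failThresholdBytes ? Outcome.REJECT : Outcome.ACCEPT;

        // Switch off: the existing formula still decides whether the batch fails.
        if (sizeWithoutKeys > failThresholdBytes)
            return Outcome.REJECT;

        // The new formula would have failed the batch, so only report it.
        if (sizeWithKeys > failThresholdBytes)
            return Outcome.ACCEPT_WITH_LOG;

        return Outcome.ACCEPT;
    }

    public static void main(String[] args) {
        // Boolean.getBoolean returns false when the property is unset, matching the default.
        boolean enforce = Boolean.getBoolean(ENFORCE_PROPERTY);
        Outcome outcome = check(0L, 100L * 1024, 50L * 1024, enforce);
        System.out.println("enforce=" + enforce + " -> " + outcome);
    }
}
```

Running it with {{-Dcassandra.batch_size_include_pk=true}} switches the same key-heavy batch from being accepted with a log message to being rejected.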

I prefer the option that adds a new guardrail.

 

  was:
{color:#1f2328}The size of a mutation does not consider the primary key size. 
In the context of BATCHed mutations, this means that INSERTs, DELETEs, and 
UPDATEs for tables with a simple PRIMARY KEY and no clustering columns would be 
equal to zero (or almost zero depending on the version). Consequently, the 
batch_size_fail_threshold_in_kb has no effect for such tables, and it cannot 
protect the cluster from being overloaded.{color}

{color:#1f2328}A test that reproduces the problem in 3.11 - 
[https://github.com/szymon-miezal/cassandra/commit/50b27c1e9030ce5ace6a6486a9876493c4ad41ae#diff-8cb249caec219439da461a4369f20530bb7d6cc0467c7e46f16288e22b574e61R43]
 
{color}

{color:#1f2328}There are a few ways it could be solved:{color}
 * {color:#1f2328}Modifying the existing batch_size_fail_threshold_in_kb to 
take into account the primary keys size (it has the disadvantage of changing 
the semantic of the guardrail thus introducing a regression).
{color}
 * {color:#1f2328}Adding a new guardrail e.g. 
{color:#1f2328}batch_size_with_pk_fail_threshold_in_kb that is going to be 
calculated taking primary key into account.{color}{color}
 * {{{color:#1f2328}{color:#1f2328}Adding a -D switch{color:#1f2328} that by 
default would be {color}{{false}}{color:#1f2328} meaning that in case the new 
formula (which takes PK into account) yields value over the error threshold it 
will gracefully tell us about it in an additional log message. Changing the 
flag value to {color}{{true}}{color:#1f2328} would be equivalent to the new 
formula and error will be thrown in case we get over the 
threshold.{color}{color}{color}}}

{{{color:#1f2328}{color:#1f2328}{color:#1f2328}I have a preference for going 
with an option that adds a new guardrail.{color}{color}{color}}}

 


> Batch size guardrail ignores primary key size
> ---------------------------------------------
>
>                 Key: CASSANDRA-19413
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19413
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/Guardrails
>            Reporter: Szymon Miezal
>            Assignee: Szymon Miezal
>            Priority: Normal



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
