[
https://issues.apache.org/jira/browse/CASSANDRA-15152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058410#comment-17058410
]
JeongHun Kim commented on CASSANDRA-15152:
------------------------------------------
I also have the same problem, in my case while removing a node from the
cluster. On my nodes, commitlog_segment_size_in_mb is set to 64, and the
inserted mutations are certainly smaller than 32MB, yet this ERROR keeps being
logged again and again. Has anyone solved this problem?
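
For context on the numbers involved: the commit log rejects any single mutation
larger than a ceiling that, by default, is half of commitlog_segment_size_in_mb,
so a 64MB segment size gives a 32MB ceiling, while the 16777216-byte limit in
the stack trace quoted below corresponds to the 32MB default. A minimal
standalone sketch of that arithmetic (illustrative names only, not Cassandra
code):

```
// Illustrative arithmetic only; not Cassandra source code.
public class MaxMutationSizeExample
{
    // assumption: the per-mutation ceiling defaults to half the commit log segment size
    static long maxMutationBytes(int commitlogSegmentSizeInMb)
    {
        return (long) commitlogSegmentSizeInMb * 1024 * 1024 / 2;
    }

    public static void main(String[] args)
    {
        System.out.println(maxMutationBytes(32));               // 16777216, the limit in the quoted stack trace
        System.out.println(maxMutationBytes(64));               // 33554432 (32MB), the limit with the setting above
        System.out.println(108035175L > maxMutationBytes(64));  // true: the reported mutation exceeds both
    }
}
```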
> Batch Log - Mutation too large while bootstrapping a newly added node
> ---------------------------------------------------------------------
>
> Key: CASSANDRA-15152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15152
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Batch Log
> Reporter: Avraham Kalvo
> Priority: Normal
>
> Scaling our six-node cluster out by three more nodes, we came upon behavior in
> which bootstrap appears hung in `UJ` state (the two previously added nodes
> joined within approximately 2.5 hours).
> Examining the logs, the following became apparent shortly after the bootstrap
> process commenced for this node:
> ```
> ERROR [BatchlogTasks:1] 2019-06-05 14:43:46,508 CassandraDaemon.java:207 - Exception in thread Thread[BatchlogTasks:1,5,main]
> java.lang.IllegalArgumentException: Mutation of 108035175 bytes is too large for the maximum size of 16777216
>     at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:256) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:520) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:399) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.Mutation.apply(Mutation.java:213) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.batchlog.BatchlogManager$ReplayingBatch.sendSingleReplayMutation(BatchlogManager.java:427) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.batchlog.BatchlogManager$ReplayingBatch.sendReplays(BatchlogManager.java:402) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.batchlog.BatchlogManager$ReplayingBatch.replay(BatchlogManager.java:318) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.batchlog.BatchlogManager.processBatchlogEntries(BatchlogManager.java:238) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.batchlog.BatchlogManager.replayFailedBatches(BatchlogManager.java:207) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_201]
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_201]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_201]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_201]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_201]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_201]
>     at java.lang.Thread.run(Thread.java:748) [na:1.8.0_201]
> ```
> The error has been repeating itself in the logs ever since.
> We decided to discard the newly added, apparently still-joining node by doing
> the following:
> 1. At first we simply restarted it, after which it appeared to start up
> normally.
> 2. We then tried to decommission it by issuing `nodetool decommission`; this
> took long (over 2.5 hours) and was eventually terminated in favor of
> `nodetool removenode`.
> 3. Node removal hung on a specific token, which led us to complete it by
> force.
> 4. Forcing the node removal corrupted one of the `system.batches` table's
> SSTables, which was removed (and backed up) from its underlying data
> directory as mitigation (78MB worth).
> 5. A cluster-wide repair was run.
> 6. The `Mutation too large` error is now repeating itself in three different
> permutations (reported sizes) on three different nodes (our standard
> replication factor is three).
> We're not sure whether we're hitting
> https://issues.apache.org/jira/browse/CASSANDRA-11670 or not, as it is said to
> be resolved in our current version, 3.0.10.
> We would still like to verify the root cause of this, as we need to establish
> whether we should expect it to happen in production environments.
> How would you recommend verifying which keyspace.table this mutation belongs
> to?
> Thanks.
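
To make the failure mode concrete: the stack trace above shows the exception
being raised from CommitLog.add while BatchlogManager replays stored batches,
i.e. an oversized mutation persisted in `system.batches` is rejected when its
replay is attempted. A minimal standalone sketch of that kind of size guard
(illustrative only, not the actual Cassandra implementation; the 16MB ceiling
is assumed from the default 32MB segment size):

```
// Illustrative sketch of a pre-write size guard; the real check lives in
// org.apache.cassandra.db.commitlog.CommitLog#add and uses Cassandra's own
// serializers and configuration.
public class MutationSizeGuardExample
{
    // assumed ceiling: half of the default 32MB commit log segment = 16777216 bytes
    static final long MAX_MUTATION_SIZE = 32L * 1024 * 1024 / 2;

    static void add(long serializedMutationSize)
    {
        if (serializedMutationSize > MAX_MUTATION_SIZE)
            throw new IllegalArgumentException(String.format(
                "Mutation of %d bytes is too large for the maximum size of %d",
                serializedMutationSize, MAX_MUTATION_SIZE));
        // otherwise the mutation would be appended to the active segment
    }

    public static void main(String[] args)
    {
        add(108035175L); // the size from the report above: throws
    }
}
```

If such a batch is never successfully replayed, it is presumably never removed
from `system.batches`, which would explain the same rejection being logged on
every subsequent batchlog replay run, on every node holding a copy of the
batch.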
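The question of which keyspace.table the mutation belongs to cannot be answered
from the error message alone, but a first step could be to locate the oversized
batch rows themselves. A hypothetical sketch, assuming the DataStax Java driver
3.x and the 3.0 `system.batches` schema (`id timeuuid, mutations list<blob>,
version int`); identifying the target table would still require deserializing
the blobs with Cassandra's own Mutation serializer, which is beyond this sketch:

```
import java.nio.ByteBuffer;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Hypothetical helper: scans the connected node's system.batches table and
// prints batches whose stored mutation blobs together exceed an assumed 16MB
// ceiling. Approximate only: it sums raw blob sizes, not the exact serialized
// size (plus overhead) that Cassandra checks.
public class OversizedBatchFinder
{
    static final long ASSUMED_LIMIT = 16L * 1024 * 1024;

    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            for (Row row : session.execute("SELECT id, mutations FROM system.batches"))
            {
                long total = 0;
                for (ByteBuffer blob : row.getList("mutations", ByteBuffer.class))
                    total += blob.remaining();
                if (total > ASSUMED_LIMIT)
                    System.out.println("batch " + row.getUUID("id") + " ~ " + total + " bytes");
            }
        }
    }
}
```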