[
https://issues.apache.org/jira/browse/CASSANDRA-15152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058410#comment-17058410
]
JeongHun Kim commented on CASSANDRA-15152:
------------------------------------------
I also have the same problem, in my case while removing a node from the
cluster. On my nodes, commitlog_segment_size_in_mb is set to 64, and the
inserted mutations are certainly smaller than 32MB, yet this ERROR keeps being
logged again and again. Has anyone solved this problem?
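
For context on the numbers involved: the commit log rejects any single mutation
larger than a ceiling that, by default, is half of commitlog_segment_size_in_mb,
so a 64MB segment size gives a 32MB ceiling, while the 16777216-byte limit in
the stack trace quoted below corresponds to the 32MB default. A minimal
standalone sketch of that arithmetic (illustrative names only, not Cassandra
code):

```
// Illustrative arithmetic only; not Cassandra source code.
public class MaxMutationSizeExample
{
    // assumption: the per-mutation ceiling defaults to half the commit log segment size
    static long maxMutationBytes(int commitlogSegmentSizeInMb)
    {
        return (long) commitlogSegmentSizeInMb * 1024 * 1024 / 2;
    }

    public static void main(String[] args)
    {
        System.out.println(maxMutationBytes(32));               // 16777216, the limit in the quoted stack trace
        System.out.println(maxMutationBytes(64));               // 33554432 (32MB), the limit with the setting above
        System.out.println(108035175L > maxMutationBytes(64));  // true: the reported mutation exceeds both
    }
}
```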
> Batch Log - Mutation too large while bootstrapping a newly added node
> ---------------------------------------------------------------------
>
> Key: CASSANDRA-15152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15152
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Batch Log
> Reporter: Avraham Kalvo
> Priority: Normal
>
> Scaling our six-node cluster out by three more nodes, we came upon behavior in
> which bootstrap appears hung in `UJ` state (the two previously added nodes
> joined within approximately 2.5 hours).
> Examining the logs, the following became apparent shortly after the bootstrap
> process commenced for this node:
> ```
> ERROR [BatchlogTasks:1] 2019-06-05 14:43:46,508 CassandraDaemon.java:207 - Exception in thread Thread[BatchlogTasks:1,5,main]
> java.lang.IllegalArgumentException: Mutation of 108035175 bytes is too large for the maximum size of 16777216
>     at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:256) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:520) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.Keyspace.applyNotDeferrable(Keyspace.java:399) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.Mutation.apply(Mutation.java:213) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.batchlog.BatchlogManager$ReplayingBatch.sendSingleReplayMutation(BatchlogManager.java:427) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.batchlog.BatchlogManager$ReplayingBatch.sendReplays(BatchlogManager.java:402) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.batchlog.BatchlogManager$ReplayingBatch.replay(BatchlogManager.java:318) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.batchlog.BatchlogManager.processBatchlogEntries(BatchlogManager.java:238) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.batchlog.BatchlogManager.replayFailedBatches(BatchlogManager.java:207) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_201]
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_201]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_201]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_201]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_201]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_201]
>     at java.lang.Thread.run(Thread.java:748) [na:1.8.0_201]
> ```
> The error has been repeating itself in the logs ever since.
> We decided to discard the newly added, apparently still-joining node by doing
> the following:
> 1. At first we simply restarted it, after which it appeared to start up
> normally.
> 2. We then tried to decommission it by issuing `nodetool decommission`; this
> took long (over 2.5 hours) and was eventually terminated in favor of
> `nodetool removenode`.
> 3. Node removal hung on a specific token, which led us to complete it by
> force.
> 4. Forcing the node removal corrupted one of the `system.batches` table's
> SSTables, which was removed (and backed up) from its underlying data
> directory as mitigation (78MB worth).
> 5. A cluster-wide repair was run.
> 6. The `Mutation too large` error is now repeating itself in three different
> permutations (reported sizes) on three different nodes (our standard
> replication factor is three).
> We're not sure whether we're hitting
> https://issues.apache.org/jira/browse/CASSANDRA-11670 or not, as it is said to
> be resolved in our current version, 3.0.10.
> We would still like to verify the root cause of this, as we need to establish
> whether we should expect it to happen in production environments.
> How would you recommend verifying which keyspace.table this mutation belongs
> to?
> Thanks.
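
To make the failure mode concrete: the stack trace above shows the exception
being raised from CommitLog.add while BatchlogManager replays stored batches,
i.e. an oversized mutation persisted in `system.batches` is rejected when its
replay is attempted. A minimal standalone sketch of that kind of size guard
(illustrative only, not the actual Cassandra implementation; the 16MB ceiling
is assumed from the default 32MB segment size):

```
// Illustrative sketch of a pre-write size guard; the real check lives in
// org.apache.cassandra.db.commitlog.CommitLog#add and uses Cassandra's own
// serializers and configuration.
public class MutationSizeGuardExample
{
    // assumed ceiling: half of the default 32MB commit log segment = 16777216 bytes
    static final long MAX_MUTATION_SIZE = 32L * 1024 * 1024 / 2;

    static void add(long serializedMutationSize)
    {
        if (serializedMutationSize > MAX_MUTATION_SIZE)
            throw new IllegalArgumentException(String.format(
                "Mutation of %d bytes is too large for the maximum size of %d",
                serializedMutationSize, MAX_MUTATION_SIZE));
        // otherwise the mutation would be appended to the active segment
    }

    public static void main(String[] args)
    {
        add(108035175L); // the size from the report above: throws
    }
}
```

If such a batch is never successfully replayed, it is presumably never removed
from `system.batches`, which would explain the same rejection being logged on
every subsequent batchlog replay run, on every node holding a copy of the
batch.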
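The question of which keyspace.table the mutation belongs to cannot be answered
from the error message alone, but a first step could be to locate the oversized
batch rows themselves. A hypothetical sketch, assuming the DataStax Java driver
3.x and the 3.0 `system.batches` schema (`id timeuuid, mutations list<blob>,
version int`); identifying the target table would still require deserializing
the blobs with Cassandra's own Mutation serializer, which is beyond this sketch:

```
import java.nio.ByteBuffer;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Hypothetical helper: scans the connected node's system.batches table and
// prints batches whose stored mutation blobs together exceed an assumed 16MB
// ceiling. Approximate only: it sums raw blob sizes, not the exact serialized
// size (plus overhead) that Cassandra checks.
public class OversizedBatchFinder
{
    static final long ASSUMED_LIMIT = 16L * 1024 * 1024;

    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            for (Row row : session.execute("SELECT id, mutations FROM system.batches"))
            {
                long total = 0;
                for (ByteBuffer blob : row.getList("mutations", ByteBuffer.class))
                    total += blob.remaining();
                if (total > ASSUMED_LIMIT)
                    System.out.println("batch " + row.getUUID("id") + " ~ " + total + " bytes");
            }
        }
    }
}
```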