[
https://issues.apache.org/jira/browse/CASSANDRA-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944090#comment-14944090
]
Paulo Motta commented on CASSANDRA-10233:
-----------------------------------------
[~eitikimura] I'd prefer not to add the try-catch block to
{{HintedHandOffManager.scheduleAllDeliveries()}}: in the general case stored
hints shouldn't be corrupted, and the catch could cause hints to be silently
dropped, which could lead to more serious issues. Since we already know the
issue was caused in {{StorageProxy.writeHintForMutation}}, I think it suffices
to perform the check there. And if someone hits this bug before it's fixed, the
workaround should be truncate hints + repair.
I think what you did on {{StorageProxy.writeHintForMutation}} looks awesome,
but the thrown exception might be ignored silently by the hints executor, so
it's better to perform an explicit check, log a warning, and throw an
AssertionError if {{hostId == null}}, so we'll be able to track in the logs if
it happens again. Could you please make these changes and re-submit the patch?
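To illustrate the kind of explicit check I mean, here is a minimal standalone sketch. It is not the actual patch and does not use Cassandra's classes: {{HintTargetCheck}}, {{requireHostId}} and the message text are hypothetical names, and it uses {{java.util.logging}} only to stay self-contained (Cassandra itself logs through slf4j).

```java
import java.util.UUID;
import java.util.logging.Logger;

// Hypothetical stand-in for the check inside writeHintForMutation; not Cassandra code.
final class HintTargetCheck {
    private static final Logger logger = Logger.getLogger(HintTargetCheck.class.getName());

    /**
     * Returns hostId unchanged when it is present. When it is null, logs a
     * warning first (so the event is visible in the logs even if the executor
     * swallows the error) and then throws an AssertionError.
     */
    static UUID requireHostId(UUID hostId, String targetAddress) {
        if (hostId == null) {
            String message = "Missing host ID for " + targetAddress;
            logger.warning(message);
            // Thrown explicitly, so it fires whether or not the JVM runs with -ea.
            throw new AssertionError(message);
        }
        return hostId;
    }

    public static void main(String[] args) {
        UUID known = UUID.randomUUID();
        System.out.println(requireHostId(known, "10.0.0.1").equals(known));
        try {
            requireHostId(null, "10.0.0.2");
        } catch (AssertionError e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Unlike a plain {{assert}} statement, the explicit {{throw}} is not skipped when assertions are disabled, and the warning is written before the error can be lost.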
Please check if your patch applies to the cassandra-2.2 branch, and if it
doesn't, please also submit a patch for 2.2. It should not be necessary to
create a patch for 3.0, as the hints engine was rewritten from scratch.
Thanks for that [~eitikimura]!
[~fhsgoncalves] yep, afaik assertions should be optional in production, but
they should never fail in the first place. Probably this is being caused by
some other issue I was not able to track in the latest changes, but
[~eitikimura]'s patch should help us troubleshoot if it happens in the future.
[~nutbunnies] [~mambocab] maybe it would be interesting to have a dtest job
with assertions disabled, since we rely a lot on assertions for pre-condition
checking, and many people disable them in production.
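As a side note on why this matters: a Java {{assert}} statement is skipped entirely unless the JVM runs with {{-ea}}. A test job could confirm which mode it is actually running in with something like the sketch below. The class and method names are hypothetical and this is not part of dtest.

```java
// Hypothetical helper; reports whether `assert` statements are active in this JVM.
final class AssertionStatus {
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment is the assert's condition, so it only runs when
        // assertions are enabled; otherwise the whole statement is skipped.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Running it with {{java -ea}} and then without the flag shows the two modes that such a job would need to cover.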
> IndexOutOfBoundsException in HintedHandOffManager
> -------------------------------------------------
>
> Key: CASSANDRA-10233
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10233
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Cassandra 2.2.0
> Reporter: Omri Iluz
> Assignee: Paulo Motta
> Attachments: cassandra-2.1.8-10233-v2.txt,
> cassandra-2.1.8-10233-v3.txt
>
>
> After upgrading our cluster to 2.2.0, the following error started showing
> exactly every 10 minutes on every server in the cluster:
> {noformat}
> INFO [CompactionExecutor:1381] 2015-08-31 18:31:55,506
> CompactionTask.java:142 - Compacting (8e7e1520-500e-11e5-b1e3-e95897ba4d20)
> [/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/la-540-big-Data.db:level=0,
> ]
> INFO [CompactionExecutor:1381] 2015-08-31 18:31:55,599
> CompactionTask.java:224 - Compacted (8e7e1520-500e-11e5-b1e3-e95897ba4d20) 1
> sstables to
> [/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/la-541-big,]
> to level=0. 1,544,495 bytes to 1,544,495 (~100% of original) in 93ms =
> 15.838121MB/s. 0 total partitions merged to 4. Partition merge counts were
> {1:4, }
> ERROR [HintedHandoff:1] 2015-08-31 18:31:55,600 CassandraDaemon.java:182 -
> Exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.IndexOutOfBoundsException: null
> at java.nio.Buffer.checkIndex(Buffer.java:538) ~[na:1.7.0_79]
> at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:410)
> ~[na:1.7.0_79]
> at org.apache.cassandra.utils.UUIDGen.getUUID(UUIDGen.java:106)
> ~[apache-cassandra-2.2.0.jar:2.2.0]
> at
> org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:515)
> ~[apache-cassandra-2.2.0.jar:2.2.0]
> at
> org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:88)
> ~[apache-cassandra-2.2.0.jar:2.2.0]
> at
> org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:168)
> ~[apache-cassandra-2.2.0.jar:2.2.0]
> at
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
> ~[apache-cassandra-2.2.0.jar:2.2.0]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> [na:1.7.0_79]
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
> [na:1.7.0_79]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> [na:1.7.0_79]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> [na:1.7.0_79]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [na:1.7.0_79]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_79]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
> {noformat}