[ https://issues.apache.org/jira/browse/CASSANDRA-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939944#comment-14939944 ]

Fernando Gonçalves edited comment on CASSANDRA-10233 at 10/1/15 3:11 PM:
-------------------------------------------------------------------------

Hi [~nutbunnies], I work together with Eiti Kimura at Movile, and this issue 
is happening in one of our Cassandra clusters.

I'll try to answer your questions:

- how many nodes?
Currently we are running 15 nodes, in 2 racks, in the same datacenter. One 
rack has 7 nodes and the other has 8.

- assuming rolling upgrade
I didn't understand if this is a question, but what I can say is that we 
already upgraded to version 2.1.9 yesterday, and the problem started when we 
added 7 new nodes to the cluster a week ago. We added one node at a time, 
waiting for each node to finish joining the cluster before starting to join 
the next one.

- jdk change?
We have been using the same version for a long time: Java HotSpot 1.8.0_45-b14.

- roughly how long was each node unavailable
I'm sending the uptime of each node. The nodes were not unavailable, only 
very slow to respond to requests at times.
pompeia1   14:52:37 up 126 days
pompeia2   14:52:37 up 126 days
pompeia3   14:52:37 up 126 days
pompeia4   14:52:37 up 126 days
pompeia5   14:52:37 up 126 days
pompeia6   14:52:37 up 126 days
pompeia7   14:52:37 up 82 days
pompeia8   14:52:37 up 82 days
pompeia9   14:52:37 up 7 days
pompeia10  14:52:37 up 7 days
pompeia11  14:52:37 up 7 days
pompeia12  14:52:37 up 7 days
pompeia13  14:52:37 up 7 days
pompeia14  14:52:37 up 7 days
pompeia15  14:52:37 up 7 days

- gc_grace value of table with broken hint
values of max_hint_window_in_ms, max_hints_delivery_threads, 
hinted_handoff_enabled, hinted_handoff_throttle_in_kb in cassandra.yaml
We are not sure which table is the problematic one, but we think it is the 
largest (considering record count and number of columns) and most heavily 
used table we have, so here are its values (there is a small query sketch 
after this list for double-checking gc_grace_seconds):
-- gc_grace_seconds = 864000
The values in cassandra.yaml:
-- max_hint_window_in_ms: 10800000
-- max_hints_delivery_threads: 2
-- hinted_handoff_enabled: true
-- hinted_handoff_throttle_in_kb: 1024
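
This is roughly how gc_grace_seconds can be double-checked straight from the 
schema tables (a rough sketch using the DataStax Java driver; the contact 
point and the keyspace/table names below are placeholders, not our real ones):

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CheckGcGrace
{
    public static void main(String[] args)
    {
        // Placeholder contact point and names -- adjust for the real cluster.
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            // In 2.1 the table options live in system.schema_columnfamilies.
            Row row = session.execute(
                "SELECT gc_grace_seconds FROM system.schema_columnfamilies " +
                "WHERE keyspace_name = 'my_keyspace' AND columnfamily_name = 'my_table'").one();
            if (row == null)
            {
                System.out.println("table not found in the schema");
                return;
            }
            System.out.println("gc_grace_seconds = " + row.getInt("gc_grace_seconds"));
        }
    }
}
{code}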

- what type of mutation was the hint without a target_id?
I don't know how to get the type of mutation, only the mutation value, which 
is a blob in the table. Can you help me here?
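
Would something like the sketch below be the right direction? It is untested: 
it assumes the 2.1 system.hints layout (target_id / message_version / 
mutation columns), assumes decoding the blob with Cassandra's own Mutation 
serializer mirrors what the hint delivery path does, needs cassandra-all and 
the DataStax Java driver on the classpath, and the contact point is a 
placeholder.

{code:java}
import java.io.DataInputStream;
import java.nio.ByteBuffer;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

import org.apache.cassandra.db.Mutation;
import org.apache.cassandra.utils.ByteBufferUtil;

public class InspectHint
{
    public static void main(String[] args) throws Exception
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            // Pull one stored hint; in 2.1 the row carries the serialization
            // version alongside the mutation blob.
            Row row = session.execute(
                "SELECT target_id, message_version, mutation FROM system.hints LIMIT 1").one();
            if (row == null)
            {
                System.out.println("no hints stored at the moment");
                return;
            }
            System.out.println("target_id = " + row.getUUID("target_id"));

            ByteBuffer blob = row.getBytes("mutation");
            int version = row.getInt("message_version");
            // Deserialize with the same serializer the server uses; the
            // printed mutation shows the keyspace, partition key and column
            // families it touches.
            Mutation mutation = Mutation.serializer.deserialize(
                new DataInputStream(ByteBufferUtil.inputStream(blob)), version);
            System.out.println(mutation);
        }
    }
}
{code}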

If you need any other information, I can send it to you!
Thank you!



> IndexOutOfBoundsException in HintedHandOffManager
> -------------------------------------------------
>
>                 Key: CASSANDRA-10233
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10233
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 2.2.0
>            Reporter: Omri Iluz
>            Assignee: Andrew Hust
>         Attachments: cassandra-2.1.8-10233-v2.txt, cassandra-2.1.8-10233.txt
>
>
> After upgrading our cluster to 2.2.0, the following error started showing up 
> exactly every 10 minutes on every server in the cluster:
> {noformat}
> INFO  [CompactionExecutor:1381] 2015-08-31 18:31:55,506 
> CompactionTask.java:142 - Compacting (8e7e1520-500e-11e5-b1e3-e95897ba4d20) 
> [/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/la-540-big-Data.db:level=0,
>  ]
> INFO  [CompactionExecutor:1381] 2015-08-31 18:31:55,599 
> CompactionTask.java:224 - Compacted (8e7e1520-500e-11e5-b1e3-e95897ba4d20) 1 
> sstables to 
> [/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/la-541-big,] 
> to level=0.  1,544,495 bytes to 1,544,495 (~100% of original) in 93ms = 
> 15.838121MB/s.  0 total partitions merged to 4.  Partition merge counts were 
> {1:4, }
> ERROR [HintedHandoff:1] 2015-08-31 18:31:55,600 CassandraDaemon.java:182 - 
> Exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.IndexOutOfBoundsException: null
>       at java.nio.Buffer.checkIndex(Buffer.java:538) ~[na:1.7.0_79]
>       at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:410) 
> ~[na:1.7.0_79]
>       at org.apache.cassandra.utils.UUIDGen.getUUID(UUIDGen.java:106) 
> ~[apache-cassandra-2.2.0.jar:2.2.0]
>       at 
> org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:515)
>  ~[apache-cassandra-2.2.0.jar:2.2.0]
>       at 
> org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:88)
>  ~[apache-cassandra-2.2.0.jar:2.2.0]
>       at 
> org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:168)
>  ~[apache-cassandra-2.2.0.jar:2.2.0]
>       at 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
>  ~[apache-cassandra-2.2.0.jar:2.2.0]
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> [na:1.7.0_79]
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) 
> [na:1.7.0_79]
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>  [na:1.7.0_79]
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  [na:1.7.0_79]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_79]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_79]
>       at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
> {noformat}



