[ https://issues.apache.org/jira/browse/CASSANDRA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Morton updated CASSANDRA-4223:
------------------------------------

    Attachment: 4223_counter_session_id-V2.diff

4223_counter_session_id-V2.diff 

Uses the stream source flag as discussed. Added the flags to StreamHeader so 
that they are kept together. 

                
> Non Unique Streaming session ID's
> ---------------------------------
>
>                 Key: CASSANDRA-4223
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4223
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu 10.04.2 LTS
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
> "Bare metal" servers from 
> https://www.stormondemand.com/servers/baremetal.html 
> The servers run on a custom hypervisor.
>  
>            Reporter: Aaron Morton
>            Assignee: Aaron Morton
>              Labels: datastax_qa
>             Fix For: 1.0.11, 1.1.1
>
>         Attachments: 4223_counter_session_id-V2.diff, 
> 4223_counter_session_id.diff, NanoTest.java, fmm streaming bug.txt
>
>
> I have observed repair processes failing due to duplicate streaming session 
> IDs. In this installation it is preventing rebalance from completing. I 
> believe it has also prevented repair from completing in the past. 
> The attached streaming-logs.txt file contains log messages and an explanation 
> of what was happening during a repair operation. It has the evidence for 
> duplicate session IDs.
> The duplicate session IDs were generated on the repairing node and sent to 
> the streaming node. The streaming source replaced the first session with the 
> second, which resulted in both sessions failing when the first FILE_COMPLETE 
> message was received. 
> The errors were:
> {code:java}
> DEBUG [MiscStage:1] 2012-05-03 21:40:33,997 StreamReplyVerbHandler.java (line 47) Received StreamReply StreamReply(sessionId=26132848816442266, file='/var/lib/cassandra/data/FMM_Studio/PartsData-hc-1-Data.db', action=FILE_FINISHED)
> ERROR [MiscStage:1] 2012-05-03 21:40:34,027 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/PartsData-hc-1-Data.db but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
> {code}
> and
> {code:java}
> DEBUG [MiscStage:2] 2012-05-03 21:40:36,497 StreamReplyVerbHandler.java (line 47) Received StreamReply StreamReply(sessionId=26132848816442266, file='/var/lib/cassandra/data/OpsCenter/rollups7200-hc-3-Data.db', action=FILE_FINISHED)
> ERROR [MiscStage:2] 2012-05-03 21:40:36,497 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:2,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/OpsCenter/rollups7200-hc-3-Data.db but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
> {code}
> I think this is because System.nanoTime() is used for the session ID when 
> creating the StreamInSession objects (driven from 
> StorageService.requestRanges()). 
> From the documentation 
> (http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#nanoTime()):
> {quote}
> This method provides nanosecond precision, but not necessarily nanosecond 
> accuracy. No guarantees are made about how frequently values change. 
> {quote}
> There is also some information on clocks and timers here: 
> https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks
> The hypervisor may be at fault here. But it seems like we cannot rely on 
> successive calls to nanoTime() to return different values. 
> To avoid message/interface changes on the StreamHeader it would be good to 
> keep the session ID a long. The simplest approach may be to make successive 
> calls to nanoTime until the result changes. We could fail if a certain number 
> of milliseconds have passed. 
> Hashing the file names and ranges is also a possibility, but more involved. 
> (We may also want to drop latency times that are 0 nanoseconds.)
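
The retry idea in the description (call nanoTime() again until the value 
changes, and fail after a number of milliseconds) could look roughly like the 
sketch below. This is an illustration under stated assumptions, not the 
attached patch: the class name SessionIds, the method next() and the 
timeoutMillis parameter are hypothetical and do not exist in Cassandra. It 
keeps the ID a long and only accepts a reading that is later than the last ID 
handed out by this JVM.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration only; not part of Cassandra.
final class SessionIds
{
    // Last ID handed out, seeded just below the current clock reading
    // so that the first call can succeed immediately.
    private static final AtomicLong lastId = new AtomicLong(System.nanoTime() - 1);

    private SessionIds() {}

    // Returns a nanoTime()-based ID that is later than every ID previously
    // returned by this method in this JVM. Throws if the clock does not
    // advance within timeoutMillis (an assumed, caller-chosen limit).
    static long next(long timeoutMillis)
    {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true)
        {
            long previous = lastId.get();
            long candidate = System.nanoTime();
            // Compare by difference, as the nanoTime() Javadoc advises,
            // so that numeric overflow does not break the ordering.
            if (candidate - previous > 0 && lastId.compareAndSet(previous, candidate))
                return candidate;
            if (System.currentTimeMillis() > deadline)
                throw new IllegalStateException(
                    "nanoTime() did not advance within " + timeoutMillis + " ms");
        }
    }
}
```

A caller would use something like SessionIds.next(100) wherever the raw 
System.nanoTime() value is taken as the session ID today. The compare-and-set 
means two threads cannot both claim the same reading, which a plain "loop 
until the value changes" in each thread would not guarantee on its own.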

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
