[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-12 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002757#comment-15002757
 ] 

Ariel Weisberg commented on CASSANDRA-10485:


+1 LGTM

> Missing host ID on hinted handoff write
> ---
>
> Key: CASSANDRA-10485
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10485
> Project: Cassandra
> Issue Type: Bug
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> when I restart one of them I receive the error "Missing host ID":
> {noformat}
> WARN  [SharedPool-Worker-1] 2015-10-08 13:15:33,882 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.AssertionError: Missing host ID for 63.251.156.141
>     at org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:978) ~[apache-cassandra-2.1.3.jar:2.1.3]
>     at org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:950) ~[apache-cassandra-2.1.3.jar:2.1.3]
>     at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:2235) ~[apache-cassandra-2.1.3.jar:2.1.3]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_60]
>     at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.3.jar:2.1.3]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.3.jar:2.1.3]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> {noformat}
> If I run nodetool status, the problematic node has an ID:
> {noformat}
> UN  10.10.10.12  1.3 TB 1   ?   4d5c8fd2-a909-4f09-a23c-4cd6040f338a  rack3
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-11 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000710#comment-15000710
 ] 

Ariel Weisberg commented on CASSANDRA-10485:


If we submit a hint to an endpoint that left, when will the hint be cleaned up 
and discarded? Is there a race there?



[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-11 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000754#comment-15000754
 ] 

Ariel Weisberg commented on CASSANDRA-10485:


[There is an extra ')' in this trace 
statement|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485-ultimate#diff-71f06c193f5b5e270cf8ac695164f43aR2492]

Tests look good as near as I can tell.

Why is isMemberJoining better than just using getHostID being null as an 
indicator of whether the hint should be written?



[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-11 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001310#comment-15001310
 ] 

Paulo Motta commented on CASSANDRA-10485:
-

bq. If we submit a hint to an endpoint that left, when will the hint be cleaned 
up and discarded? Is there a race there?

The race window is very small: the node needs to be removed exactly between 
getting the ID from TokenMetadata and the hint actually being written by the 
HintsManager. If that happens, on 2.1 and 2.2 the hint will expire by TTL after 
gc_grace_seconds. On 3.0+, a hint file might be created for the removed node, 
and it needs to be removed manually or via nodetool truncatehints.

Fixed minor nit and removed {{isMemberJoining}} assertion. Tests were 
resubmitted and seem OK.

||2.1||2.2||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485-ultimate]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485-ultimate]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485-ultimate]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485-ultimate]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-ultimate-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-ultimate-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-ultimate-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-ultimate-testall/lastCompletedBuild/testReport/]|
|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-ultimate-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-ultimate-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-ultimate-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-ultimate-dtest/lastCompletedBuild/testReport/]|



[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-10 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998915#comment-14998915
 ] 

Ariel Weisberg commented on CASSANDRA-10485:


How is the isMemberJoining() check atomic with getHostId() 
[here|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485-final#diff-71f06c193f5b5e270cf8ac695164f43aR1015]?
 It's also doing endpointForHostId a little bit later.

I get why this works in 3.0+. 3.0+ checks only once whether it is going to 
write the hint, and doesn't go through the dance of resolving things from TMD 
multiple times, which would risk seeing an inconsistent version.



[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-10 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999531#comment-14999531
 ] 

Paulo Motta commented on CASSANDRA-10485:
-

Thanks for the review. Addressed issues in new (simpler) implementation:
* Discard hints for endpoints with null host ids (logging debug message), 
avoiding races between multiple {{TokenMetadata}} accesses. 
* Moved improved version of null host id assertion to 
{{TokenMetadata.getHostId}}: {{assert hostId != null || 
!isMemberOrJoining(endpoint)}}.
* On 2.2-, changed signature of {{HintedHandOffManager.hintFor}} to accept a 
{{Pair}}, avoiding an extra possibly racy access to 
{{TokenMetadata}}
* Applied similar logic to new hints implementation on 3.0+
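
The first and third points can be sketched roughly as follows. This is only an illustration, not the patch itself: {{HintSubmissionSketch}}, its string endpoints and the in-memory counters are hypothetical stand-ins for {{StorageProxy}}, {{TokenMetadata}} and the hints manager. The host ID is resolved once, a null result discards the hint, and the resolved (endpoint, host ID) pair is passed on so nothing is looked up a second time.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the hint submission path; not Cassandra's real classes. */
final class HintSubmissionSketch {
    // Simplified ring view: endpoint -> host ID. A removed endpoint has no entry.
    private final Map<String, UUID> hostIds = new ConcurrentHashMap<>();
    // Number of hints written per target host ID.
    private final Map<UUID, Integer> hintsWritten = new ConcurrentHashMap<>();

    void addEndpoint(String endpoint, UUID hostId) {
        hostIds.put(endpoint, hostId);
    }

    void removeEndpoint(String endpoint) {
        hostIds.remove(endpoint);
    }

    /** Returns true if the hint was written, false if it was discarded. */
    boolean submitHint(String endpoint) {
        // Single lookup: the ring may change right after this line, but the
        // rest of the method only uses the value captured here.
        UUID hostId = hostIds.get(endpoint);
        if (hostId == null) {
            // The endpoint left the ring; discard instead of failing.
            // (The real code would log a debug message here.)
            return false;
        }
        writeHint(new SimpleImmutableEntry<>(endpoint, hostId));
        return true;
    }

    // Takes the already resolved pair, so no second lookup is needed.
    private void writeHint(Map.Entry<String, UUID> target) {
        hintsWritten.merge(target.getValue(), 1, Integer::sum);
    }

    int hintsFor(UUID hostId) {
        return hintsWritten.getOrDefault(hostId, 0);
    }
}
```

With this shape, a node removed between the lookup and the write can at worst leave behind a hint for a host ID that is no longer in the ring; it cannot trip over a null ID.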

||2.1||2.2||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485-ultimate]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485-ultimate]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485-ultimate]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485-ultimate]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-ultimate-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-ultimate-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-ultimate-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-ultimate-testall/lastCompletedBuild/testReport/]|
|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-ultimate-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-ultimate-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-ultimate-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-ultimate-dtest/lastCompletedBuild/testReport/]|



[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-09 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997633#comment-14997633
 ] 

Paulo Motta commented on CASSANDRA-10485:
-

I just found a bug in the previous version: a node can be removed from TMD just 
before the new pending ranges are set, so the problem will persist. 

After thinking this through with a fresh mind, the solution is rather simple; I 
think I was over-complicating it. Pending ranges are basically composed of 
"normal" endpoints, moving endpoints and bootstrapping endpoints. Moving 
endpoints are also "normal" endpoints. So what we actually want to check 
before submitting a hint is whether the node is a normal endpoint or a 
bootstrapping endpoint. If the node is neither a normal/moving nor a 
bootstrapping endpoint, we don't want to submit hints to it, simple as that. So 
I added a new method {{TokenMetadata.isMemberOrJoining}} to check that before 
submitting a hint, thus avoiding getting a null host id on hint submission.
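
As a rough illustration of that check (not the actual {{TokenMetadata}} code), the sketch below keeps two sets and answers the membership question from them. {{RingMembershipSketch}}, the string endpoints and the mutator names are hypothetical, and locking is left out to keep it short.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified ring view; locking is omitted for brevity. */
final class RingMembershipSketch {
    // Endpoints that own tokens. Moving endpoints stay in this set.
    private final Set<String> normalEndpoints = new HashSet<>();
    // Endpoints that are still bootstrapping (joining).
    private final Set<String> bootstrappingEndpoints = new HashSet<>();

    void addBootstrapping(String endpoint) {
        bootstrappingEndpoints.add(endpoint);
    }

    // A node that finished bootstrapping becomes a normal endpoint.
    void addNormal(String endpoint) {
        bootstrappingEndpoints.remove(endpoint);
        normalEndpoints.add(endpoint);
    }

    // Replacement or a failed bootstrap removes the node from the ring.
    void removeEndpoint(String endpoint) {
        normalEndpoints.remove(endpoint);
        bootstrappingEndpoints.remove(endpoint);
    }

    boolean isMemberOrJoining(String endpoint) {
        return normalEndpoints.contains(endpoint)
            || bootstrappingEndpoints.contains(endpoint);
    }

    // Hints are only worth submitting to endpoints that pass the check.
    boolean shouldSubmitHint(String endpoint) {
        return isMemberOrJoining(endpoint);
    }
}
```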

The two reports of this bug, on CASSANDRA-6335 and CASSANDRA-10233, occurred 
when a node was replaced or when bootstrapping failed. When a node is replaced, 
it was a "normal" endpoint, but then it was replaced and removed from the ring, 
so we shouldn't submit a hint to it. When a new node is down after a failed 
bootstrap, it is removed from the ring, so we shouldn't submit a hint to it. 
Actually, with CASSANDRA-8838 there's a possibility of resuming a failed 
bootstrap, so we should keep the bootstrapping node in the ring for a 
quarantine period instead of removing it right away, but we should handle this 
in a separate ticket.

Submitted a new branch with the proposed solution. Sorry for the confusion on 
this.
||2.1||2.2||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485-final]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485-final]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485-final]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485-final]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-final-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-final-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-final-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-final-testall/lastCompletedBuild/testReport/]|
|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-final-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-final-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-final-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-final-dtest/lastCompletedBuild/testReport/]|


[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-06 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994779#comment-14994779
 ] 

Paulo Motta commented on CASSANDRA-10485:
-

I implemented an alternative approach which is a bit cleaner and more 
deterministic. The basic idea is to have a new method 
{{TokenMetadata.isMemberOrPending()}} and only submit hints to endpoints that 
are ring members or pending membership, thus avoiding fetching null host IDs 
for removed pending endpoints while the new pending ranges are being calculated.

In order to support the {{TokenMetadata.isMemberOrPending()}} method, the 
{{TokenMetadata}} maintains a new {{livePendingEndpoints}} set which is 
populated every time new pending ranges are set. When endpoints are removed 
from {{TokenMetadata}} via the {{removeEndpoint}} method, they're also removed 
from the {{livePendingEndpoints}} set, so {{TokenMetadata.isMemberOrPending()}} 
returns false if the endpoint is evicted from the ring. Since both 
{{removeEndpoint}} and {{setPendingRanges}} update this set, they share a write 
lock. {{TokenMetadata.isMemberOrPending()}} uses a read lock, similar to 
other methods such as {{isMember()}} or {{getHostId()}}.
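
The locking scheme described above could look roughly like this. It is a sketch under assumptions rather than the branch code: {{LivePendingSketch}}, the string endpoints and the {{addMember}} helper are hypothetical, and the pending ranges are reduced to a map from a range label to its endpoints.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical, simplified model of the livePendingEndpoints bookkeeping. */
final class LivePendingSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Set<String> members = new HashSet<>();
    private final Set<String> livePendingEndpoints = new HashSet<>();

    void addMember(String endpoint) {
        lock.writeLock().lock();
        try {
            members.add(endpoint);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Repopulates the set every time new pending ranges are assigned.
    void setPendingRanges(Map<String, ? extends Collection<String>> pendingRanges) {
        lock.writeLock().lock();
        try {
            livePendingEndpoints.clear();
            for (Collection<String> endpoints : pendingRanges.values())
                livePendingEndpoints.addAll(endpoints);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Shares the write lock with setPendingRanges, since both update the set.
    void removeEndpoint(String endpoint) {
        lock.writeLock().lock();
        try {
            members.remove(endpoint);
            livePendingEndpoints.remove(endpoint);
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean isMemberOrPending(String endpoint) {
        lock.readLock().lock();
        try {
            return members.contains(endpoint)
                || livePendingEndpoints.contains(endpoint);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```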

Merging the solution from 2.1 to 2.2/3.0 was a bit tricky because the pending 
ranges calculation was extracted from the {{PendingRangeCalculatorService}} to 
{{TokenMetadata}} within a read lock, so I had to separate the actual 
calculation (within a read lock) from the actual assignment of the 
{{pendingRanges}} via the {{setPendingRanges}} method, which uses a write lock. 
On 3.0, the hints submission part is slightly different (even simpler) due to 
the new hints implementation.

It's still not ideal, but I guess it is better than the previous approach. I 
will add a link from this ticket to CASSANDRA-6061 so we can take this ticket 
into account when refactoring the {{TokenMetadata}}.

Below are the new branches and test results:
||2.1||2.2||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485-v3]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485-v3]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485-v3]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485-v3]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-v3-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-v3-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-v3-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-v3-testall/lastCompletedBuild/testReport/]|
|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-v3-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-v3-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-v3-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-v3-dtest/lastCompletedBuild/testReport/]|



[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-04 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990471#comment-14990471
 ] 

Ariel Weisberg commented on CASSANDRA-10485:


Can you update the comment on HintedHandoffManager.hintFor so that it says that 
getEndpointForHostId can return null in regular operation and not just in 
tests? Wouldn't want that check to get removed.

Another thing that is weird is that we convert from InetAddr to host id and 
then HintedHandoffManager converts that back into an InetAddr (although maybe 
not the same one?).

I am not sure how big these things are or how often we rebuild them, but 
constructing a copy of every map as we iterate, even when we aren't going to 
modify it, is more work. If it's not too much churn it's fine; I just want to 
make sure it doesn't end up biting us.

I looked at the 3.0 code and it looks good.



[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-03 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988473#comment-14988473
 ] 

Ariel Weisberg commented on CASSANDRA-10485:


That fixes this bug, but the design of TMD is still error prone IMO. It 
presents a materialized view of a state of the world and it would be easiest to 
reason about if it moved from one consistent state to another from the 
perspective of readers.

I am lacking in knowledge of the ways in which TMD is used, so I can't say if 
everyone uses it safely. We know hints didn't use it safely, which makes me 
wonder how many others also aren't going to handle it being in an inconsistent 
state. Even if we get it internally consistent, if things make decisions based 
on a stale version, those decisions need to be harmless or they need to 
eventually check and make sure they didn't do something that was invalidated.

I'll dig deeper into it tomorrow. If there is someone else who can chime in as 
to whether this is a valid concern that would be great.



[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-03 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988450#comment-14988450
 ] 

Paulo Motta commented on CASSANDRA-10485:
-

bq. HashMultimap uses good old HashMap internally. So it's not safe to update 
it and read at the same time. I think they meant that if the map is immutable 
it is safe to have multiple concurrent readers.

Thanks for clarifying this. Updated the patch to remove a pending endpoint by 
working on a copy and then replacing the reference atomically.
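
As a rough illustration of the copy-and-swap idea (a minimal sketch, not the code in the patch), the class below keeps pending endpoints behind an {{AtomicReference}}. The class name {{PendingEndpoints}}, its methods, and the use of plain strings for ranges and endpoints are all hypothetical and chosen only for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for pending-endpoint state; not Cassandra's TokenMetadata.
final class PendingEndpoints
{
    // Readers only ever see a fully built, unmodifiable map.
    private final AtomicReference<Map<String, Set<String>>> pending =
        new AtomicReference<>(Collections.emptyMap());

    // Lock-free read of the current snapshot.
    Map<String, Set<String>> snapshot()
    {
        return pending.get();
    }

    void add(String range, String endpoint)
    {
        pending.updateAndGet(current -> {
            Map<String, Set<String>> copy = mutableCopy(current);
            copy.computeIfAbsent(range, r -> new HashSet<>()).add(endpoint);
            return freeze(copy);
        });
    }

    // Removal mutates a private copy, then swaps the reference atomically.
    void remove(String endpoint)
    {
        pending.updateAndGet(current -> {
            Map<String, Set<String>> copy = mutableCopy(current);
            copy.values().forEach(endpoints -> endpoints.remove(endpoint));
            copy.values().removeIf(Set::isEmpty);
            return freeze(copy);
        });
    }

    private static Map<String, Set<String>> mutableCopy(Map<String, Set<String>> source)
    {
        Map<String, Set<String>> copy = new HashMap<>();
        source.forEach((range, endpoints) -> copy.put(range, new HashSet<>(endpoints)));
        return copy;
    }

    private static Map<String, Set<String>> freeze(Map<String, Set<String>> source)
    {
        Map<String, Set<String>> frozen = new HashMap<>();
        source.forEach((range, endpoints) -> frozen.put(range, Collections.unmodifiableSet(endpoints)));
        return Collections.unmodifiableMap(frozen);
    }
}
```

A reader that grabbed {{snapshot()}} before a removal keeps seeing the old, complete map, while later readers see the new one; nobody observes a half-updated structure.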

bq. I think you either need to make the entire thing atomic and make TMD COW 
and propagate the TMD the entire way. Or alternatively optimistic and if the 
data isn't there just abort dropping the hint instead of asserting that it is 
an error. You could assert instead that pendingEndpointsFor() doesn't include 
the missing endpoint.

Implemented the optimistic approach: hints to endpoints with a null host ID are 
discarded and a debug message is logged (this should happen rarely now that the 
endpoint is removed from pending ranges immediately). Also moved the assertion 
to {{TokenMetadata.getHostId()}}, so we guarantee that a host ID can only be 
null if the endpoint is not part of the ring (or is not bootstrapping).
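
The optimistic behaviour could look roughly like the sketch below. This is a simplified model, not the actual {{StorageProxy}} code: {{HintWriterSketch}}, its methods, and the string-typed endpoint and mutation are assumptions made for illustration, and {{java.util.logging}} stands in for the real logger.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical, simplified model of the hint path; not Cassandra's actual classes.
final class HintWriterSketch
{
    private static final Logger logger = Logger.getLogger(HintWriterSketch.class.getName());

    // Endpoint address -> host ID, standing in for the ring metadata.
    private final Map<String, UUID> hostIds = new ConcurrentHashMap<>();

    void addEndpoint(String endpoint, UUID hostId)
    {
        hostIds.put(endpoint, hostId);
    }

    void removeEndpoint(String endpoint)
    {
        hostIds.remove(endpoint);
    }

    // Returns true if a hint was written, false if it was discarded.
    boolean writeHint(String endpoint, String mutation)
    {
        UUID hostId = hostIds.get(endpoint);
        if (hostId == null)
        {
            // The endpoint left the ring between the decision to hint and now:
            // drop the hint quietly instead of failing an assertion.
            logger.log(Level.FINE, "Discarding hint for endpoint not part of ring: {0}", endpoint);
            return false;
        }
        // A real implementation would persist the mutation keyed by hostId here.
        return true;
    }
}
```

The point of the design is that a missing host ID is treated as an expected outcome of a race with endpoint removal, not as a programming error.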

I needed to make a slight adaptation to the 3.0 patch in 
{{StorageProxy.submitHint}} due to the new hints implementation. Tests with the 
new implementation are available below:

||2.1||2.2||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10485]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10485]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10485]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10485]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-testall/lastCompletedBuild/testReport/]|
|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10485-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10485-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10485-dtest/lastCompletedBuild/testReport/]|[dtests|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10485-dtest/lastCompletedBuild/testReport/]|



[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-03 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988539#comment-14988539
 ] 

Paulo Motta commented on CASSANDRA-10485:
-

Yeah, definitely {{TokenMetadata}} needs some love. Perhaps this is a good time 
to provide some input/requirements for a {{TokenMetadata}} rewrite on 
CASSANDRA-6061.



[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-11-02 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985401#comment-14985401
 ] 

Ariel Weisberg commented on CASSANDRA-10485:


Having reviewed this further, the entire approach of presenting a mutable TMD 
to readers is a little iffy.

pendingRangesFor() is not isolated from removeEndpoint(), so removeEndpoint() 
is tearing apart the data structure while readers of TMD are viewing an 
inconsistent state.

So you could call pendingEndpointsFor() and think you are supposed to submit a 
hint, but when the time comes to submit the hint the endpoint has already been 
removed from the TMD.

I think you either need to make the entire thing atomic, make TMD COW, and 
propagate the TMD the entire way, or alternatively be optimistic: if the data 
isn't there, just abort and drop the hint instead of asserting that it is an 
error. You could assert instead that pendingEndpointsFor() doesn't include the 
missing endpoint.
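
To make the first option concrete, here is a minimal sketch of an immutable snapshot that is taken once and passed along the whole call path, so the pending-endpoint check and the later host ID lookup cannot disagree. RingSnapshot, RingState, and their methods are hypothetical names for this example and are not part of Cassandra.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical immutable view of ring state; not Cassandra's TokenMetadata.
final class RingSnapshot
{
    private final Map<String, UUID> hostIds;
    private final Set<String> pendingEndpoints;

    RingSnapshot(Map<String, UUID> hostIds, Set<String> pendingEndpoints)
    {
        this.hostIds = Collections.unmodifiableMap(new HashMap<>(hostIds));
        this.pendingEndpoints = Collections.unmodifiableSet(new HashSet<>(pendingEndpoints));
    }

    UUID hostId(String endpoint)
    {
        return hostIds.get(endpoint);
    }

    boolean isPending(String endpoint)
    {
        return pendingEndpoints.contains(endpoint);
    }

    // Both structures change together in the new snapshot.
    RingSnapshot withPendingEndpoint(String endpoint, UUID hostId)
    {
        Map<String, UUID> ids = new HashMap<>(hostIds);
        Set<String> pending = new HashSet<>(pendingEndpoints);
        ids.put(endpoint, hostId);
        pending.add(endpoint);
        return new RingSnapshot(ids, pending);
    }

    RingSnapshot withoutEndpoint(String endpoint)
    {
        Map<String, UUID> ids = new HashMap<>(hostIds);
        Set<String> pending = new HashSet<>(pendingEndpoints);
        ids.remove(endpoint);
        pending.remove(endpoint);
        return new RingSnapshot(ids, pending);
    }
}

// Holder that publishes a new snapshot per change; writers are serialized.
final class RingState
{
    private volatile RingSnapshot current =
        new RingSnapshot(Collections.emptyMap(), Collections.emptySet());

    RingSnapshot snapshot()
    {
        return current;
    }

    synchronized void addPendingEndpoint(String endpoint, UUID hostId)
    {
        current = current.withPendingEndpoint(endpoint, hostId);
    }

    synchronized void removeEndpoint(String endpoint)
    {
        current = current.withoutEndpoint(endpoint);
    }
}
```

A caller that decides to hint based on one snapshot and resolves the host ID from that same snapshot always gets a consistent answer, although it may act on stale state, which is the trade-off against the optimistic option.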



[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-10-30 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983701#comment-14983701
 ] 

Ariel Weisberg commented on CASSANDRA-10485:


HashMultimap uses good old HashMap internally, so it's not safe to update it 
and read it at the same time. I think they meant that if the map is immutable 
it is safe to have multiple concurrent readers.

I'm still getting situated on how users of TMD work with this state. Will have 
more later.






[jira] [Commented] (CASSANDRA-10485) Missing host ID on hinted handoff write

2015-10-26 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975122#comment-14975122
 ] 

Paulo Motta commented on CASSANDRA-10485:
-

It seems pending endpoints are removed from the {{TokenMetadata}} before the 
new pending ranges are calculated by {{StorageService}}:
{code:title=StorageService.java|borderStyle=solid}
public void onRemove(InetAddress endpoint)
{
    tokenMetadata.removeEndpoint(endpoint);
    PendingRangeCalculatorService.instance.update();
}
{code}

So, there's a window where nodes can be 
