[ https://issues.apache.org/jira/browse/CASSANDRA-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250495#comment-15250495 ]

Philip Thompson edited comment on CASSANDRA-11608 at 4/20/16 6:50 PM:
----------------------------------------------------------------------

Looking at the logs, it seems node4 was added before node3 was fully recognized 
as down. I tried setting {{wait_other_notice=True}} on the {{node3.stop()}} 
call, but that did not fix the issue. I haven't checked the logs for those 
runs yet.

http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/74/testReport/
http://cassci.datastax.com/view/Parameterized/job/parameterized_dtest_multiplexer/75/testReport/
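The timeout in the stacktrace below comes from ccm's log watcher giving up while waiting for a {{127.0.0.4.* now UP}} line to appear in node3's log. A minimal sketch of that pattern-scanning loop (a simplified stand-in for {{ccmlib.node.Node.watch_log_for}}, not its actual implementation, with the node and file names as placeholders) looks like:

```python
import re

def watch_log_for(patterns, log_lines, node_name="node3", filename="system.log"):
    """Simplified model of ccm's watch_log_for: drop each regex once some
    log line matches it; if any pattern is still unmatched after the log
    is exhausted, raise the same style of "Missing: [...]" error quoted
    in the failure below."""
    tofind = [re.compile(p) for p in patterns]
    reads = ""
    for line in log_lines:
        reads += line
        tofind = [p for p in tofind if not p.search(line)]
        if not tofind:
            return  # every pattern was seen
    raise TimeoutError(
        "[" + node_name + "] Missing: " + str([p.pattern for p in tofind])
        + ":\n" + reads[:50] + ".....\nSee {} for remainder".format(filename))
```

This is consistent with the theory above: if ccm still considered node3 live when node4 started, {{wait_for_binary_proto=True}} would keep watching the stopped node3's log for a "127.0.0.4 ... now UP" line that a dead node can never write.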



> dtest failure in 
> replace_address_test.TestReplaceAddress.replace_first_boot_test
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11608
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11608
>             Project: Cassandra
>          Issue Type: Test
>            Reporter: Jim Witschey
>            Assignee: Philip Thompson
>              Labels: dtest
>
> This looks like a timeout kind of flap. It's flapped once. Example failure:
> http://cassci.datastax.com/job/cassandra-2.2_offheap_dtest/344/testReport/replace_address_test/TestReplaceAddress/replace_first_boot_test
> Failed on CassCI build cassandra-2.2_offheap_dtest #344 - 2.2.6-tentative
> {code}
> Error Message
> 15 Apr 2016 16:23:41 [node3] Missing: ['127.0.0.4.* now UP']:
> INFO  [main] 2016-04-15 16:21:32,345 Config.java:4.....
> See system.log for remainder
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: /mnt/tmp/dtest-4i5qkE
> dtest: DEBUG: Custom init_config not found. Setting defaults.
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
>     'memtable_allocation_type': 'offheap_objects',
>     'num_tokens': '32',
>     'phi_convict_threshold': 5,
>     'start_rpc': 'true'}
> dtest: DEBUG: Starting cluster with 3 nodes.
> dtest: DEBUG: 32
> dtest: DEBUG: Inserting Data...
> dtest: DEBUG: Stopping node 3.
> dtest: DEBUG: Testing node stoppage (query should fail).
> dtest: DEBUG: Retrying read after timeout. Attempt #0
> dtest: DEBUG: Retrying read after timeout. Attempt #1
> dtest: DEBUG: Retrying request after UE. Attempt #2
> dtest: DEBUG: Retrying request after UE. Attempt #3
> dtest: DEBUG: Retrying request after UE. Attempt #4
> dtest: DEBUG: Starting node 4 to replace node 3
> dtest: DEBUG: Verifying querying works again.
> dtest: DEBUG: Verifying tokens migrated sucessfully
> dtest: DEBUG: ('WARN  [main] 2016-04-15 16:21:21,068 TokenMetadata.java:196 - 
> Token -3855903180169109916 changing ownership from /127.0.0.3 to 
> /127.0.0.4\n', <_sre.SRE_Match object at 0x7fd21c0e2370>)
> dtest: DEBUG: Try to restart node 3 (should fail)
> dtest: DEBUG: [('WARN  [GossipStage:1] 2016-04-15 16:21:22,942 
> StorageService.java:1962 - Host ID collision for 
> 75916cc0-86ec-4136-b336-862a49953616 between /127.0.0.3 and /127.0.0.4; 
> /127.0.0.4 is the new owner\n', <_sre.SRE_Match object at 0x7fd1f83555e0>)]
> --------------------- >> end captured logging << ---------------------
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
>     testMethod()
>   File "/home/automaton/cassandra-dtest/replace_address_test.py", line 212, 
> in replace_first_boot_test
>     node4.start(wait_for_binary_proto=True)
>   File "/home/automaton/ccm/ccmlib/node.py", line 610, in start
>     node.watch_log_for_alive(self, from_mark=mark)
>   File "/home/automaton/ccm/ccmlib/node.py", line 457, in watch_log_for_alive
>     self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
>   File "/home/automaton/ccm/ccmlib/node.py", line 425, in watch_log_for
>     raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " 
> [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + 
> reads[:50] + ".....\nSee {} for remainder".format(filename))
> "15 Apr 2016 16:23:41 [node3] Missing: ['127.0.0.4.* now UP']:\nINFO  [main] 
> 2016-04-15 16:21:32,345 Config.java:4.....\nSee system.log for 
> remainder\n-------------------- >> begin captured logging << 
> --------------------\ndtest: DEBUG: cluster ccm directory: 
> /mnt/tmp/dtest-4i5qkE\ndtest: DEBUG: Custom init_config not found. Setting 
> defaults.\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n    'memtable_allocation_type': 'offheap_objects',\n  
>   'num_tokens': '32',\n    'phi_convict_threshold': 5,\n    'start_rpc': 
> 'true'}\ndtest: DEBUG: Starting cluster with 3 nodes.\ndtest: DEBUG: 
> 32\ndtest: DEBUG: Inserting Data...\ndtest: DEBUG: Stopping node 3.\ndtest: 
> DEBUG: Testing node stoppage (query should fail).\ndtest: DEBUG: Retrying 
> read after timeout. Attempt #0\ndtest: DEBUG: Retrying read after timeout. 
> Attempt #1\ndtest: DEBUG: Retrying request after UE. Attempt #2\ndtest: 
> DEBUG: Retrying request after UE. Attempt #3\ndtest: DEBUG: Retrying request 
> after UE. Attempt #4\ndtest: DEBUG: Starting node 4 to replace node 3\ndtest: 
> DEBUG: Verifying querying works again.\ndtest: DEBUG: Verifying tokens 
> migrated sucessfully\ndtest: DEBUG: ('WARN  [main] 2016-04-15 16:21:21,068 
> TokenMetadata.java:196 - Token -3855903180169109916 changing ownership from 
> /127.0.0.3 to /127.0.0.4\\n', <_sre.SRE_Match object at 
> 0x7fd21c0e2370>)\ndtest: DEBUG: Try to restart node 3 (should fail)\ndtest: 
> DEBUG: [('WARN  [GossipStage:1] 2016-04-15 16:21:22,942 
> StorageService.java:1962 - Host ID collision for 
> 75916cc0-86ec-4136-b336-862a49953616 between /127.0.0.3 and /127.0.0.4; 
> /127.0.0.4 is the new owner\\n', <_sre.SRE_Match object at 
> 0x7fd1f83555e0>)]\n--------------------- >> end captured logging << 
> ---------------------"
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
