[
https://issues.apache.org/jira/browse/CASSANDRA-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056421#comment-17056421
]
Jordan West commented on CASSANDRA-15614:
-----------------------------------------
Thanks for the diagnosis [~e.dimitrova]. Have you thought about converting the
test to an in-JVM dtest? Perhaps we could have more control there and get rid of
the flakiness. It would also allow us to make better assertions about the
streaming failure than just "is it in the logs". Then we could safely delete the
old dtest versions.
I'd have to look at this test more closely to think about ways to make it 100%
solid within dtest. Unfortunately, at the dtest level I'm not sure we have the
control we need.
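
For context, the "is it in the logs" check that the test relies on boils down to
polling a log file for a pattern until a deadline. The following is only a
minimal sketch of that pattern, not ccm's actual watch_log_for implementation;
the function name wait_for_log_pattern and its parameters are hypothetical.

```python
import re
import time


def wait_for_log_pattern(log_path, pattern, timeout=5.0, poll_interval=0.1):
    """Return the first line of log_path that matches pattern.

    Hypothetical helper for illustration only. Raises the built-in
    TimeoutError if no matching line appears before the deadline.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    with open(log_path) as log_file:
        while time.monotonic() < deadline:
            line = log_file.readline()
            if not line:
                # Nothing new in the file yet: wait and poll again.
                time.sleep(poll_interval)
                continue
            if regex.search(line):
                return line
    raise TimeoutError("Missing: {!r} in {}".format(pattern, log_path))
```

A check of this shape can only report whether the line showed up before the
deadline. When it times out, it says nothing about why the message is missing,
which is the limitation behind the suggestion to assert on the streaming failure
more directly.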
> Flakey test - TestBootstrap.test_resumable_bootstrap
> -----------------------------------------------------
>
> Key: CASSANDRA-15614
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15614
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest
> Reporter: Ekaterina Dimitrova
> Assignee: Ekaterina Dimitrova
> Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: test_resumable_bootstrap-CASSANDRA-15614.txt
>
>
> Fails 2 out of 10 times on Mac/Java 8
> {noformat}
> ==================================================================================================================
> FAILURES
> ==================================================================================================================
> *___________________________________________________________________________________________________
> TestBootstrap.test_resumable_bootstrap
> ___________________________________________________________________________________________________*
>
> self = <bootstrap_test.TestBootstrap object at 0x10fd488d0>
>
> *@since('2.2')*
> *def test_resumable_bootstrap(self):*
> *"""*
> *Test resuming bootstrap after data streaming failure*
> *"""*
> *cluster = self.cluster*
> *cluster.populate(2)*
> **
> *node1 = cluster.nodes['node1']*
> *# set up byteman*
> *node1.byteman_port = '8100'*
> *node1.import_config_files()*
> **
> *cluster.start(wait_other_notice=True)*
> *# kill stream to node3 in the middle of streaming to let it fail*
> *if cluster.version() < '4.0':*
> *node1.byteman_submit([self.byteman_submit_path_pre_4_0])*
> *else:*
> *node1.byteman_submit([self.byteman_submit_path_4_0])*
> *node1.stress(['write', 'n=1K', 'no-warmup', 'cl=TWO', '-schema',
> 'replication(factor=2)', '-rate', 'threads=50'])*
> *cluster.flush()*
> **
> *# start bootstrapping node3 and wait for streaming*
> *node3 = new_node(cluster)*
> *node3.start(wait_other_notice=False)*
> **
> *# let streaming fail as we expect*
> *> node3.watch_log_for('Some data streaming failed')*
>
> *bootstrap_test.py*:365:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _ _
>
> self = <ccmlib.node.Node object at 0x10fdf1810>, exprs = 'Some data streaming
> failed', from_mark = None, timeout = 600, process = None, verbose = False,
> filename = 'system.log'
>
> *def watch_log_for(self, exprs, from_mark=None, timeout=600,
> process=None, verbose=False, filename='system.log'):*
> *"""*
> *Watch the log until one or more (regular) expression are found.*
> *This methods when all the expressions have been found or the
> method*
> *timeouts (a TimeoutError is then raised). On successful
> completion,*
> *a list of pair (line matched, match object) is returned.*
> *"""*
> *start = time.time()*
> *tofind = [exprs] if isinstance(exprs, string_types) else exprs*
> *tofind = [re.compile(e) for e in tofind]*
> *matchings = []*
> *reads = ""*
> *if len(tofind) == 0:*
> *return None*
> **
> *log_file = os.path.join(self.get_path(), 'logs', filename)*
> *output_read = False*
> *while not os.path.exists(log_file):*
> *time.sleep(.5)*
> *if start + timeout < time.time():*
> *raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S",
> time.gmtime()) + " [" + self.name + "] Timed out waiting for {} to be
> created.".format(log_file))*
> *if process and not output_read:*
> *process.poll()*
> *if process.returncode is not None:*
> *self.print_process_output(self.name, process, verbose)*
> *output_read = True*
> *if process.returncode != 0:*
> *raise RuntimeError() # Shouldn't reuse RuntimeError
> but I'm lazy*
> **
> *with open(log_file) as f:*
> *if from_mark:*
> *f.seek(from_mark)*
> **
> *while True:*
> *# First, if we have a process to check, then check it.*
> *# Skip on Windows - stdout/stderr is cassandra.bat*
> *if not common.is_win() and not output_read:*
> *if process:*
> *process.poll()*
> *if process.returncode is not None:*
> *self.print_process_output(self.name, process,
> verbose)*
> *output_read = True*
> *if process.returncode != 0:*
> *raise RuntimeError() # Shouldn't reuse
> RuntimeError but I'm lazy*
> **
> *line = f.readline()*
> *if line:*
> *reads = reads + line*
> *for e in tofind:*
> *m = e.search(line)*
> *if m:*
> *matchings.append((line, m))*
> *tofind.remove(e)*
> *if len(tofind) == 0:*
> *return matchings[0] if isinstance(exprs,
> string_types) else matchings*
> *else:*
> *# yep, it's ugly*
> *time.sleep(1)*
> *if start + timeout < time.time():*
> *> raise TimeoutError(time.strftime("%d %b %Y
> %H:%M:%S", time.gmtime()) + " [" + self.name + "] Missing: " + str([e.pattern
> for e in tofind]) + ":\n" + reads[:50] + ".....\nSee {} for
> remainder".format(filename))*
> *E ccmlib.node.TimeoutError: 02 Mar 2020 17:16:34
> [node3] Missing: ['Some data streaming failed']:*
> *E INFO [main] 2020-03-02 12:06:34,963
> YamlConfigura.....*
> *E See system.log for remainder*
>
> *../../dtest-new2/src/ccm/ccmlib/node.py*:536: TimeoutError
> -----------------------------------------------------------------------------------------------------------
> Captured stdout setup
> ------------------------------------------------------------------------------------------------------------
> 11:55:37,343 ccm DEBUG Log-watching thread starting.
> -------------------------------------------------------------------------------------------------------------
> Captured log setup
> -------------------------------------------------------------------------------------------------------------
> 11:55:37,239 conftest INFO Starting execution of test_resumable_bootstrap at
> 2020-03-02 11:55:37.239467
> 11:55:37,240 dtest_setup INFO cluster ccm directory:
> /var/folders/ql/nvcz74bd67d3vhw7227mpjm40000gp/T/dtest-fzxwwt9c
> ------------------------------------------------------------------------------------------------------------
> Captured stdout call
> ------------------------------------------------------------------------------------------------------------
> install rule inject stream failure
>
> ----------------------------------------------------------------------------------------------------------
> Captured stdout teardown
> ----------------------------------------------------------------------------------------------------------
> 12:06:04,870 ccm DEBUG Log-watching thread exiting.
> -----------------------------------------------------------------------------------------------------------
> Captured stdout setup
> ------------------------------------------------------------------------------------------------------------
> 12:06:06,20 ccm DEBUG Log-watching thread starting.
> -------------------------------------------------------------------------------------------------------------
> Captured log setup
> -------------------------------------------------------------------------------------------------------------
> 12:06:05,887 conftest INFO Starting execution of test_resumable_bootstrap at
> 2020-03-02 12:06:05.887592
> 12:06:05,888 dtest_setup INFO cluster ccm directory:
> /var/folders/ql/nvcz74bd67d3vhw7227mpjm40000gp/T/dtest-yntcltkx
> ------------------------------------------------------------------------------------------------------------
> Captured stdout call
> ------------------------------------------------------------------------------------------------------------
> install rule inject stream failure
>
> ----------------------------------------------------------------------------------------------------------
> Captured stdout teardown
> ----------------------------------------------------------------------------------------------------------
> 12:06:04,870 ccm DEBUG Log-watching thread exiting.
> ----------------------------------------------------------------------------------------------------------
> Captured stdout teardown
> ----------------------------------------------------------------------------------------------------------
> 12:16:34,818 ccm DEBUG Log-watching thread exiting.
> ===Flaky Test Report===
>
> test_resumable_bootstrap failed (1 runs remaining out of 2).
> <class 'ccmlib.node.TimeoutError'>
> 02 Mar 2020 17:06:04 [node3] Missing: ['Some data streaming failed']:
> INFO [main] 2020-03-02 11:56:05,081 YamlConfigura.....
> See system.log for remainder
> [<TracebackEntry
> /Users/ekaterina.dimitri/IdeaProjects/cassandra-dtest-d/bootstrap_test.py:365>,
> <TracebackEntry
> /Users/ekaterina.dimitri/dtest-new2/src/ccm/ccmlib/node.py:536>]
> test_resumable_bootstrap failed; it passed 0 out of the required 1 times.
> <class 'ccmlib.node.TimeoutError'>
> 02 Mar 2020 17:16:34 [node3] Missing: ['Some data streaming failed']:
> INFO [main] 2020-03-02 12:06:34,963 YamlConfigura.....
> See system.log for remainder
> [<TracebackEntry
> /Users/ekaterina.dimitri/IdeaProjects/cassandra-dtest-d/bootstrap_test.py:365>,
> <TracebackEntry
> /Users/ekaterina.dimitri/dtest-new2/src/ccm/ccmlib/node.py:536>]
>
> ===End Flaky Test Report===
> *========================================================================================================
> 1 failed in 1258.32 seconds
> =========================================================================================================*
> {noformat}