[
https://issues.apache.org/jira/browse/CASSANDRA-17085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437032#comment-17437032
]
David Capwell commented on CASSANDRA-17085:
-------------------------------------------
bootstrap_test.py::TestBootstrap::test_bootstrap_binary_disabled also fails
with the following
{code}
try:
node3.nodetool('join')
pytest.fail('nodetool should have errored and failed to join ring')
except ToolError as t:
> assert "Cannot join the ring until bootstrap completes" in t.stdout
E assert 'Cannot join the ring until bootstrap completes' in ''
E + where '' = ToolError("Subprocess ['nodetool', '-h',
'localhost', '-p', '7300', 'join'] exited with non-zero status; exit status: 1;
\nstderr: nodetool: Failed to connect to 'localhost:7300' - SocketException:
'Connection reset'.\n",).stdout
{code}
In both cases it looks like issue connecting to JMX. Logs for node3 don't show
the process going down during the test so not sure what's up yet. This is
localhost networking so shouldn't have random connection close events, so not
sure what leads to this flaky behavior yet.
> Fix python dtests bootstrap_test.py::TestBootstrap
> --------------------------------------------------
>
> Key: CASSANDRA-17085
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17085
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/python
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Fix For: 4.x
>
>
> Right now bootstrap tests are failing every time we run, this work is to
> debug and fix the underling issue.
> Examples:
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1062/workflows/ba3e6395-ef22-4724-8424-0549e65d8cff/jobs/7089
> {code}
> > node3.nodetool('bootstrap resume')
> bootstrap_test.py:1014:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:1005: in nodetool
> return handle_external_tool_process(p, ['nodetool', '-h', 'localhost',
> '-p', str(self.jmx_port)] + shlex.split(cmd))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _
> process = <subprocess.Popen object at 0x7fb071a03940>
> cmd_args = ['nodetool', '-h', 'localhost', '-p', '7300', 'bootstrap', ...]
> def handle_external_tool_process(process, cmd_args):
> out, err = process.communicate()
> if (out is not None) and isinstance(out, bytes):
> out = out.decode()
> if (err is not None) and isinstance(err, bytes):
> err = err.decode()
> rc = process.returncode
>
> if rc != 0:
> > raise ToolError(cmd_args, rc, out, err)
> E ccmlib.node.ToolError: Subprocess ['nodetool', '-h', 'localhost',
> '-p', '7300', 'bootstrap', 'resume'] exited with non-zero status; exit
> status: 1;
> E stderr: nodetool: Failed to connect to 'localhost:7300' -
> EOFException: 'null'.
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2305: ToolError
> {code}
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1062/workflows/ba3e6395-ef22-4724-8424-0549e65d8cff/jobs/7087
> {code}
> > node1.start()
> bootstrap_test.py:483:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:895: in start
> node.watch_log_for_alive(self, from_mark=mark)
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:664: in
> watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout,
> filename=filename)
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:592: in watch_log_for
> head=reads[:50], tail="..."+reads[len(reads)-150:]))
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _
> start = 1635453190.3118386, timeout = 120
> msg = "Missing: ['127.0.0.1:7000.* is now UP'] not found in system.log:\n
> Head: \n Tail: ..."
> node = 'node3'
> @staticmethod
> def raise_if_passed(start, timeout, msg, node=None):
> if start + timeout < time.time():
> > raise TimeoutError.create(start, timeout, msg, node)
> E ccmlib.node.TimeoutError: 28 Oct 2021 20:35:10 [node3] after
> 120.12/120 seconds Missing: ['127.0.0.1:7000.* is now UP'] not found in
> system.log:
> E Head:
> E Tail: ...
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]