[ 
https://issues.apache.org/jira/browse/IMPALA-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773558#comment-16773558
 ] 

ASF subversion and git services commented on IMPALA-8191:
---------------------------------------------------------

Commit c1274fafb04de1b9b7c3a17e209814b8c4346311 in impala's branch 
refs/heads/master from Lars Volker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c1274fa ]

IMPALA-8191: Wait for additional breakpad processes during test

The Breakpad signal handler forks off a process to write a minidump.
During the breakpad tests we send signals to the Impala daemons and then
wait for all processes to go away. Prior to this change we did this by
waiting on the PID returned by process.get_pid(). That PID is found by
iterating over psutil.get_pid_list(), an ordered list of the PIDs
running on the system, and returning the first process in the list with
a matching command line. In cases where the PID space had rolled over,
this could be the forked-off breakpad process, and we'd wait on that
one. During the subsequent check that all processes are indeed gone, we
could then pick up the original Impala daemon that had forked off the
minidump writer and was still in the process of shutting down.

To fix this, we wait for every process twice. Processes are identified
by their command line, and iterating through them twice makes sure we
catch both the original daemon and its breakpad child.
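
The two-pass wait can be illustrated with a small, self-contained
simulation. This is only a sketch under stated assumptions, not the code
from the commit: the process table is a plain dict rather than psutil,
and the names first_pid_for, wait_for_command and simulate are
hypothetical.

```python
COMMAND = "impalad"


def first_pid_for(command, process_table):
    """Return the lowest PID whose command matches, or None.

    Stands in for the lookup described above: walk an ordered PID list
    and return the first process with a matching command line.
    """
    for pid in sorted(process_table):
        if process_table[pid] == command:
            return pid
    return None


def wait_for_command(command, process_table, wait_for_exit, passes=2):
    """Wait on the first matching PID, repeating the lookup `passes` times."""
    waited = []
    for _ in range(passes):
        pid = first_pid_for(command, process_table)
        if pid is None:
            break  # nothing left that matches
        wait_for_exit(pid)
        waited.append(pid)
    return waited


def simulate(passes):
    # Hypothetical state after a PID roll-over: the forked breakpad child
    # (PID 120) sorts before the original daemon (PID 31000).
    table = {120: COMMAND, 31000: COMMAND}
    # Waiting is simulated by dropping the PID from the table.
    waited = wait_for_command(COMMAND, table, wait_for_exit=table.pop,
                              passes=passes)
    return waited, table


print(simulate(1))  # one pass waits only on the child
print(simulate(2))  # two passes also catch the original daemon
```

With a single pass, the simulated daemon is still in the table when the
"all processes gone" check would run; with two passes the table is empty.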

This change also contains improvements to the logging of processes in
our tests. This should make it easier to identify similar issues in the
future.

Testing: I ran the breakpad tests in exhaustive mode. I didn't try to
exercise it around a PID roll-over, but we shouldn't see the issue in
IMPALA-8191 again.

Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Reviewed-on: http://gerrit.cloudera.org:8080/12501
Reviewed-by: Lars Volker <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> TestBreakpadExhaustive.test_minidump_creation fails to kill cluster
> -------------------------------------------------------------------
>
>                 Key: IMPALA-8191
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8191
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 3.2.0
>            Reporter: Andrew Sherman
>            Assignee: Lars Volker
>            Priority: Critical
>              Labels: breakpad, broken-build, flaky-test
>             Fix For: Impala 3.2.0
>
>
> h3. Error Message
> {quote}
> assert not [<tests.common.impala_cluster.ImpaladProcess object at 0x5d20fd0>, <tests.common.impala_cluster.ImpaladProcess object at 0x5d24ad0>]
>  +  where [<tests.common.impala_cluster.ImpaladProcess object at 0x5d20fd0>, <tests.common.impala_cluster.ImpaladProcess object at 0x5d24ad0>] = <tests.common.impala_cluster.ImpalaCluster object at 0x5d20950>.impalads
>  +    where <tests.common.impala_cluster.ImpalaCluster object at 0x5d20950> = <test_breakpad.TestBreakpadExhaustive object at 0x7fb47c772690>.cluster
> {quote}
> h3. Stacktrace
> {quote}
> custom_cluster/test_breakpad.py:183: in test_minidump_creation
>     self.kill_cluster(SIGSEGV)
> custom_cluster/test_breakpad.py:81: in kill_cluster
>     signal is SIGUSR1 or self.assert_all_processes_killed()
> custom_cluster/test_breakpad.py:121: in assert_all_processes_killed
>     assert not self.cluster.impalads
> E   assert not [<tests.common.impala_cluster.ImpaladProcess object at 0x5d20fd0>, <tests.common.impala_cluster.ImpaladProcess object at 0x5d24ad0>]
> E    +  where [<tests.common.impala_cluster.ImpaladProcess object at 0x5d20fd0>, <tests.common.impala_cluster.ImpaladProcess object at 0x5d24ad0>] = <tests.common.impala_cluster.ImpalaCluster object at 0x5d20950>.impalads
> E    +    where <tests.common.impala_cluster.ImpalaCluster object at 0x5d20950> = <test_breakpad.TestBreakpadExhaustive object at 0x7fb47c772690>.cluster
> {quote}
> See [IMPALA-8114] for a similar bug



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
