[ https://issues.apache.org/jira/browse/GEODE-9615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545161#comment-17545161 ]

ASF subversion and git services commented on GEODE-9615:
--------------------------------------------------------

Commit 6a0e744f1cbcca75c2a5a5b6465f010a3f135a8c in geode's branch 
refs/heads/support/1.15 from Kirk Lund
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=6a0e744f1c ]

GEODE-10327: Overhaul GfshRule to kill processes and save artifacts (#7731)

PROBLEM

Tests that use GfshRule leave behind orphaned processes and do not save
artifacts for debugging failures.

SOLUTION

GfshRule needs to clean up all processes it forks. It also needs to save
off all runtime artifacts, such as logs, stats, pid files, and disk
stores, to enable debugging of test failures.

DETAILS

Enhance GfshRule and modify all tests using it for proper debugging and
to prevent test pollution.

Overhaul of GfshRule:

* kill ALL Geode processes during cleanup
* use FolderRule to ensure all logs and files are properly saved off
  when a test fails
* extract GfshExecutor from JUnit rule code
* GfshExecutor allows a test to use any number of Geode versions with
  just one GfshRule
* add Gfsh log level support for easier debugging
* add support for new VmConfiguration to allow control over Geode and
  Java versions
* overhaul API of GfshRule and companion classes for better consistency
  and design
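A minimal sketch of the "kill ALL Geode processes during cleanup" idea,
assuming the rule keeps the java.lang.Process of each gfsh it launches and
runs on Java 9+ (ProcessHandle). ForkedProcessKiller and killTree are
hypothetical names, not GfshRule's actual code. Descendants are
snapshotted first because, once a parent dies, its children are
re-parented and no longer show up as descendants. For the same reason,
a locator or server whose gfsh parent has already exited would not be
reached this way and would likely have to be found by other means, such
as its pid file.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical cleanup helper (not GfshRule's actual code). Requires Java 9+.
final class ForkedProcessKiller {

  private ForkedProcessKiller() {}

  /** Forcibly kills the process and its current descendants; returns true if all exited in time. */
  static boolean killTree(Process process, Duration timeout) throws InterruptedException {
    // Snapshot descendants before killing the parent; orphans get re-parented
    // and would no longer be reported as descendants afterwards.
    List<ProcessHandle> descendants =
        process.toHandle().descendants().collect(Collectors.toList());

    descendants.forEach(ProcessHandle::destroyForcibly);
    process.destroyForcibly();

    long deadline = System.nanoTime() + timeout.toNanos();

    // The JDK reaps its direct children, so waitFor is the reliable check for the root.
    long remaining = Math.max(0L, deadline - System.nanoTime());
    if (!process.waitFor(remaining, TimeUnit.NANOSECONDS)) {
      return false;
    }

    // Other descendants are polled until they are gone or the deadline passes.
    for (ProcessHandle handle : descendants) {
      while (handle.isAlive()) {
        if (deadline - System.nanoTime() <= 0) {
          return false;
        }
        Thread.sleep(50);
      }
    }
    return true;
  }
}
```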

New FolderRule:

* replaces TemporaryFolder and saves off all content when a test fails
* creates its root directory under the Gradle worker directory instead
  of under the temp directory
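One way a FolderRule-style rule could "save off all content" is to copy
the test's working folder into an artifacts folder from its failure
callback (for instance JUnit 4's TestWatcher.failed). The sketch below
shows only that copy step using java.nio.file; FolderSaver and copyTree
are hypothetical names, not Geode's FolderRule.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical sketch of the "save off all content" step (not Geode's FolderRule).
final class FolderSaver {

  private FolderSaver() {}

  /** Recursively copies every directory and file under source into target. */
  static void copyTree(Path source, Path target) {
    try (Stream<Path> paths = Files.walk(source)) {
      // Files.walk visits a directory before its contents, so parents exist before files are copied.
      paths.forEach(path -> {
        Path destination = target.resolve(source.relativize(path).toString());
        try {
          if (Files.isDirectory(path)) {
            Files.createDirectories(destination);
          } else {
            Files.copy(path, destination, StandardCopyOption.REPLACE_EXISTING);
          }
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```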

Update HTTP session caching module tests:

* use new FolderRule to save all artifacts when a test fails
* use NIO Paths for filesystem variables

Update acceptance and upgrade tests that use GfshRule:

* use new improved GfshRule and GfshExecutor
* use new FolderRule instead of TemporaryFolder to save all artifacts
  when a test fails
* use --disable-default-server in tests with no clients
* fix flakiness of many tests by using random ports instead of default
  or hardcoded port values
* reformat GfshRule API usage in tests to improve readability and
  consistency
* add GfshStopper to provide a common place to await process stop (stop
  locator/server is asynchronous, so restarting with the same ports is
  very prone to hitting BindExceptions)
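A minimal sketch of using a random free port instead of a default or
hardcoded one, assuming plain JDK sockets: binding a ServerSocket to port
0 lets the OS choose an unused ephemeral port. RandomPorts is a
hypothetical helper, not Geode's own port utility. The port is released
when the socket closes, so another process could still take it before
the locator or server binds; this narrows the collision window rather
than eliminating it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper (not Geode's port utility).
final class RandomPorts {

  private RandomPorts() {}

  /** Returns a TCP port that was free at the moment of the call. */
  static int findAvailableTcpPort() {
    // Port 0 asks the OS for any free ephemeral port; the socket is closed on return.
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The returned value would then be substituted into commands such as
`start locator --port=<port>` and `start server --locators=localhost[<port>]`
instead of relying on the default 10334.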

Update ProcessUtils:

* extract NativeProcessUtils and make it public for direct use
* rename InternalProcessUtils to ProcessUtilsProvider and move it to its
  own class
* rethrow IOExceptions as UncheckedIOExceptions
* fix flakiness in NativeProcessUtilsTest by moving findAvailablePid
  into the test method
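The "rethrow IOExceptions as UncheckedIOExceptions" item, together with
closing output streams (listed under Minor changes), can be illustrated
with a small pid-file helper. PidFiles is hypothetical and is not the code
in org.apache.geode.internal.process; it only shows try-with-resources
closing the writer and an IOException wrapped in UncheckedIOException so
callers need no throws clause.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example (not Geode's ProcessUtils code).
final class PidFiles {

  private PidFiles() {}

  static void writePid(Path pidFile, long pid) {
    // try-with-resources closes the writer even on failure, so the file is not
    // left open (an open file can block later deletion on Windows).
    try (BufferedWriter writer = Files.newBufferedWriter(pidFile, StandardCharsets.UTF_8)) {
      writer.write(Long.toString(pid));
    } catch (IOException e) {
      // Rethrow as unchecked so callers (including lambdas) need no throws clause.
      throw new UncheckedIOException(e);
    }
  }

  static long readPid(Path pidFile) {
    try {
      String text = new String(Files.readAllBytes(pidFile), StandardCharsets.UTF_8);
      return Long.parseLong(text.trim());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```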

Minor changes:

* improve code formatting and readability
* convert from the old java.io File API to NIO Path APIs as much as
  possible
* close output streams to fix filesystem issues on Windows

Fixes flaky test tickets:

* DeployJarAcceptanceTest GEODE-9615
* possibly other tests that use GfshRule

Changes for resubmit:

* log an error message if unable to delete a folder

NOTES

The jdk8, jdk17 and windows labels were used to run the tests in
additional environments.

This PR contains mostly test and framework changes. The only product
code altered is ServerLauncher and several classes in
org.apache.geode.internal.process, all of which are in geode-core.

(cherry picked from commit 3f8f8db595ca4b99b25fe4d109a8ed118a712701)


> CI Failure: Acceptance Tests fails with exit value 1 from start locator or start server
> ---------------------------------------------------------------------------------------
>
>                 Key: GEODE-9615
>                 URL: https://issues.apache.org/jira/browse/GEODE-9615
>             Project: Geode
>          Issue Type: Bug
>          Components: tests
>            Reporter: Kirk Lund
>            Assignee: Kirk Lund
>            Priority: Major
>
> This failure occurs because the locator or server was stopped and then 
> immediately restarted with the same ports. When Gfsh returns from stop 
> locator or stop server, the stopped process is still shutting down 
> asynchronously and may continue to hold those ports when the next start 
> command for that process is issued. The start command then fails with an 
> exit value of 1 instead of the expected value of 0.
> Any test using GfshRule to stop and then immediately start a new process may 
> fail in this way. The underlying exception in the locator or server log is a 
> BindException because the port is still in use by the previous instance of 
> that process, which has not yet finished stopping.
> The only way to close this gap is to have the test get the pid for the 
> process being stopped and then await until the process identified by that pid 
> no longer exists.
> {code:java}
> org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest > onlineStatusCommandShouldSucceedWhenConnected_locator_host_and_port FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [test-frame: gfsh -e connect --locator=localhost[20608] -e status locator --host=localhost --port=20608]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:137)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:128)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:133)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.executeScriptWithExpectedExitCode(StatusLocatorExitCodeAcceptanceTest.java:255)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.onlineStatusCommandShouldSucceedWhenConnected_locator_host_and_port(StatusLocatorExitCodeAcceptanceTest.java:128)
> org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest > offlineStatusCommandShouldSucceedWhenConnected_locator_dir FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [test-frame: gfsh -e connect --locator=localhost[20608] -e status locator --dir=/tmp/junit11722670533134972918/member-controller/locator-chase-obedient-cake]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:137)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:128)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:133)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.executeScriptWithExpectedExitCode(StatusLocatorExitCodeAcceptanceTest.java:255)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.offlineStatusCommandShouldSucceedWhenConnected_locator_dir(StatusLocatorExitCodeAcceptanceTest.java:140)
> org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest > onlineStatusCommandShouldSucceedWhenConnected_locator_name FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [test-frame: gfsh -e connect --locator=localhost[20608] -e status locator --name=locator-chase-obedient-cake]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:137)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:128)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:133)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.executeScriptWithExpectedExitCode(StatusLocatorExitCodeAcceptanceTest.java:255)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.onlineStatusCommandShouldSucceedWhenConnected_locator_name(StatusLocatorExitCodeAcceptanceTest.java:116)
> org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest > onlineStatusCommandShouldSucceedWhenConnected_locator_port FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [test-frame: gfsh -e connect --locator=localhost[20608] -e status locator --port=20608]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:137)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:128)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:133)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.executeScriptWithExpectedExitCode(StatusLocatorExitCodeAcceptanceTest.java:255)
>         at org.apache.geode.management.internal.cli.shell.StatusLocatorExitCodeAcceptanceTest.onlineStatusCommandShouldSucceedWhenConnected_locator_port(StatusLocatorExitCodeAcceptanceTest.java:122)
>  {code}
> {noformat}
> org.apache.geode.modules.DeployJarAcceptanceTest > classMethod FAILED
>     org.junit.ComparisonFailure: [Exit value from process started by [41497e8cf7689a63: gfsh -e start locator --name=locator -e configure pdx --read-serialized=true -e start server --name=server --locators=localhost[10334]]] expected:<[0]> but was:<[1]>
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at org.apache.geode.test.junit.rules.gfsh.GfshExecution.awaitTermination(GfshExecution.java:103)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:143)
>         at org.apache.geode.test.junit.rules.gfsh.GfshRule.execute(GfshRule.java:152)
>         at org.apache.geode.test.junit.rules.gfsh.GfshScript.execute(GfshScript.java:153)
>         at org.apache.geode.modules.DeployJarAcceptanceTest.setup(DeployJarAcceptanceTest.java:62)
> {noformat}
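The approach described in the issue, getting the pid of the process being
stopped and then waiting until no process with that pid is alive, could
look roughly like the sketch below. It assumes Java 9+ (ProcessHandle)
and a pid already obtained elsewhere, for example from the member's pid
file; PidAwaiter and awaitProcessGone are hypothetical names, not
GfshStopper's actual code. Because pids can be reused, a very long wait
could in rare cases end up observing an unrelated process with the same
pid.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical sketch of awaiting process exit by pid (not GfshStopper). Requires Java 9+.
final class PidAwaiter {

  private PidAwaiter() {}

  /** Polls until no live process has the given pid; returns false if the timeout elapses first. */
  static boolean awaitProcessGone(long pid, Duration timeout) throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      // ProcessHandle.of returns an empty Optional once the process no longer exists.
      Optional<ProcessHandle> handle = ProcessHandle.of(pid);
      if (!handle.isPresent() || !handle.get().isAlive()) {
        return true;
      }
      if (deadline - System.nanoTime() <= 0) {
        return false;
      }
      Thread.sleep(100);
    }
  }
}
```

Only after this returns true would the test issue the next start command
that reuses the same ports.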



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
