[ https://issues.apache.org/jira/browse/GEODE-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291981#comment-17291981 ]
Hale Bales commented on GEODE-8950:
-----------------------------------

- the first known CI failure was on 02/04/2021
- we do not have CI history before 02/01/2021
- these failures are occurring both in CI and when run using the scripts
- the test that is failing was added in November of 2020
- running develop against 1.13.0 does not produce consistent benchmark results
- running with a baseline of 1.13.1 does not improve the failure rate
- running 1.13.0 against itself does not produce consistently passing results
- running develop against itself does not produce consistently passing results
- there have been no changes to benchmarks this year (as of Feb 26, 2021)
- there do not appear to be any suspect changes to Geode core made this year
- Jake Barrett, Donal Evans, and I have all looked at the commits
- no commits are in the right area of the code
- I have tested all code changes that had even the slightest chance of changing the performance in P2pPartitionedPutLongBenchmark
- the changes to dependencies do not seem to have changed the performance
- profiling the test for the following did not produce any useful information:
  - CPU usage
  - allocations
  - locks
- looking at the gfs logs showed that (on a failing run):
  - develop did fewer puts than 1.13.0
  - develop had less CPU activity
  - develop received fewer bytes
- these results are expected for a run where develop had lower throughput than 1.13.0
- this benchmark has a very small payload size
- in the past the performance team saw a high degree of sensitivity in tests with small payloads

conclusions:
- these failures do not appear to be caused by any code change
- these failures do not appear to be caused by any benchmarking change
- these failures do not appear to be caused by any dependency change
- the instability when running the same version/commit against itself points to the issue being the per-operation overhead, which dominates with such a small payload
- there is no data to support that this failure is occurring more often than previously

proposed next steps:
- keep running this test and keep track of the failure rate
- if the failure rate increases, investigate the peer-to-peer code
- if the failure rate stays the same, comment out the test
- long term, invest time in a significant refactor of the peer-to-peer code

> Benchmark failure - P2pPartitionedPutLongBenchmark
> --------------------------------------------------
>
>                 Key: GEODE-8950
>                 URL: https://issues.apache.org/jira/browse/GEODE-8950
>             Project: Geode
>          Issue Type: Bug
>          Components: benchmarks
>    Affects Versions: 1.15.0
>            Reporter: Donal Evans
>            Assignee: Hale Bales
>            Priority: Major
>
> Multiple benchmark failures due to P2pPartitionedPutLongBenchmark have been
> seen recently.
> This run saw 3 out of the 5 repeats fail due to flagged degradations in
> P2pPartitionedPutLongBenchmark:
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Benchmark_base/builds/16|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Benchmark_base/builds/16#L601ed52d:5552]
> This run saw 1 out of the 5 repeats fail due to flagged degradations in
> P2pPartitionedPutLongBenchmark:
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Benchmark_base/builds/20]
> This run saw 4 out of the 5 repeats fail due to flagged degradations in
> P2pPartitionedPutLongBenchmark:
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Benchmark_base/builds/27]
> In all the above benchmarks, the other failed runs were due to EOFExceptions
> rather than flagged degradations.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
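
The comment proposes tracking the failure rate of this test, and one way to do that is sketched here in Python. This is only an illustration, not part of the Geode benchmark tooling: the `BuildResult` record, the `failure_rate` and `rate_increased` helpers, and the 0.10 margin are all hypothetical. The counts are the flagged-degradation repeats quoted in the issue description for builds 16, 20, and 27.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    """Hypothetical record of one Benchmark_base build."""
    build: int
    flagged: int   # repeats that failed due to a flagged degradation
    repeats: int   # total repeats in the build


def failure_rate(results):
    """Pooled fraction of repeats flagged as degraded across the given builds."""
    total = sum(r.repeats for r in results)
    if total == 0:
        raise ValueError("no repeats recorded")
    return sum(r.flagged for r in results) / total


def rate_increased(earlier, later, margin=0.10):
    """True if the later window's rate exceeds the earlier one by more than
    `margin`. The margin is an arbitrary assumption, not a value from the issue."""
    return failure_rate(later) - failure_rate(earlier) > margin


# Counts taken from the three builds linked in the issue description.
history = [
    BuildResult(build=16, flagged=3, repeats=5),
    BuildResult(build=20, flagged=1, repeats=5),
    BuildResult(build=27, flagged=4, repeats=5),
]

print(f"pooled failure rate: {failure_rate(history):.2f}")  # 8 of 15 repeats
```

With only five repeats per build, per-build rates swing widely (0.2 to 0.8 in the builds above), so pooling several builds into each window before comparing is likely to give a steadier signal than comparing build to build.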