[ 
https://issues.apache.org/jira/browse/BEAM-11483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313471#comment-17313471
 ] 

Kyle Weaver edited comment on BEAM-11483 at 4/1/21, 11:16 PM:
--------------------------------------------------------------

[~thiscensustaker] To clarify, these test failures are in the Spark 
portable streaming runner, which is tested by the Jenkins job 
beam_PostCommit_Java_PVR_Spark_Streaming [0]. The Spark portable streaming 
runner has been neglected for a while, which is why these regressions 
crept in. This runner is known to be missing important functionality, but 
ideally the tests should at least make clear what functionality is actually 
missing (as opposed to failing because of test setup issues, etc.). It looks 
like the test suite last passed on Sep 10, 2020 and has been failing ever since 
[1].

I tried running the tests locally. The commands for Spark versions 2 and 3, 
respectively, are:

./gradlew :runners:spark:2:job-server:validatesPortableRunnerStreaming
./gradlew :runners:spark:3:job-server:validatesPortableRunnerStreaming

The only tests that failed for me were GroupByKeyTest$WindowTests. I wrote a 
PR to exclude those [2].
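
As a rough sketch of running the two suites one at a time locally, so that a 
failure can be attributed to a single Spark version, something like the 
following could work. Only the two Gradle task names come from the comment 
above. The function names, the GRADLE_CMD variable, and the PASSED/FAILED 
output are hypothetical and not part of the Beam repository.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names and GRADLE_CMD are illustrative assumptions.
# The Gradle task names are the ones quoted in the comment.

GRADLE_CMD="${GRADLE_CMD:-./gradlew}"

# Build the Gradle task path for a Spark major version (2 or 3).
pvr_streaming_task() {
  echo ":runners:spark:$1:job-server:validatesPortableRunnerStreaming"
}

# Run each suite in its own Gradle invocation and report the result per
# Spark version. Returns non-zero if any suite failed.
run_pvr_streaming_suites() {
  local version
  local failed=0
  for version in 2 3; do
    if $GRADLE_CMD "$(pvr_streaming_task "$version")"; then
      echo "Spark $version: PASSED"
    else
      echo "Spark $version: FAILED"
      failed=1
    fi
  done
  return "$failed"
}
```

To use it, source the file from the root of a Beam checkout and call 
run_pvr_streaming_suites. Each suite runs in a separate Gradle invocation, 
which is the same separation proposed for the Jenkins job.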

So why are so many tests flaking on Jenkins? I'm not sure, but we previously 
had a problem running Spark 2 and 3 together in a different test suite 
[3], so there may be a similar problem here. The simplest workaround would be 
to split Spark 2 and 3 into separate test suites and see whether they pass. The 
job is defined in [4].

[0] https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Streaming

[1] https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Streaming/75/

[2] https://github.com/apache/beam/pull/14405

[3] BEAM-11992

[4] https://github.com/apache/beam/blob/44b7a87c5009315570864036baba27a303ca5eff/.test-infra/jenkins/job_PostCommit_Java_PortableValidatesRunner_Spark_Streaming.groovy#L39-L40


> Spark PostCommit Test Improvements
> ----------------------------------
>
>                 Key: BEAM-11483
>                 URL: https://issues.apache.org/jira/browse/BEAM-11483
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-spark, test-failures
>            Reporter: Tyson Hamilton
>            Assignee: Fernando Morales
>            Priority: P1
>              Labels: flake, portability-spark
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Master bug for a group of the top failing Spark postcommit tests as of 12/17.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
