[
https://issues.apache.org/jira/browse/BEAM-14334?focusedWorklogId=770208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-770208
]
ASF GitHub Bot logged work on BEAM-14334:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/May/22 14:24
Start Date: 13/May/22 14:24
Worklog Time Spent: 10m
Work Description: mosche commented on code in PR #17662:
URL: https://github.com/apache/beam/pull/17662#discussion_r872459752
##########
runners/spark/spark_runner.gradle:
##########
@@ -218,29 +218,17 @@ def validatesRunnerBatch =
 tasks.register("validatesRunnerBatch", Test) {
   group = "Verification"
   // Disable gradle cache
   outputs.upToDateWhen { false }
-  def pipelineOptions = JsonOutput.toJson([
-      "--runner=TestSparkRunner",
-      "--streaming=false",
-      "--enableSparkMetricSinks=false",
-  ])
-  systemProperty "beamTestPipelineOptions", pipelineOptions
-  systemProperty "beam.spark.test.reuseSparkContext", "true"
-  systemProperty "spark.ui.enabled", "false"
-  systemProperty "spark.ui.showConsoleProgress", "false"
+  systemProperties sparkTestProperties(["--enableSparkMetricSinks":"false"])
   classpath = configurations.validatesRunner
   testClassesDirs = files(
       project(":sdks:java:core").sourceSets.test.output.classesDirs,
       project(":runners:core-java").sourceSets.test.output.classesDirs,
   )
-  testClassesDirs += files(project.sourceSets.test.output.classesDirs)
Review Comment:
All unit tests should be run as part of the `test` task
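For context, the diff replaces four individual `systemProperty` calls with a single `sparkTestProperties(...)` helper whose definition is not shown in this hunk. The following is a hypothetical reconstruction of such a helper, inferred only from the removed lines; the actual definition lives elsewhere in PR #17662 and may differ. It assumes `groovy.json.JsonOutput` is imported, as in the original script.

```groovy
// Hypothetical reconstruction of the sparkTestProperties helper referenced in
// the diff above; the real implementation is defined elsewhere in PR #17662.
// It bundles the shared settings the removed systemProperty lines used to set
// one by one, and lets callers add or override pipeline options.
def sparkTestProperties(Map<String, String> pipelineOptions = [:]) {
  def options = ["--runner": "TestSparkRunner", "--streaming": "false"] + pipelineOptions
  [
      "beam.spark.test.reuseSparkContext": "true",
      "spark.ui.enabled"                 : "false",
      "spark.ui.showConsoleProgress"     : "false",
      "beamTestPipelineOptions"          : JsonOutput.toJson(
          options.collect { option, value -> option + "=" + value }),
  ]
}
```

With defaults only, this would reproduce the property set the old code built by hand; callers such as `validatesRunnerBatch` pass extra options like `"--enableSparkMetricSinks":"false"`.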
##########
runners/spark/spark_runner.gradle:
##########
@@ -218,29 +218,17 @@ def validatesRunnerBatch =
 tasks.register("validatesRunnerBatch", Test) {
   group = "Verification"
   // Disable gradle cache
   outputs.upToDateWhen { false }
-  def pipelineOptions = JsonOutput.toJson([
-      "--runner=TestSparkRunner",
-      "--streaming=false",
-      "--enableSparkMetricSinks=false",
-  ])
-  systemProperty "beamTestPipelineOptions", pipelineOptions
-  systemProperty "beam.spark.test.reuseSparkContext", "true"
-  systemProperty "spark.ui.enabled", "false"
-  systemProperty "spark.ui.showConsoleProgress", "false"
+  systemProperties sparkTestProperties(["--enableSparkMetricSinks":"false"])
   classpath = configurations.validatesRunner
   testClassesDirs = files(
       project(":sdks:java:core").sourceSets.test.output.classesDirs,
       project(":runners:core-java").sourceSets.test.output.classesDirs,
   )
-  testClassesDirs += files(project.sourceSets.test.output.classesDirs)
-  // Only one SparkContext may be running in a JVM (SPARK-2243)
-  forkEvery 1
   maxParallelForks 4
   useJUnit {
     includeCategories 'org.apache.beam.sdk.testing.ValidatesRunner'
-    includeCategories 'org.apache.beam.runners.spark.UsesCheckpointRecovery'
Review Comment:
Unit test!
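Read together, the two review comments point the same way: tests that are really unit tests (such as those tagged `UsesCheckpointRecovery`) belong in the ordinary `test` task, not in the ValidatesRunner suite. A minimal sketch of what that could look like, assuming the `sparkTestProperties` helper from the diff; this is illustrative only, not the actual change from the PR:

```groovy
// Illustrative sketch only: the regular `test` task picks up all Spark runner
// unit tests, with the shared Spark test properties applied and no
// category-based includeCategories filter needed.
test {
  systemProperties sparkTestProperties()
  // No includeCategories filter: unit tests such as those tagged
  // UsesCheckpointRecovery run here by default.
}
```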
Issue Time Tracking
-------------------
Worklog Id: (was: 770208)
Time Spent: 9h 20m (was: 9h 10m)
> Avoid using forkEvery in Spark runner tests
> -------------------------------------------
>
> Key: BEAM-14334
> URL: https://issues.apache.org/jira/browse/BEAM-14334
> Project: Beam
> Issue Type: Improvement
> Components: runner-spark, testing
> Reporter: Moritz Mack
> Assignee: Moritz Mack
> Priority: P2
> Time Spent: 9h 20m
> Remaining Estimate: 0h
>
> Usage of {{forkEvery 1}} is typically a strong sign of poor-quality code and
> should be avoided:
> * It significantly impacts test performance, since every test class pays the
> cost of starting a fresh JVM.
> * It often hides resource leaks, either in the tests themselves or, worse, in
> the runner itself.
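As a rough illustration of the trade-off described above (not code from the Beam build), compare per-class forking with a bounded pool of reused worker JVMs:

```groovy
// Illustrative contrast of Gradle test-forking strategies (not Beam code).
tasks.register("isolatedTests", Test) {
  // New JVM for every test class: masks leaked state (e.g. a SparkContext
  // left running, see SPARK-2243) but pays JVM startup per class.
  forkEvery 1
}
tasks.register("pooledTests", Test) {
  // Up to 4 long-lived worker JVMs reused across classes; much faster, but
  // tests must actually release shared resources.
  maxParallelForks 4
}
```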
--
This message was sent by Atlassian Jira
(v8.20.7#820007)