[jira] [Work logged] (BEAM-14334) Avoid using forkEvery in Spark runner tests

ASF GitHub Bot (Jira) Wed, 20 Apr 2022 23:37:07 -0700


     [ https://issues.apache.org/jira/browse/BEAM-14334?focusedWorklogId=759780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-759780 ]


ASF GitHub Bot logged work on BEAM-14334:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Apr/22 06:36
            Start Date: 21/Apr/22 06:36
    Worklog Time Spent: 10m 
      Work Description: mosche commented on PR #17406:
URL: https://github.com/apache/beam/pull/17406#issuecomment-1104770253

   Hmm, I noticed a related issue here. `SparkContextOptions` doesn't work with `TestPipeline` because `providedSparkContext` is ignored during the serde roundtrip that checks that everything can be serialized before actually running the pipeline :/
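A minimal sketch of why a non-serializable field silently disappears in such a roundtrip. It assumes plain Java serialization with a `transient` field as a stand-in for Beam's own options serialization; `FakeContext`, `FakeOptions` and `roundtrip` are hypothetical names, not Beam APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerdeRoundtripDemo {

  /** Hypothetical stand-in for JavaSparkContext: deliberately NOT Serializable. */
  static final class FakeContext {
    final String name;

    FakeContext(String name) {
      this.name = name;
    }
  }

  /** Hypothetical options bean; the context is transient, so serialization skips it. */
  static final class FakeOptions implements Serializable {
    private static final long serialVersionUID = 1L;
    String appName;
    transient FakeContext providedContext;
  }

  /** Serialize and deserialize, similar in spirit to the check done before running a test pipeline. */
  static FakeOptions roundtrip(FakeOptions options) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(options);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (FakeOptions) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    FakeOptions options = new FakeOptions();
    options.appName = "test";
    options.providedContext = new FakeContext("shared");

    FakeOptions copy = roundtrip(options);
    System.out.println(copy.appName); // test
    System.out.println(copy.providedContext); // null: the context was dropped without any error
  }
}
```

The serializable fields survive, while the context comes back as `null` with no error, which is the "confusing behavior" the comment describes.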
   
   IMHO `providedSparkContext` really doesn't belong in PipelineOptions: it can't be serialized, and the resulting behavior is very inconsistent. Removing it, though, would be a breaking change. Instead, I suggest adding methods to `SparkContextFactory` to set the provided Spark context. If a context is provided using `SparkContextOptions`, it will be stored in the factory using `setProvidedSparkContext` as well.
   
   This also makes it possible to clear the provided Spark context, allowing for much cleaner code.
   
   ```java
     /**
      * Set an externally managed {@link JavaSparkContext} that will be used if {@link
      * SparkContextOptions#getUsesProvidedSparkContext()} is set to {@code true}.
      *
      * <p>A Spark context can also be provided using {@link
      * SparkContextOptions#setProvidedSparkContext(JavaSparkContext)}. However, it will be dropped
      * during serialization, potentially leading to confusing behavior. This is particularly the case
      * when used in tests with {@link org.apache.beam.sdk.testing.TestPipeline}.
      */
     public static synchronized void setProvidedSparkContext(JavaSparkContext providedSparkContext)
   
     /** Clear a Spark context previously set using {@link #setProvidedSparkContext}. */
     public static synchronized void clearProvidedSparkContext()
   ```
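A possible shape for the two proposed methods, sketched under loud assumptions: `FakeSparkContext` is a hypothetical stand-in for `JavaSparkContext` so the sketch compiles without Spark, and `ContextFactorySketch.getProvidedSparkContext(fromOptions)` is a hypothetical resolver illustrating "a context provided via the options is stored in the factory as well". It is not the actual `SparkContextFactory` code.

```java
import java.util.Objects;

/** Hypothetical sketch of a factory holding an externally managed context in static state. */
final class ContextFactorySketch {
  // Externally managed context; null when none has been provided.
  private static FakeSparkContext providedContext;

  private ContextFactorySketch() {}

  /** Store an externally managed context; the caller stays responsible for stopping it. */
  static synchronized void setProvidedSparkContext(FakeSparkContext context) {
    providedContext = Objects.requireNonNull(context, "context");
  }

  /** Forget the provided context, e.g. once a test class has finished. */
  static synchronized void clearProvidedSparkContext() {
    providedContext = null;
  }

  /**
   * Hypothetical resolver, used when the options request a provided context. A context passed
   * through the options wins and is also stored here, so later lookups still find it even if
   * the options object was serialized and the field dropped.
   */
  static synchronized FakeSparkContext getProvidedSparkContext(FakeSparkContext fromOptions) {
    if (fromOptions != null) {
      providedContext = fromOptions;
    }
    if (providedContext == null) {
      throw new IllegalStateException("No Spark context was provided");
    }
    return providedContext;
  }

  public static void main(String[] args) {
    setProvidedSparkContext(new FakeSparkContext("shared"));
    System.out.println(getProvidedSparkContext(null).appName); // shared
    clearProvidedSparkContext();
  }
}

/** Hypothetical stand-in for JavaSparkContext. */
final class FakeSparkContext {
  final String appName;

  FakeSparkContext(String appName) {
    this.appName = appName;
  }
}
```

In a test suite, the setter would typically be called once when a test class starts and the clearer when it finishes, so no state leaks into the next class running in the same JVM.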
   




Issue Time Tracking
-------------------

    Worklog Id:     (was: 759780)
    Time Spent: 20m  (was: 10m)

> Avoid using forkEvery in Spark runner tests
> -------------------------------------------
>
>                 Key: BEAM-14334
>                 URL: https://issues.apache.org/jira/browse/BEAM-14334
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-spark, testing
>            Reporter: Moritz Mack
>            Assignee: Moritz Mack
>            Priority: P2
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Usage of *forkEvery 1* is typically a strong sign of poor quality or bad code
> and should be avoided:
>  * It significantly impacts performance when running tests.
>  * It often hides resource leaks, either in the code or, worse, in the runner
> itself.
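With Gradle's `forkEvery 1`, every test class runs in a fresh JVM, so anything a test forgets to release vanishes when that JVM exits. The sketch below illustrates why sharing one JVM makes such leaks visible. `TrackedResource` and `LeakCheckDemo` are hypothetical names; the open-instance counter is an assumed, simplified leak check, not something Beam provides.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LeakCheckDemo {
  // A well-behaved "test": try-with-resources always releases the resource.
  static void goodTest() {
    try (TrackedResource resource = new TrackedResource()) {
      // ... exercise the code under test ...
    }
  }

  // A leaky "test": the resource is never closed. If each test class ran in its own JVM,
  // the process would exit right afterwards and the leak would go unnoticed.
  static void leakyTest() {
    new TrackedResource();
  }

  public static void main(String[] args) {
    goodTest();
    goodTest();
    System.out.println("open after good tests: " + TrackedResource.OPEN.get()); // 0
    leakyTest();
    System.out.println("open after leaky test: " + TrackedResource.OPEN.get()); // 1
  }
}

/** Hypothetical resource that counts currently open instances across the whole JVM. */
final class TrackedResource implements AutoCloseable {
  static final AtomicInteger OPEN = new AtomicInteger();
  private boolean closed;

  TrackedResource() {
    OPEN.incrementAndGet();
  }

  @Override
  public void close() {
    // Idempotent: closing twice only decrements once.
    if (!closed) {
      closed = true;
      OPEN.decrementAndGet();
    }
  }
}
```

When all tests share one JVM, a final check that the counter is back to zero flags the leaky test instead of hiding it.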



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
