mosche commented on PR #17406:
URL: https://github.com/apache/beam/pull/17406#issuecomment-1104770253

   Hmm, I noticed a related issue here. `SparkContextOptions` doesn't work with
   `TestPipeline` because `providedSparkContext` is dropped during the serde
   roundtrip `TestPipeline` performs to verify that everything can be serialized
   before actually running the pipeline :/
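   
   For illustration, roughly this is what breaks (a minimal, hypothetical repro;
   the test class name and the local context setup are mine):
   
   ```java
   import org.apache.beam.runners.spark.SparkContextOptions;
   import org.apache.beam.runners.spark.TestSparkRunner;
   import org.apache.beam.sdk.options.PipelineOptionsFactory;
   import org.apache.beam.sdk.testing.TestPipeline;
   import org.apache.spark.api.java.JavaSparkContext;
   import org.junit.Rule;
   import org.junit.Test;
   
   public class ProvidedContextReproTest {
     private static SparkContextOptions options() {
       SparkContextOptions opts = PipelineOptionsFactory.as(SparkContextOptions.class);
       opts.setRunner(TestSparkRunner.class);
       opts.setUsesProvidedSparkContext(true);
       opts.setProvidedSparkContext(new JavaSparkContext("local[2]", "test"));
       return opts;
     }
   
     // TestPipeline serializes and deserializes the options to verify that everything
     // can be serialized; providedSparkContext does not survive that roundtrip, so the
     // runner won't see the provided context when the pipeline runs.
     @Rule public final TestPipeline pipeline = TestPipeline.fromOptions(options());
   
     @Test
     public void runsOnProvidedContext() {
       pipeline.run(); // fails: the provided Spark context was lost in the roundtrip
     }
   }
   ```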
   
   IMHO `providedSparkContext` really doesn't belong in `PipelineOptions`: it
   can't be serialized, and the resulting behavior is very inconsistent... though
   removing it would be a breaking change. Instead, I suggest adding methods to
   `SparkContextFactory` to set the provided Spark context. If a context is
   provided using `SparkContextOptions`, it will be stored in the factory via
   `setProvidedSparkContext` as well.
   
   This also makes it possible to clear the provided Spark context, allowing for
   much cleaner code (see the usage sketch after the snippet below).
   
   ```java
     // Sketch of the proposed additions to SparkContextFactory; the storage
     // field and the method bodies are assumed for illustration.
     private static JavaSparkContext providedContext;
   
     /**
      * Set an externally managed {@link JavaSparkContext} that will be used if {@link
      * SparkContextOptions#getUsesProvidedSparkContext()} is set to {@code true}.
      *
      * <p>A Spark context can also be provided using {@link
      * SparkContextOptions#setProvidedSparkContext(JavaSparkContext)}. However, it will be dropped
      * during serialization, potentially leading to confusing behavior. This is particularly the
      * case when used in tests with {@link org.apache.beam.sdk.testing.TestPipeline}.
      */
     public static synchronized void setProvidedSparkContext(JavaSparkContext providedSparkContext) {
       providedContext = providedSparkContext;
     }
   
     /** Clear a previously provided Spark context so later pipelines create their own. */
     public static synchronized void clearProvidedSparkContext() {
       providedContext = null;
     }
   ```
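   
   With that in place, a test could manage the context explicitly, e.g.
   (hypothetical JUnit usage, assuming the proposed factory methods above):
   
   ```java
     private static JavaSparkContext sparkContext;
   
     @BeforeClass
     public static void provideSparkContext() {
       sparkContext = new JavaSparkContext("local[2]", "test");
       SparkContextFactory.setProvidedSparkContext(sparkContext);
     }
   
     @AfterClass
     public static void clearSparkContext() {
       SparkContextFactory.clearProvidedSparkContext();
       sparkContext.stop();
     }
   ```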
   

