mosche commented on PR #17406:
URL: https://github.com/apache/beam/pull/17406#issuecomment-1104770253
Hmm, I noticed a related issue here. `SparkContextOptions` doesn't work with
`TestPipeline` because `providedSparkContext` is dropped during the serde
roundtrip that verifies everything can be serialized before actually running
the pipeline :/
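To illustrate the failure mode, here's a minimal sketch (the local Spark setup and class scaffolding are assumptions for the example; only the option names come from the Spark runner):
```java
import org.apache.beam.runners.spark.SparkContextOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ProvidedContextRepro {
  public static void main(String[] args) {
    JavaSparkContext jsc =
        new JavaSparkContext(new SparkConf().setMaster("local[2]").setAppName("repro"));

    SparkContextOptions options = PipelineOptionsFactory.as(SparkContextOptions.class);
    options.setRunner(SparkRunner.class);
    options.setUsesProvidedSparkContext(true);
    options.setProvidedSparkContext(jsc); // lost in TestPipeline's serde roundtrip

    // TestPipeline round-trips the options through serialization to verify they
    // can be serialized; after that, getProvidedSparkContext() is null even
    // though usesProvidedSparkContext is still true.
    TestPipeline pipeline = TestPipeline.fromOptions(options);
  }
}
```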
IMHO `providedSparkContext` really doesn't belong in `PipelineOptions`: it
can't be serialized, and the resulting behavior is very inconsistent... though
fixing that would be a breaking change. Instead, I suggest adding methods to
`SparkContextFactory` to set the provided Spark context. If a context is
provided using `SparkContextOptions`, it will be stored in the factory via
`setProvidedSparkContext` as well.
This also makes it possible to clear the provided Spark context, allowing for
much cleaner code:
```java
/**
 * Set an externally managed {@link JavaSparkContext} that will be used if
 * {@link SparkContextOptions#getUsesProvidedSparkContext()} is set to {@code true}.
 *
 * <p>A Spark context can also be provided using
 * {@link SparkContextOptions#setProvidedSparkContext(JavaSparkContext)}. However, it will be
 * dropped during serialization, potentially leading to confusing behavior. This is particularly
 * the case when used in tests with {@link org.apache.beam.sdk.testing.TestPipeline}.
 */
public static synchronized void setProvidedSparkContext(JavaSparkContext providedSparkContext)
public static synchronized void clearProvidedSparkContext()
```
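For illustration, a test could then manage the context through the factory directly. A rough sketch, assuming the two proposed methods above and the usual Spark runner classes (everything else is test scaffolding):
```java
import org.apache.beam.runners.spark.SparkContextOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.runners.spark.translation.SparkContextFactory;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ProvidedContextTest {
  public static void main(String[] args) {
    JavaSparkContext jsc =
        new JavaSparkContext(new SparkConf().setMaster("local[2]").setAppName("test"));

    // Register the externally managed context with the factory; unlike the
    // options property, this survives TestPipeline's serde roundtrip.
    SparkContextFactory.setProvidedSparkContext(jsc);

    SparkContextOptions options = PipelineOptionsFactory.as(SparkContextOptions.class);
    options.setRunner(SparkRunner.class);
    options.setUsesProvidedSparkContext(true);

    TestPipeline pipeline = TestPipeline.fromOptions(options);
    // ... assemble and run the pipeline ...

    // Clear the factory afterwards so later tests don't silently reuse it.
    SparkContextFactory.clearProvidedSparkContext();
    jsc.stop();
  }
}
```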