[
https://issues.apache.org/jira/browse/BEAM-14334?focusedWorklogId=759780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-759780
]
ASF GitHub Bot logged work on BEAM-14334:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 21/Apr/22 06:36
Start Date: 21/Apr/22 06:36
Worklog Time Spent: 10m
Work Description: mosche commented on PR #17406:
URL: https://github.com/apache/beam/pull/17406#issuecomment-1104770253
Hmm, I noticed a related issue here. `SparkContextOptions` doesn't work with
`TestPipeline` because `providedSparkContext` is ignored during the serde
roundtrip that is done to check that everything can be serialized before
actually running the pipeline :/
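To illustrate the effect, here is a minimal sketch of a serialization roundtrip that silently drops a non-serializable field. It is only an analogy: `DemoOptions` and its fields are hypothetical stand-ins, and it uses Java's built-in serialization with a `transient` field rather than whatever mechanism Beam actually uses for `PipelineOptions`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for an options object; NOT Beam's SparkContextOptions.
class DemoOptions implements Serializable {
  private static final long serialVersionUID = 1L;

  // Plain flag: survives the roundtrip.
  boolean usesProvidedContext;

  // Stand-in for a non-serializable handle such as a Spark context.
  // Marked transient, so it is skipped when the object is written.
  transient Object providedContext;

  // Serialize and deserialize, mimicking a "can this be serialized?" check.
  static DemoOptions roundTrip(DemoOptions options) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(options);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (DemoOptions) in.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

After the roundtrip the flag is still set, but the context reference is gone, which is the kind of inconsistency described above.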
IMHO `providedSparkContext` really doesn't belong in PipelineOptions: it
can't be serialized and the resulting behavior is very inconsistent... though
removing it would be a breaking change. I suggest adding methods to
`SparkContextFactory` for setting the provided Spark context. If a context is
provided using `SparkContextOptions`, it will be stored in the factory using
`setProvidedSparkContext` as well.
This also makes it possible to clear the provided Spark context, allowing for
much cleaner code.
```java
/**
 * Set an externally managed {@link JavaSparkContext} that will be used if {@link
 * SparkContextOptions#getUsesProvidedSparkContext()} is set to {@code true}.
 *
 * <p>A Spark context can also be provided using {@link
 * SparkContextOptions#setProvidedSparkContext(JavaSparkContext)}. However, it will be dropped
 * during serialization potentially leading to confusing behavior. This is particularly the case
 * when used in tests with {@link org.apache.beam.sdk.testing.TestPipeline}.
 */
public static synchronized void setProvidedSparkContext(JavaSparkContext providedSparkContext)

public static synchronized void clearProvidedSparkContext()
```
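For illustration only, the set/clear pair could behave roughly like the sketch below. Everything beyond the two method names is an assumption: `FakeSparkContext` is a stand-in for `JavaSparkContext` so the example compiles without Spark, `ProvidedContextHolder` is a hypothetical class rather than the real `SparkContextFactory`, and the getter and its error handling are invented for the example.

```java
import java.util.Objects;

// Hypothetical stand-in for JavaSparkContext so the sketch has no Spark dependency.
class FakeSparkContext {
  final String name;

  FakeSparkContext(String name) {
    this.name = name;
  }
}

// Hypothetical holder mirroring the two proposed methods; NOT the real SparkContextFactory.
final class ProvidedContextHolder {
  // Externally managed context, kept in static state instead of in the options.
  private static FakeSparkContext providedContext;

  private ProvidedContextHolder() {}

  // Register an externally managed context.
  static synchronized void setProvidedSparkContext(FakeSparkContext context) {
    providedContext = Objects.requireNonNull(context, "context");
  }

  // Forget the provided context, e.g. at the end of a test.
  static synchronized void clearProvidedSparkContext() {
    providedContext = null;
  }

  // Assumed lookup (not part of the proposal above): fails if nothing was provided.
  static synchronized FakeSparkContext getProvidedSparkContext() {
    if (providedContext == null) {
      throw new IllegalStateException("No Spark context was provided");
    }
    return providedContext;
  }
}
```

Because the context lives in static state rather than in the options, it is unaffected by any serialization of the options, and a test can clear it explicitly once it is done.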
Issue Time Tracking
-------------------
Worklog Id: (was: 759780)
Time Spent: 20m (was: 10m)
> Avoid using forkEvery in Spark runner tests
> -------------------------------------------
>
> Key: BEAM-14334
> URL: https://issues.apache.org/jira/browse/BEAM-14334
> Project: Beam
> Issue Type: Improvement
> Components: runner-spark, testing
> Reporter: Moritz Mack
> Assignee: Moritz Mack
> Priority: P2
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Usage of *forkEvery 1* is typically a strong sign of
> poor quality / bad code and should be avoided:
> * It significantly impacts performance when running tests.
> * And it often hides resource leaks, either in code or worse in the runner
> itself.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)