Good find. I think TestPipelineOptions is from very early days. It makes
sense to me that these are all obsolete. Some guesses, though I haven't dug
through commit history to confirm:

 - TempRoot: a while ago TempLocation was optional, so I think this would
provide a default for things like gcpTempLocation and stagingLocation
 - OnSuccessMatcher: for runners where pipeline used to not terminate in
streaming mode. Now I think every runner can successfully waitUntilFinish.
Also the current API for waitUntilFinish went through some evolutions
around asynchrony so it wasn't always a good choice.
 - OnCreateMatcher: just for symmetry? I don't know
 - TestTimeoutSeconds: probably also for the asychrony/waitUntilfinish issue

Kenn

On Wed, Nov 6, 2019 at 12:19 PM Brian Hulette <[email protected]> wrote:

> I recently came across TestPipelineOptions, and now I'm wondering if maybe
> it should be deprecated. It only seems to actually be supported for Spark
> and Dataflow (via TestSparkRunner and TestDataflowRunner), and I think it
> may make more sense to move the functionality it provides into the tests
> that need it.
>
> TestPipelineOptions
> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipelineOptions.java>
> currently has four attributes:
>
> # TempRoot
> It's purpose isn't documented, but many tests read TempRoot and use it to
> set a TempLocation (example
> <https://github.com/apache/beam/blob/49ef06e7ad5e601e8a33bd319543c857bcd2744f/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java#L124>).
> I think this attribute makes sense (e.g. we can set TempRoot once and each
> test has its own subdirectory), but I'm not sure. Can anyone confirm the
> motivation for it? I'd like to at least add a docstring for it.
>
> # OnCreateMatcher
> A way to register a matcher that will be checked right after a pipeline
> has started. It's never set except for in TestDataflowRunnerTest, so I
> think this is absolutely safe to remove.
>
> # OnSuccessMatcher
> A way to register a matcher that will be checked right after a pipeline
> has successfully completed. This is used in several tests (
> RequiresStableInputIT
> <https://github.com/apache/beam/blob/49ef06e7ad5e601e8a33bd319543c857bcd2744f/sdks/java/core/src/test/java/org/apache/beam/sdk/RequiresStableInputIT.java#L140-L145>,
> WordCountIT
> <https://github.com/apache/beam/blob/49ef06e7ad5e601e8a33bd319543c857bcd2744f/examples/java/src/test/java/org/apache/beam/examples/WordCountIT.java#L66-L67>,
> ... 8 total occurrences), but I don't see why they couldn't all be replaced
> with a `p.run().waitUntilFinish()`, followed by an assert.
>
> I think the current approach is actually dangerous, because running these
> tests with runners other than TestDataflowRunner or TestSparkRunner means
> the matchers are never actually checked. This is actually how I came across
> TestPipelineOptions - I tried running a test with the DirectRunner and
> couldn't make it fail.
>
> # TestTimeoutSeconds
> Seems to just be a wrapper for `waitUntilFinish(duration)`, and only used
> in one place
> <https://github.com/apache/beam/blob/49ef06e7ad5e601e8a33bd319543c857bcd2744f/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java#L116>.
> I think it would be cleaner for the test to be responsible for calling
> waitUntilFinish (which we do elsewhere), the only drawback is it requires a
> small refactor so the test has access to the PipelineResult object.
>
>
> So I have a couple of questions for the community
> 1) Are there thoughts on TempRoot? Can we get rid of it?
> 2) Are there any objections to removing the other three attributes? Am I
> missing something? Unless there are any objections I think I'll write a
> patch to remove them.
>
> Thanks,
> Brian
>

Reply via email to