+1 to all of these are probably obsolete at this point and would be nice to remove.
On Wed, Nov 6, 2019 at 3:00 PM Kenneth Knowles <k...@apache.org> wrote: > > Good find. I think TestPipelineOptions is from very early days. It makes > sense to me that these are all obsolete. Some guesses, though I haven't dug > through commit history to confirm: > > - TempRoot: a while ago TempLocation was optional, so I think this would > provide a default for things like gcpTempLocation and stagingLocation > - OnSuccessMatcher: for runners where pipeline used to not terminate in > streaming mode. Now I think every runner can successfully waitUntilFinish. > Also the current API for waitUntilFinish went through some evolutions around > asynchrony so it wasn't always a good choice. > - OnCreateMatcher: just for symmetry? I don't know > - TestTimeoutSeconds: probably also for the asychrony/waitUntilfinish issue > > Kenn > > On Wed, Nov 6, 2019 at 12:19 PM Brian Hulette <bhule...@google.com> wrote: >> >> I recently came across TestPipelineOptions, and now I'm wondering if maybe >> it should be deprecated. It only seems to actually be supported for Spark >> and Dataflow (via TestSparkRunner and TestDataflowRunner), and I think it >> may make more sense to move the functionality it provides into the tests >> that need it. >> >> TestPipelineOptions currently has four attributes: >> >> # TempRoot >> It's purpose isn't documented, but many tests read TempRoot and use it to >> set a TempLocation (example). I think this attribute makes sense (e.g. we >> can set TempRoot once and each test has its own subdirectory), but I'm not >> sure. Can anyone confirm the motivation for it? I'd like to at least add a >> docstring for it. >> >> # OnCreateMatcher >> A way to register a matcher that will be checked right after a pipeline has >> started. It's never set except for in TestDataflowRunnerTest, so I think >> this is absolutely safe to remove. >> >> # OnSuccessMatcher >> A way to register a matcher that will be checked right after a pipeline has >> successfully completed. This is used in several tests >> (RequiresStableInputIT, WordCountIT, ... 8 total occurrences), but I don't >> see why they couldn't all be replaced with a `p.run().waitUntilFinish()`, >> followed by an assert. >> >> I think the current approach is actually dangerous, because running these >> tests with runners other than TestDataflowRunner or TestSparkRunner means >> the matchers are never actually checked. This is actually how I came across >> TestPipelineOptions - I tried running a test with the DirectRunner and >> couldn't make it fail. >> >> # TestTimeoutSeconds >> Seems to just be a wrapper for `waitUntilFinish(duration)`, and only used in >> one place. I think it would be cleaner for the test to be responsible for >> calling waitUntilFinish (which we do elsewhere), the only drawback is it >> requires a small refactor so the test has access to the PipelineResult >> object. >> >> >> So I have a couple of questions for the community >> 1) Are there thoughts on TempRoot? Can we get rid of it? >> 2) Are there any objections to removing the other three attributes? Am I >> missing something? Unless there are any objections I think I'll write a >> patch to remove them. >> >> Thanks, >> Brian