+1 to all of these are probably obsolete at this point and would be
nice to remove.


On Wed, Nov 6, 2019 at 3:00 PM Kenneth Knowles <k...@apache.org> wrote:
>
> Good find. I think TestPipelineOptions is from very early days. It makes 
> sense to me that these are all obsolete. Some guesses, though I haven't dug 
> through commit history to confirm:
>
>  - TempRoot: a while ago TempLocation was optional, so I think this would 
> provide a default for things like gcpTempLocation and stagingLocation
>  - OnSuccessMatcher: for runners where pipeline used to not terminate in 
> streaming mode. Now I think every runner can successfully waitUntilFinish. 
> Also the current API for waitUntilFinish went through some evolutions around 
> asynchrony so it wasn't always a good choice.
>  - OnCreateMatcher: just for symmetry? I don't know
>  - TestTimeoutSeconds: probably also for the asychrony/waitUntilfinish issue
>
> Kenn
>
> On Wed, Nov 6, 2019 at 12:19 PM Brian Hulette <bhule...@google.com> wrote:
>>
>> I recently came across TestPipelineOptions, and now I'm wondering if maybe 
>> it should be deprecated. It only seems to actually be supported for Spark 
>> and Dataflow (via TestSparkRunner and TestDataflowRunner), and I think it 
>> may make more sense to move the functionality it provides into the tests 
>> that need it.
>>
>> TestPipelineOptions currently has four attributes:
>>
>> # TempRoot
>> It's purpose isn't documented, but many tests read TempRoot and use it to 
>> set a TempLocation (example). I think this attribute makes sense (e.g. we 
>> can set TempRoot once and each test has its own subdirectory), but I'm not 
>> sure. Can anyone confirm the motivation for it? I'd like to at least add a 
>> docstring for it.
>>
>> # OnCreateMatcher
>> A way to register a matcher that will be checked right after a pipeline has 
>> started. It's never set except for in TestDataflowRunnerTest, so I think 
>> this is absolutely safe to remove.
>>
>> # OnSuccessMatcher
>> A way to register a matcher that will be checked right after a pipeline has 
>> successfully completed. This is used in several tests 
>> (RequiresStableInputIT, WordCountIT, ... 8 total occurrences), but I don't 
>> see why they couldn't all be replaced with a `p.run().waitUntilFinish()`, 
>> followed by an assert.
>>
>> I think the current approach is actually dangerous, because running these 
>> tests with runners other than TestDataflowRunner or TestSparkRunner means 
>> the matchers are never actually checked. This is actually how I came across 
>> TestPipelineOptions - I tried running a test with the DirectRunner and 
>> couldn't make it fail.
>>
>> # TestTimeoutSeconds
>> Seems to just be a wrapper for `waitUntilFinish(duration)`, and only used in 
>> one place. I think it would be cleaner for the test to be responsible for 
>> calling waitUntilFinish (which we do elsewhere), the only drawback is it 
>> requires a small refactor so the test has access to the PipelineResult 
>> object.
>>
>>
>> So I have a couple of questions for the community
>> 1) Are there thoughts on TempRoot? Can we get rid of it?
>> 2) Are there any objections to removing the other three attributes? Am I 
>> missing something? Unless there are any objections I think I'll write a 
>> patch to remove them.
>>
>> Thanks,
>> Brian

Reply via email to