[
https://issues.apache.org/jira/browse/BEAM-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892900#comment-16892900
]
Ryan Skraba commented on BEAM-6883:
-----------------------------------
More bad news -- it looks like StreamingSourceMetricsTest is not actually
testing an UnboundedSource :(
By adding a breakpoint, you can see that it's producing a very-long-lived
PCollection, but it is IsBounded.BOUNDED -- if we were to let it run, it would
be a batch pipeline in the end. Watermarks are never advanced or calculated,
and nothing tells the test to stop until the TestPipelineOptions timeout
occurs.
A couple of options:
1) Use GenerateSequence in a truly unbounded mode by removing the `to()` and
`withMaxReadTime()` configuration, and test that the PCollection is UNBOUNDED.
Add a timestamp function to make sure that the watermark advances to the end
after 1000 elements. Due to the nature of GenerateSequence, the test will
probably see more than 1000 elements rather than exactly 1000 (we removed the
`to()`), so the assertion would need to be adjusted, I suppose.
2) Use CreateStream or implement TestStream in SparkRunner, and have them
generate Read metrics.
Both of those would finish in under 10 seconds and test the intended
functionality -- as it is, this test isn't doing anything useful.
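
To make option 1 a bit more concrete, here is a rough sketch of the function
that assigns timestamps, written in plain Java so it runs without Beam on the
classpath. Everything named here is hypothetical: the class
EndAfterLimitTimestamps, the method endingAfter(), and the use of
java.time.Instant.MAX as a stand-in for Beam's
BoundedWindow.TIMESTAMP_MAX_VALUE. In the real test the function would be a
SerializableFunction<Long, org.joda.time.Instant> handed to
GenerateSequence.withTimestampFn(). Whether the SparkRunner then stops the
pipeline once the watermark reaches the end is an assumption that would need
checking.

```java
import java.time.Instant;
import java.util.function.Function;

/** Hypothetical helper for illustration; not part of Beam or the existing test. */
final class EndAfterLimitTimestamps {

  /** Stand-in for Beam's BoundedWindow.TIMESTAMP_MAX_VALUE (assumption of this sketch). */
  static final Instant END_OF_TIME = Instant.MAX;

  /**
   * Returns a function mapping a sequence element to its event timestamp.
   * Elements below the limit get increasing timestamps; elements at or past
   * the limit get the end-of-time timestamp, which is what would push the
   * watermark to the end.
   */
  static Function<Long, Instant> endingAfter(long limit, Instant start) {
    return element ->
        element < limit
            ? start.plusMillis(element) // event time advances 1 ms per element
            : END_OF_TIME;              // limit reached: jump to the end
  }

  public static void main(String[] args) {
    Function<Long, Instant> fn = endingAfter(1000, Instant.EPOCH);
    System.out.println(fn.apply(0L));
    System.out.println(fn.apply(999L));
    System.out.println(fn.apply(1000L).equals(END_OF_TIME));
  }
}
```

Since elements past the limit may still be emitted before the pipeline
stops, the assertion on the Read metric would check for at least 1000
elements rather than exactly 1000.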
> StreamingSourceMetricsTest takes too long to finish
> ---------------------------------------------------
>
> Key: BEAM-6883
> URL: https://issues.apache.org/jira/browse/BEAM-6883
> Project: Beam
> Issue Type: Test
> Components: runner-spark
> Affects Versions: 2.11.0
> Reporter: Ismaël Mejía
> Assignee: Alexey Romanenko
> Priority: Minor
>
> This test is part of Spark's ValidatesRunner suite and it takes more than 10
> minutes to end.