[
https://issues.apache.org/jira/browse/BEAM-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678388#comment-16678388
]
Niel Markwick commented on BEAM-4796:
-------------------------------------
I tested this on Dataflow with a very simple Pub/Sub -> mutation ->
SpannerIO.Write() pipeline, and it worked.
The [unit tests also test
streaming|https://github.com/apache/beam/blob/ba5bc60c7da3693a076344d47ffa4629cd696768/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L283]
and pass.
What sort of windowing are you applying on your inputs?
> SpannerIO waits for all input before writing
> --------------------------------------------
>
> Key: BEAM-4796
> URL: https://issues.apache.org/jira/browse/BEAM-4796
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0
> Reporter: Niel Markwick
> Assignee: Niel Markwick
> Priority: Major
> Fix For: 2.9.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> SpannerIO.Write waits for all input in the window to arrive before getting
> the schema:
> [https://github.com/apache/beam/blame/release-2.5.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java#L841]
>
> In streaming mode, this is not an issue, but in batch mode, this causes the
> pipeline to stall until all input is read, which could be a significant
> amount of time (and temp data).
>
>