[ https://issues.apache.org/jira/browse/BEAM-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786367#comment-16786367 ]
Kenneth Knowles commented on BEAM-6257:
---------------------------------------
OK now I'm watching this issue :-)
Generally, each PAssert invocation uses side inputs because they are an easy way
to gather up all the contents of a PCollection. They also make it easy to
convert a PCollection<KV<K, V>> into a map or multimap, or a PCollection<T>
into a T. But the "new" approach that I added to a couple of methods is to use
GBK instead of side inputs, since most runners will have GBK support right away.
To match the semantics, you need to GBK with:
- one key
- global window
- "never" trigger (aka trigger exactly once at the very last moment)
Then a DoFn needs to convert that single KV<K, Iterable<T>> into a singleton,
map, or multimap and run the assertion.
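The grouping and conversion steps can be modeled in plain Java to make the semantics concrete. This is a hypothetical sketch, not Beam code and not the actual PAssert implementation: the class `GroupedAssertionModel` and its methods are illustrative names, `Map.Entry` stands in for Beam's `KV`, and an in-memory grouping under one constant key stands in for the GBK in the global window whose trigger fires once, after all input has arrived.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical plain-Java model of the GBK-based assertion path (not Beam APIs).
 * Map.Entry stands in for KV; grouping a complete List in memory stands in for a
 * GroupByKey in the global window with a trigger that fires exactly once at the end.
 */
final class GroupedAssertionModel {
    static final int SINGLE_KEY = 0;

    /** "One key": every element gets the same key, so grouping yields at most one entry. */
    static <T> Map<Integer, List<T>> groupUnderOneKey(List<T> elements) {
        Map<Integer, List<T>> grouped = new HashMap<>();
        for (T element : elements) {
            grouped.computeIfAbsent(SINGLE_KEY, k -> new ArrayList<>()).add(element);
        }
        return grouped; // empty input => empty map, so nothing downstream would run
    }

    /** Singleton view: the grouped values must hold exactly one element. */
    static <T> T asSingleton(Iterable<T> values) {
        T result = null;
        int count = 0;
        for (T v : values) {
            result = v;
            count++;
        }
        if (count != 1) {
            throw new IllegalArgumentException("expected exactly one element, got " + count);
        }
        return result;
    }

    /** Map view: each key may appear at most once. */
    static <K, V> Map<K, V> asMap(Iterable<Map.Entry<K, V>> values) {
        Map<K, V> result = new HashMap<>();
        for (Map.Entry<K, V> e : values) {
            if (result.containsKey(e.getKey())) {
                throw new IllegalArgumentException("duplicate key: " + e.getKey());
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    /** Multimap view: all values for a key are collected into a list, in input order. */
    static <K, V> Map<K, List<V>> asMultimap(Iterable<Map.Entry<K, V>> values) {
        Map<K, List<V>> result = new HashMap<>();
        for (Map.Entry<K, V> e : values) {
            result.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return result;
    }

    /** Example assertion: order-insensitive comparison of element counts. */
    static <T> boolean containsInAnyOrder(Iterable<T> values, List<T> expected) {
        Map<T, Integer> counts = new HashMap<>();
        for (T v : values) counts.merge(v, 1, Integer::sum);
        for (T e : expected) counts.merge(e, -1, Integer::sum);
        return counts.values().stream().allMatch(c -> c == 0);
    }
}
```

One caveat the model makes visible: an empty input produces no group at all (`groupUnderOneKey` returns an empty map), just as a GBK emits nothing for an empty PCollection, so a DoFn placed after it never runs. A GBK-based assertion path would therefore need some extra step to still produce one element when the PCollection is empty; otherwise an assertion on an empty PCollection would never be evaluated.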
> Can we deprecate the side input paths through PAssert?
> ------------------------------------------------------
>
> Key: BEAM-6257
> URL: https://issues.apache.org/jira/browse/BEAM-6257
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Kenneth Knowles
> Priority: Major
> Labels: starter, triaged
>
> PAssert has two distinct paths - one uses GBK with a single-firing trigger,
> and one uses side inputs. Side inputs are usually a later addition to a
> runner, while GBK is one of the first primitives (with a single firing it is
> even simple). Filing this against myself to figure out why the side input
> version is not deprecated, and if it can be deprecated.
> Marking this as a "starter" task because finding and eliminating the side input
> version of PAssert should be fairly easy. You might need help, but you can ask
> on dev@
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)