[ 
https://issues.apache.org/jira/browse/BEAM-6257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786367#comment-16786367
 ] 

Kenneth Knowles commented on BEAM-6257:
---------------------------------------

OK now I'm watching this issue :-)

Generally, each PAssert invocation uses side inputs because they are an easy 
way to gather up all the contents of a PCollection: a side input can convert a 
PCollection<KV<K, V>> into a map or multimap, or a PCollection<T> into a single 
T. But the "new" approach that I added to a couple of methods is to use a GBK 
instead of side inputs, since most runners will have GBK support right away. To 
match the semantics, you need to GBK with:

 - one key
 - global window
 - "never" trigger (aka trigger exactly once at the very last moment)

And then it will need a DoFn to convert the one KV<K, Iterable<T>> into a 
singleton, map, or multimap and run the assertion.
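
As a rough illustration of that last conversion step, here is a plain-Java 
sketch with no Beam dependency. It only shows what the DoFn body might do once 
it has received the single grouped Iterable. The class and method names 
(GroupedAssertSketch, toSingleton, toMap, toMultimap) are hypothetical, 
Map.Entry stands in for Beam's KV, and rejecting duplicate keys in toMap is an 
assumption rather than something taken from PAssert itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Beam. Each method takes the Iterable that a
// single-key, global-window, fire-once GBK would hand to the DoFn.
final class GroupedAssertSketch {

  // Iterable<T> -> T. Fails unless the grouped output holds exactly one element.
  static <T> T toSingleton(Iterable<T> values) {
    Iterator<T> it = values.iterator();
    if (!it.hasNext()) {
      throw new IllegalArgumentException("expected exactly one element, got none");
    }
    T only = it.next();
    if (it.hasNext()) {
      throw new IllegalArgumentException("expected exactly one element, got more");
    }
    return only;
  }

  // Iterable<KV<K, V>> -> Map<K, V>. Assumption: a repeated key is an error.
  static <K, V> Map<K, V> toMap(Iterable<Map.Entry<K, V>> entries) {
    Map<K, V> result = new HashMap<>();
    for (Map.Entry<K, V> e : entries) {
      if (result.containsKey(e.getKey())) {
        throw new IllegalArgumentException("duplicate key: " + e.getKey());
      }
      result.put(e.getKey(), e.getValue());
    }
    return result;
  }

  // Iterable<KV<K, V>> -> multimap, keeping every value seen for a key.
  static <K, V> Map<K, List<V>> toMultimap(Iterable<Map.Entry<K, V>> entries) {
    Map<K, List<V>> result = new HashMap<>();
    for (Map.Entry<K, V> e : entries) {
      result.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
    }
    return result;
  }
}
```

The assertion itself would then run against the returned value, which is the 
same shape of data the side input path produces.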

> Can we deprecate the side input paths through PAssert?
> ------------------------------------------------------
>
>                 Key: BEAM-6257
>                 URL: https://issues.apache.org/jira/browse/BEAM-6257
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Kenneth Knowles
>            Priority: Major
>              Labels: starter, triaged
>
> PAssert has two distinct paths - one uses GBK with a single-firing trigger, 
> and one uses side inputs. Side inputs are usually a later addition to a 
> runner, while GBK is one of the first primitives (with a single firing it is 
> even simple). Filing this against myself to figure out why the side input 
> version is not deprecated, and if it can be deprecated.
> Marking this as a "starter" task because finding and eliminating side input 
> version of PAssert should be fairly easy. You might need help but can ask on 
> dev@



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
