Etienne Chauchot created BEAM-4825:
--------------------------------------
Summary: Nexmark query3 is flaky on Direct runner
Key: BEAM-4825
URL: https://issues.apache.org/jira/browse/BEAM-4825
Project: Beam
Issue Type: Bug
Components: runner-direct
Reporter: Etienne Chauchot
Assignee: Henning Rohde
Query3 exercises state and timers. It asks this question to Nexmark auction
system:
Who is selling in particular US states?
And the sketch of its code is:
* Apply global window to events with trigger repeatedly after at least
nbEvents in pane => results will be materialized each time nbEvents are
received.
* input1: collection of auctions events filtered by category and keyed by
seller id
* input2: collection of persons events filtered by US state codes and keyed by
person id
* CoGroupByKey to group auctions and persons by personId/sellerId + tags to
distinguish persons and auctions
* ParDo to do the incremental join: auctions and person events can arrive out
of order
* person element stored in persistent state in order to match future auctions
by that person. Set a timer to clear the person state after a TTL
* auction elements stored in persistent state until we have seen the
corresponding person record. Then, it can be output and cleared
* output NameCityStateId(person.name, person.city, person.state, auction.id)
objects
*The output size should be constant and it is not.*
*The output size of this query is different between batch and streaming modes*
See query 3 dashboard in this graph:
https://apache-beam-testing.appspot.com/explore?dashboard=5099379773931520
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)