Etienne Chauchot created BEAM-4825:
--------------------------------------

             Summary: Nexmark query3 is flaky on Direct runner
                 Key: BEAM-4825
                 URL: https://issues.apache.org/jira/browse/BEAM-4825
             Project: Beam
          Issue Type: Bug
          Components: runner-direct
            Reporter: Etienne Chauchot
            Assignee: Henning Rohde


Query3 exercises state and timers. It asks this question to Nexmark auction 
system:
Who is selling in particular US states?

And the sketch of its code is:
 * Apply global window to events with trigger repeatedly after at least 
nbEvents in pane =>  results will be materialized each time nbEvents are 
received.
 * input1: collection of auctions events filtered by category and keyed by 
seller id
 * input2: collection of persons events filtered by US state codes and keyed by 
person id
 * CoGroupByKey to group auctions and persons by personId/sellerId + tags to 
distinguish persons and auctions
 * ParDo to do the incremental join: auctions and person events can arrive out 
of order
 * person element stored in persistent state in order to match future auctions 
by that person. Set a timer to clear the person state after a TTL
 * auction elements stored in persistent state until we have seen the 
corresponding person record. Then, it can be output and cleared


 * output NameCityStateId(person.name, person.city, person.state, auction.id) 
objects
 
 
*The output size should be constant and it is not.*
*The output size of this query is different between batch and streaming modes*
 
See query 3 dashboard in this graph: 
https://apache-beam-testing.appspot.com/explore?dashboard=5099379773931520



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to