Ning Kang created BEAM-11906:
--------------------------------
Summary: No trigger early repeatedly for session windows
Key: BEAM-11906
URL: https://issues.apache.org/jira/browse/BEAM-11906
Project: Beam
Issue Type: Improvement
Components: runner-dataflow
Affects Versions: 2.28.0, 2.23.0
Reporter: Ning Kang
Originated from:
https://stackoverflow.com/questions/66381608/apache-beam-does-not-trigger-early-repeatedly-for-session-windows-on-google-data
The following pipeline fires early after each element when running locally
using DirectRunner, but there are no early triggers when running on google
cloud dataflow. On dataflow it triggers only after the session window has
closed.
{code:python}
( p
| 'read' >> beam.io.ReadFromPubSub(subscription =
'projects/xxx/subscriptions/xxx-sub')
| 'json' >> beam.Map(lambda x: json.loads(x.decode('utf-8')))
| 'kv' >> beam.Map(lambda x: (x['id'], x['amount']))
| 'window' >> beam.WindowInto(window.Sessions(15*60),
trigger=trigger.Repeatedly(trigger.AfterCount(1)),
accumulation_mode=AccumulationMode.ACCUMULATING)
| 'group' >> beam.GroupByKey()
| 'log' >> beam.Map(lambda x: logging.info(x))
)
{code}
Apache Beam versions tried: 2.23 and 2.28.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)