Hello team,
I apologize for reaching out on the dev mailing list. I'm working on
implementing micro-batching with near real-time processing.
I've seen similar questions in the Flink Slack channel and user mailing list,
but there hasn't been much discussion or feedback. Here are the options I've
explored:
1. Windowing: This approach looked promising, but flushing requires checking
information on each record, because window data isn't accessible throughout
the pipeline.
2. Window + Trigger: This method buffers events until the trigger interval is
reached, which affects real-time processing; events are only processed when the
trigger occurs.
3. Processing Time: Processing time is local to each file writer, so it is
inconsistent across different task managers.
4. Watermark: There’s no global watermark; it's specific to each source task,
and the initial watermark information (before the first watermark event) isn't
epoch-based.
I'm looking to write data grouped by time (micro-batch time). What’s the best
approach to achieve micro-batching in Flink?
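To make "grouped by micro-batch time" concrete, here is a minimal sketch in
plain Java of the grouping I have in mind. It is not Flink code: the class
MicroBatchBuckets and its methods are hypothetical names used only for
illustration, and it assumes every record carries an epoch-millisecond
timestamp and that the micro-batch interval is fixed.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Flink: maps record timestamps to micro-batches.
final class MicroBatchBuckets {

    // Returns the epoch-millis start of the micro-batch that contains the timestamp.
    // floorMod keeps buckets aligned for negative timestamps as well.
    static long bucketStart(long epochMillis, Duration interval) {
        long size = interval.toMillis();
        if (size <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return epochMillis - Math.floorMod(epochMillis, size);
    }

    // Groups timestamps by the start of their micro-batch, ordered by batch time.
    static Map<Long, List<Long>> group(List<Long> timestamps, Duration interval) {
        Map<Long, List<Long>> batches = new TreeMap<>();
        for (long ts : timestamps) {
            batches.computeIfAbsent(bucketStart(ts, interval), k -> new ArrayList<>()).add(ts);
        }
        return batches;
    }
}
```

Because the batch start is derived only from the record's own timestamp, every
task would compute the same value for the same record. I assume that value
could then serve as a key or a bucket id in the pipeline, but I'm not sure
that is the right approach.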
Let me know if you have any questions.
Thanks.