Just following up—please let me know if any of you have recommendations for 
implementing the mentioned use case.

Thanks.

On Tuesday, October 29, 2024 at 10:37:30 PM PDT, Anil Dasari
<dasaria...@myyahoo.com> wrote:
 
Hi Venkat,

Thanks for the reply.

Micro-batching is a data processing technique where small batches of data are
collected and processed together at regular intervals. However, I'm aiming to
avoid traditional micro-batch processing by tagging the records within a time
window as a batch, allowing for near-real-time data processing. I'm currently
exploring Flink for the following use case:

1. Group data by a time window and write it to S3 under the appropriate
prefix.
2. Generate metrics for the micro-batch and, if needed, store them in S3.
3. Send the metrics to an external system to notify it that Step 1 has been
completed.

If any part of the process fails, the entire micro-batch should be rolled
back. I'm planning to implement a two-phase commit sink for Steps 2 and 3.

The primary challenge is tagging the record set with an epoch time across all
tasks within a window, so that the tag can be used in the sink to create
committable splits, similar to the processing time in the Flink file sink.
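To make the intended flow concrete, here is a minimal Python sketch (not Flink code; names such as `MicroBatchCommitter`, `WINDOW_MS`, and the injected callables are hypothetical). It shows the two pieces of the use case: a deterministic epoch tag that every task can compute independently from a record's timestamp, and an all-or-nothing commit of one tagged micro-batch that rolls back if any of the three steps fails.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # hypothetical 1-minute micro-batch interval


def batch_tag(epoch_ms: int, window_ms: int = WINDOW_MS) -> int:
    """Deterministic tag: epoch start of the window containing epoch_ms.
    Every task derives the same tag from the same record timestamp,
    with no cross-task coordination."""
    return (epoch_ms // window_ms) * window_ms


class MicroBatchCommitter:
    """All-or-nothing commit of one tagged batch (Steps 1-3).
    The step implementations are injected callables so failures can be
    simulated; rollback is expected to undo Step 1's write."""

    def __init__(self, write_data, write_metrics, notify, rollback):
        self.write_data = write_data
        self.write_metrics = write_metrics
        self.notify = notify
        self.rollback = rollback

    def commit(self, tag, records):
        try:
            self.write_data(tag, records)          # Step 1: data under the tag's prefix
            self.write_metrics(tag, len(records))  # Step 2: micro-batch metrics
            self.notify(tag)                       # Step 3: external notification
            return True
        except Exception:
            self.rollback(tag)                     # undo the whole micro-batch
            return False


# Group records by tag, as each task could do independently.
records = [(125_000, "a"), (130_000, "b"), (185_000, "c")]
batches = defaultdict(list)
for ts, payload in records:
    batches[batch_tag(ts)].append(payload)

assert sorted(batches) == [120_000, 180_000]
assert batches[120_000] == ["a", "b"]
```

In Flink itself the tag could be attached in a map/process function and carried into a custom two-phase-commit sink; the sketch only covers the batching and commit logic, not exactly-once delivery.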
Thanks

On Tuesday, October 29, 2024 at 09:40:40 PM PDT, Venkatakrishnan
Sowrirajan <vsowr...@asu.edu> wrote:
 
Can you share more details on what you mean by micro-batching? Can you
explain with an example so we can understand it better?

Thanks
Venkat

On Tue, Oct 29, 2024, 1:22 PM Anil Dasari <dasaria...@myyahoo.com.invalid>
wrote:

> Hello team,
> I apologize for reaching out on the dev mailing list. I'm working on
> implementing micro-batching with near real-time processing.
> I've seen similar questions in the Flink Slack channel and user mailing
> list, but there hasn't been much discussion or feedback. Here are the
> options I've explored:
> 1. Windowing: This approach looked promising, but the flushing mechanism
> requires record-level information checks, as window data isn't accessible
> throughout the pipeline.
> 2. Window + Trigger: This method buffers events until the trigger interval
> is reached, which affects real-time processing; events are only processed
> when the trigger occurs.
> 3. Processing Time: The processing time is specific to each file writer,
> resulting in inconsistencies across different task managers.
> 4. Watermark: There’s no global watermark; it's specific to each source
> task, and the initial watermark information (before the first watermark
> event) isn't epoch-based.
> I'm looking to write data grouped by time (micro-batch time). What’s the
> best approach to achieve micro-batching in Flink?
> Let me know if you have any questions.
> Thanks.
>
>
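Regarding option 3 in the quoted list, here is a small sketch of the inconsistency, assuming each file writer buckets records by its own processing-time clock (`WINDOW_MS` and the clock values are hypothetical). Two writers doing work a few milliseconds apart near a window boundary disagree on the bucket, whereas bucketing by the record's own event timestamp yields the same value on every task.

```python
WINDOW_MS = 60_000  # hypothetical micro-batch interval in ms


def bucket(epoch_ms: int) -> int:
    """Floor an epoch timestamp to the start of its window."""
    return (epoch_ms // WINDOW_MS) * WINDOW_MS


# Processing-time bucketing: each writer reads its own clock, so the
# same logical batch can straddle a boundary across task managers.
writer_a_clock = 179_998  # just before the 180_000 boundary
writer_b_clock = 180_001  # just after it
assert bucket(writer_a_clock) != bucket(writer_b_clock)

# Event-time bucketing: the bucket depends only on the record's own
# timestamp, so every writer computes the same value for it.
record_ts = 179_998
assert bucket(record_ts) == 120_000
```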