udaysagar2177 commented on issue #17331:
URL: https://github.com/apache/pinot/issues/17331#issuecomment-3634601403

   The micro-batch ingestion interval can be as low as 5 to 10 seconds. An 
interval this short is driven mainly by durability requirements and high data 
volume. The idea is to reduce Kafka costs and complexity while still achieving 
near-real-time ingestion with minimal overhead.
   
   Using a minion task to merge many small segments at such a frequency seems 
feasible, but it requires an upstream process to generate those small segments, 
whether a Spark job converting Avro files or the application itself. Both 
approaches introduce non-trivial work, and there is a risk that segments could 
accumulate faster than the minions can merge them.
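To make the accumulation risk concrete, here is a back-of-the-envelope model. It assumes (hypothetically) that each producer emits exactly one small segment per micro-batch and that minions merge at a constant rate; the function names and all rates are illustrative, not Pinot settings.

```python
# Back-of-the-envelope model of small-segment backlog.
# Assumption: one small segment per micro-batch per producer, constant rates.

SECONDS_PER_DAY = 24 * 60 * 60


def segments_per_day(batch_interval_s: float, producers: int = 1) -> int:
    """Small segments created per day at a given micro-batch interval."""
    return int(SECONDS_PER_DAY / batch_interval_s) * producers


def backlog_after(hours: float, created_per_hour: float, merged_per_hour: float) -> float:
    """Unmerged segments left after `hours` if creation outpaces merging."""
    net_growth = created_per_hour - merged_per_hour
    return max(0.0, net_growth * hours)


if __name__ == "__main__":
    print(segments_per_day(5))           # one producer, 5 s batches
    print(segments_per_day(10))          # one producer, 10 s batches
    # 5 s batches -> 720 segments/hour; suppose minions merge only 600/hour.
    print(backlog_after(24, 720, 600))
```

With 5-second batches a single producer already creates 17,280 segments per day, and any sustained gap between creation and merge rate grows linearly with time.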
   
   With a real-time micro-batch ingestion approach, we could still use Kafka, 
but for descriptor records that reference the micro-batches, while producing 
larger segments, similar to the "one Kafka message per event" model. During 
periods of higher load, micro-batches reside on durable storage until the 
pipeline scales and Kafka topic partitions are reassigned.
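A minimal sketch of how a consumer might turn descriptor records into larger segments. Everything here is an assumption for illustration: the `BatchDescriptor` record shape (`uri`, `num_rows`, `offset`), the `SegmentPlanner` grouping by a hypothetical `target_rows` threshold, and the `s3://bucket/...` paths are not part of Pinot or the original proposal. The idea shown is that offsets are committed only after a segment covering a group of micro-batches is built, so unconsumed batches simply stay on durable storage until a consumer (possibly a new one after partition reassignment) picks them up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class BatchDescriptor:
    """Hypothetical Kafka record value that points at a micro-batch instead of carrying rows."""
    uri: str        # location of the micro-batch file on durable storage
    num_rows: int   # row count, so segments can be sized without reading the file
    offset: int     # Kafka offset of this descriptor within its partition


@dataclass
class SegmentPlanner:
    """Buffers descriptors from one partition until enough rows exist for one larger segment."""
    target_rows: int
    pending: List[BatchDescriptor] = field(default_factory=list)

    def add(self, d: BatchDescriptor) -> Optional[List[BatchDescriptor]]:
        """Buffer a descriptor; return the whole group once it reaches target_rows."""
        self.pending.append(d)
        if sum(p.num_rows for p in self.pending) >= self.target_rows:
            group, self.pending = self.pending, []
            return group
        return None


def commit_offset(group: List[BatchDescriptor]) -> int:
    """Next offset to commit once the segment built from `group` is durably stored."""
    return group[-1].offset + 1
```

For example, with `target_rows=1_000_000` and micro-batches of 300,000 rows each, the fourth descriptor completes a group (offsets 0 to 3), the consumer builds one segment from those four files, and then commits offset 4.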


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
