udaysagar2177 commented on issue #17331: URL: https://github.com/apache/pinot/issues/17331#issuecomment-3634601403
The micro-batch ingestion interval can be as low as 5 to 10 seconds. Such a short interval is needed mainly for durability and high data volume. The idea is to reduce Kafka costs and complexity while still achieving near-real-time ingestion with minimal overhead.

Using a minion task to merge many small segments at that frequency seems feasible, but it requires an upstream process to generate those small segments, whether a Spark job converting Avro files or the application itself. Both approaches introduce non-trivial work, and there’s a risk that segments could accumulate faster than the minions can merge them.

With a real-time micro-batch ingestion approach, we could still leverage Kafka for descriptor records while producing larger segments, similar to the “one Kafka message per event” model. During periods of higher load, micro-batches reside on durable storage until the pipeline scales and Kafka topic partitions are re-assigned.
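To put rough numbers on the accumulation risk mentioned above, here is a back-of-the-envelope sketch. It assumes one small segment per micro-batch and constant rates. The producer count and the minion merge throughput are hypothetical inputs chosen for illustration, not measurements from any Pinot cluster.

```python
def segments_per_hour(interval_seconds: float, producers: int = 1) -> float:
    """Small segments created per hour if every micro-batch becomes one segment."""
    return producers * 3600 / interval_seconds


def backlog_after(hours: float, interval_seconds: float, producers: int,
                  merged_per_hour: float) -> float:
    """Unmerged small segments left after `hours`, assuming constant rates.

    `merged_per_hour` is a hypothetical minion merge throughput.
    """
    produced = segments_per_hour(interval_seconds, producers) * hours
    merged = merged_per_hour * hours
    return max(0.0, produced - merged)


if __name__ == "__main__":
    # A 5-second interval yields 720 segments per hour per producer.
    print(segments_per_hour(5))
    # Four producers at 5 s against a hypothetical 2000 merges/hour, over 2 hours.
    print(backlog_after(2, 5, 4, 2000))
```

Under these assumptions the backlog grows linearly whenever the production rate exceeds the merge rate, which is the situation the minion-based approach would have to avoid.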
