ankitsultana opened a new issue, #10243: URL: https://github.com/apache/pinot/issues/10243
At present we make a `PinotFS::listFiles` call in the segment commit flow. This is done in `PinotLLCRealtimeSegmentManager`:

https://github.com/apache/pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java#L494

For realtime tables with high ingestion throughput, segment commits are quite frequent (hundreds per minute). Such tables also have a large number of segments (tens of thousands), which makes this `listFiles` call costly. This can not only impact ingestion latency but also put pressure on the underlying file system used by PinotFS.

There are two options to eliminate this:

1. If the tmp file path is deterministic, avoid the `listFiles` call and delete the file directly.
2. Make the `listFiles` call async and run it in the background.

cc: @Jackie-Jiang
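To make the two options concrete, here is a minimal sketch of both. It is not Pinot code: it uses `java.nio.file` as a stand-in for `PinotFS`, and the class `TmpSegmentCleanup`, its method names, and the `<segmentName>.tmp` naming scheme are all hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical helper, not part of Pinot. java.nio.file stands in for PinotFS. */
final class TmpSegmentCleanup {
  private TmpSegmentCleanup() {}

  /** ASSUMPTION: tmp files are named "<segmentName>.tmp" so the path can be computed. */
  static Path deterministicTmpPath(Path tableDir, String segmentName) {
    return tableDir.resolve(segmentName + ".tmp");
  }

  /** Option 1: no directory listing; delete the one known path. Returns true if a file was removed. */
  static boolean deleteTmpDirectly(Path tableDir, String segmentName) throws IOException {
    return Files.deleteIfExists(deterministicTmpPath(tableDir, segmentName));
  }

  /**
   * Option 2: keep the listing, but run it on a background executor so the
   * caller (the commit flow) does not wait for it. The future completes with
   * the number of tmp files deleted.
   */
  static CompletableFuture<Integer> deleteTmpFilesAsync(
      Path tableDir, String segmentName, Executor executor) {
    String prefix = segmentName + ".tmp"; // ASSUMPTION: tmp files share this prefix
    return CompletableFuture.supplyAsync(() -> {
      int deleted = 0;
      // The filter keeps only entries whose file name starts with the tmp prefix.
      try (DirectoryStream<Path> entries = Files.newDirectoryStream(
          tableDir, p -> p.getFileName().toString().startsWith(prefix))) {
        for (Path entry : entries) {
          if (Files.deleteIfExists(entry)) {
            deleted++;
          }
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return deleted;
    }, executor);
  }
}
```

In this sketch the commit path would call `deleteTmpDirectly` inline, or fire `deleteTmpFilesAsync` without joining the returned future. Option 2 moves the listing off the commit path but does not reduce the load on the underlying file system, whereas option 1 removes the listing entirely.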
