mcvsubbu commented on issue #7192: URL: https://github.com/apache/pinot/issues/7192#issuecomment-885901824
> IIUC the problem here is during segment commit, there's no consumption. Instead of having a time scheduled for segment commit, we can find a way to start consuming for the new segment while the current segment is being committed.

+1 to this, if we can write up a design doc for it and take it up.

Here is one idea that I have toyed with (only in thought-ware):

When the current consuming segment (say, segment A) starts the completion phase, start a new consuming segment (say, A+) that is a `MutableSegment` as well. Whenever queries come in for segment A, include A+ as well in the server, so that the latest rows are included.

Happy path: Segment A goes to ONLINE state at some point, and continues happily ever after. Segment A+ is renamed to segment B (the next one in line) when the server gets an OFFLINE to CONSUMING state transition.

Unhappy paths are essentially unanswered.
- We now need a lot more memory on the system. It may not be double (if mmapped), but some extra.
- What happens if segment A takes a long time to complete, and it is a high-ingestion use case? Can we stop consumption in A+ after reaching (say) 10% of the expected number of rows? Thereafter you get stale data until something happens to A.
- Other unhappy questions that I may not even know about.
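
To make the A / A+ idea above a bit more concrete, here is a toy sketch in plain Java. It is only an illustration under stated assumptions: `OverlapConsumptionSketch`, `ToySegment`, `consume`, `query`, `promote`, and the percentage cap are all hypothetical names invented for this sketch, not Pinot classes or APIs, and a plain list stands in for a real `MutableSegment`.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the A / A+ idea. None of these names are Pinot APIs. */
final class OverlapConsumptionSketch {

    /** Hypothetical stand-in for an in-memory consuming segment. */
    static final class ToySegment {
        String name;
        final List<String> rows = new ArrayList<>();

        ToySegment(String name) {
            this.name = name;
        }
    }

    private final ToySegment committing; // segment A, in its completion phase
    private final ToySegment overlap;    // segment A+, consuming in the meantime
    private final int overlapRowCap;     // assumed cap, e.g. 10% of expected rows

    OverlapConsumptionSketch(ToySegment committing, int expectedRows, int capPercent) {
        this.committing = committing;
        this.overlap = new ToySegment(committing.name + "+");
        this.overlapRowCap = expectedRows * capPercent / 100;
    }

    /** Adds a row to A+. Returns false once the cap is reached (data goes stale). */
    boolean consume(String row) {
        if (overlap.rows.size() >= overlapRowCap) {
            return false;
        }
        overlap.rows.add(row);
        return true;
    }

    /** A query addressed to A also sees the rows already consumed into A+. */
    List<String> query() {
        List<String> result = new ArrayList<>(committing.rows);
        result.addAll(overlap.rows);
        return result;
    }

    /** Models the OFFLINE to CONSUMING transition: A+ is renamed to the next segment. */
    ToySegment promote(String nextSegmentName) {
        overlap.name = nextSegmentName;
        return overlap;
    }

    public static void main(String[] args) {
        ToySegment a = new ToySegment("A");
        a.rows.add("r1");
        OverlapConsumptionSketch sketch = new OverlapConsumptionSketch(a, 20, 10);
        sketch.consume("r2");
        sketch.consume("r3");
        System.out.println(sketch.query());
        System.out.println(sketch.promote("B").name);
    }
}
```

The sketch covers the happy path plus the row cap from the second unhappy-path bullet. It does not model memory pressure, a failed commit of A, or concurrency, which are exactly the open questions listed above.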
