mcvsubbu commented on pull request #6479: URL: https://github.com/apache/incubator-pinot/pull/6479#issuecomment-771020682
> Thanks for the details @jackjlli, could you also explain how IntermediateSegment is better than the existing MutableSegment? For example, you could stream input data to MutableSegment and flush it as needed. This also solves multiple problems:
>
> * Common code base for offline and RT segment generation (at least for the streaming part).
> * Sorting can now be done for offline within SegmentGeneration, instead of requiring users to do so explicitly.
> * Auto segment sizing that happens in RT can also be done for offline now.
>
> Thoughts @jackjlli @Jackie-Jiang?

I think this is a good idea to explore, but I suspect memory utilization on the offline side may go up significantly. Also, auto segment sizing in realtime is implemented (in the controller) by learning from the history of segments that have already completed. For offline generation, if we can keep a history or some other learning mechanism, it may be possible to implement approximate segment sizing algorithms, whether or not we use MutableSegment to build segments.
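To make the two ideas concrete (streaming rows into an in-memory buffer that is sorted at flush time, and sizing the next segment from the history of completed ones), here is a minimal, hypothetical Python sketch. It does not use Pinot's MutableSegment, IntermediateSegment, or controller code. Every name in it (`StreamingSegmentBuilder`, `SegmentStats`, `estimate_row_threshold`, `size_of`) is made up for illustration, and the exponential moving average of bytes per row is an assumed stand-in for whatever learning mechanism the realtime controller actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class SegmentStats:
    """Row count and size of one completed segment (hypothetical record)."""
    num_rows: int
    size_bytes: int


def estimate_row_threshold(history: List[SegmentStats], target_bytes: int,
                           default_rows: int, alpha: float = 0.5) -> int:
    """Estimate how many rows fit in a segment of roughly target_bytes.

    Smooths bytes-per-row over completed segments (oldest first) with an
    exponential moving average; with no history, returns default_rows.
    """
    if not history:
        return default_rows
    bytes_per_row = history[0].size_bytes / history[0].num_rows
    for stats in history[1:]:
        observed = stats.size_bytes / stats.num_rows
        bytes_per_row = alpha * observed + (1 - alpha) * bytes_per_row
    return max(1, int(target_bytes / bytes_per_row))


class StreamingSegmentBuilder:
    """Buffers rows, sorts them on flush, and sizes segments from history."""

    def __init__(self, sort_column: str, target_bytes: int,
                 size_of: Callable[[List[Row]], int],
                 history: Optional[List[SegmentStats]] = None,
                 default_rows: int = 100_000) -> None:
        self.sort_column = sort_column
        self.target_bytes = target_bytes
        self.size_of = size_of  # hypothetical stand-in for real serialized size
        self.history = list(history or [])
        self.default_rows = default_rows
        self.buffer: List[Row] = []
        self.segments: List[List[Row]] = []
        self.row_threshold = self._next_threshold()

    def _next_threshold(self) -> int:
        return estimate_row_threshold(self.history, self.target_bytes,
                                      self.default_rows)

    def add(self, row: Row) -> None:
        # The whole segment is held in memory until flush.
        self.buffer.append(row)
        if len(self.buffer) >= self.row_threshold:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        # Sort inside segment generation instead of requiring pre-sorted input.
        segment = sorted(self.buffer, key=lambda r: r[self.sort_column])
        self.segments.append(segment)
        # Record the completed segment so later thresholds learn from it.
        self.history.append(SegmentStats(len(segment), self.size_of(segment)))
        self.row_threshold = self._next_threshold()
        self.buffer = []


builder = StreamingSegmentBuilder("ts", target_bytes=1000,
                                  size_of=lambda rows: 100 * len(rows),
                                  default_rows=4)
for ts in range(10, 0, -1):
    builder.add({"ts": ts})
builder.flush()  # flush the remaining partial segment
print([len(s) for s in builder.segments])  # [4, 6]
```

With no history the first segment uses the fallback of 4 rows; after it completes, the observed 100 bytes per row raises the threshold to 10, so the remaining 6 rows stay buffered until the final flush. Note that the buffer holds an entire segment's rows in memory, which is the source of the offline memory concern raised above.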
