capistrant commented on issue #6988: [Improvement] historical fast restart by lazy load columns metadata(20X faster) URL: https://github.com/apache/incubator-druid/pull/6988#issuecomment-533593456 @clintropolis @gianm @pzhdfy We love this patch. It makes our lives much easier when rolling out changes and running through upgrades. > @pzhdfy I did similar lazy load thing in my code base, but I found there is a consequence. If historical node is force killed(kill -9) when it is asked to download new segment, it is very likely unzip failure and left a corrupted segment folder. When lazy load=false, this segment can be ignored during historical startup(loading will fail), but when lazy load=true, this corruption is only known when a query comes in. Currently there is no interface to tell SegmentLoaderLocalCacheManager to unload this segment or reload this segment, thus queries always fail. Currently the only solution is to delete this segment folder and restart historical node to ignore this segment(because folder is gone). > > Better introduce unload/reload interface together in this PR. I am nervous about this though. As of now we have accepted the risk and are willing to intervene when needed. Do you have any plans of addressing this comment in this PR or would this be an additional PR. I do think that it would get more support if we did analysis on this and solved it if it is indeed a problem. > Interesting idea, I too have felt the pain of multi-day rollouts for tiers of densely loaded historical servers. > > I've only scanned changes so far, I'll try to do a full review later. In the mean time, if possible could you run query benchmarks before this patch and after this patch with `lazyLoadOnStart = false`, as a sanity check to make sure that there are not any odd performance effects of introducing the memoized suppliers? I don't really _expect_ any noticeable effect, but it would make buy in to this change easier. (No worries if you don't have a chance to get to it, i'll try to do this myself once I get around to full review). > > I would also be interested in the performance cost for queries that _do_ have to eat the deserialization when `lazyLoadOnStart = true`, and whether it opens up scenarios where the memoizing supplier becomes a sort of choke point for the processing pool if the cost is high. Regarding Clint's comment on performance. We have been running it for nearly a week on a cluster with hundreds of thousands of segments and hundreds of thousands of queries a day and our metrics collection shows negligible change in performance (but this is vs druid 11. If druid 15 had large performance gains across the board before this patch there could be a bigger change). We added this as a part of our upgrade to druid 15 though so we have not seen performance with it off in druid 15, just in druid 11. Regardless, we are eager to get this reviewed and accepted upstream. Our experience with it so far has been great and we'd love to help in whatever way possible to get it merged.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org