clintropolis commented on PR #18176: URL: https://github.com/apache/druid/pull/18176#issuecomment-3025121463
> 👍 Might be good to add a link to the original SIEVE paper: https://www.usenix.org/conference/nsdi24/presentation/zhang-yazhuo for those interested

yea totally, I have it linked in the javadocs for `StorageLocation` and was planning to add it to the PR description once I fill it in when this is closer to ready to review, just haven't got to it yet 😅 (still changing quite a few things).

> @clintropolis , I haven't gone through the PR yet but how will this affect segment assignment/balancing on the Coordinator?

This first PR needs no changes to the coordinator logic. When a historical is set to this mode, the idea is that you set `druid.server.maxSize` to how much data you want it to be responsible for, but set the sizes in `druid.segmentCache.locations` to the actual disk sizes. During querying, the cache manager loads and drops segments internally as needed to stay within the `druid.segmentCache.locations` sizes. I'll elaborate more in the PR description once this branch gets closer to review ready.

I do have some ideas for follow-up work: adding a new type of "weak" load rule, so that a historical can use the same load logic it uses for all segments when `druid.segmentCache.isVirtualStorageFabric` from this PR is set to true, but also still have regular segment loads. This would give finer-grained control over how segments are loaded: some segments would be sticky and always present in the disk cache (the cache manager supports this internally), while others would be weak references that load on demand but are eligible to be dropped if new strong or weak loads need the space. This likely does require some adjustments to coordinator balancing to distinguish 'weak' loads from regular loads, to ensure the regular loads do not exceed actual disk space, but I think the changes would be pretty minor. I was planning to address this too in the PR description (or maybe a linked design proposal issue, not sure, haven't decided yet).
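For illustration only, the sizing setup described above for this first PR might look roughly like the following in a historical's `runtime.properties`. This is a sketch under assumptions: the path and byte values are made up, and the exact behavior of `druid.segmentCache.isVirtualStorageFabric` is whatever the PR ends up defining.

```properties
# Hypothetical values, not taken from the PR.

# Enable the mode introduced by this PR (property name as given in the comment above).
druid.segmentCache.isVirtualStorageFabric=true

# How much data this historical should be responsible for (here ~1 TB),
# which may be larger than the physical disk.
druid.server.maxSize=1000000000000

# Actual disk capacity available to the segment cache (here ~300 GB).
# The cache manager loads and drops segments to stay within this limit.
druid.segmentCache.locations=[{"path":"/mnt/druid/segment-cache","maxSize":300000000000}]
```

With values like these, the coordinator would assign segments against the 1 TB figure, while only as much as fits in the 300 GB location is on disk at any one time.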
