clintropolis commented on PR #18982: URL: https://github.com/apache/druid/pull/18982#issuecomment-3851214290
> You can see that the decompressed segment has a size of 11GB; in HDFS (with the current default zip), it has a size of 2.48GB

Interesting, that is a much bigger difference than I expected, though perhaps if there are a lot of complex columns that are not using compression (off by default, added in #16863) then a difference of that size makes sense. Basically, where I'm thinking things are heading is moving away from generic compression (such as the zip) in favor of all of the contents of the segment file being compressed.

> Another question I think you're asking is whether we did experiments to upload the raw files into deep storage?

Ah, this was mostly just me wondering about uncompressed sizing, though I expect most of the perf stuff would look better not having to do any extra compression/decompression. For your segments at least, it seems like some additional stuff would need to happen to make that viable.

> As for v10, I do know that we can support partial download, but HDFS as deep storage is different from object storage: HDFS can't support as many files as object storage, it can't provide the same high concurrent access as object storage, and querying data directly from HDFS is much slower than querying data from object storage. For us, HDFS is the main storage and I don't see a way for us to migrate from HDFS to object storage; we will stay on HDFS for a very long time.

V10 segments store everything in a single file, `druid.segment`, so in terms of file count it should be no different than having a single .zip or whatever there is today with externally compressed v9 segments. It is fair that partial downloads would potentially increase concurrent access, though with smaller fetches, so maybe not quite as bad. None of the partial stuff exists yet though, and virtual storage mode is optional.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
