clintropolis commented on PR #18982:
URL: https://github.com/apache/druid/pull/18982#issuecomment-3851214290

   >You can see that the decompressed segment has a size of 11GB; in HDFS (with 
the current default zip), it has a size of 2.48GB
   
   Interesting, that is a much bigger difference than I expected, though if 
there are a lot of complex columns that are not using compression (added in 
#16863, off by default), then a difference of that size makes sense. Basically, 
where I'm thinking things are heading is moving away from generic external 
compression in favor of all of the contents of the segment file itself being 
compressed.
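
   For reference, a rough sketch of the arithmetic on the two sizes quoted 
above (11GB decompressed, 2.48GB zipped in HDFS). The helper name 
`compression_stats` is made up for illustration and is not part of Druid:

```python
def compression_stats(uncompressed_gb: float, compressed_gb: float) -> tuple[float, float]:
    """Return (compression ratio, fraction of space saved)."""
    ratio = uncompressed_gb / compressed_gb        # how many times smaller the zip is
    saved = 1.0 - compressed_gb / uncompressed_gb  # share of bytes removed by zipping
    return ratio, saved


# Sizes taken from the quoted comment; everything else is illustrative.
ratio, saved = compression_stats(11.0, 2.48)
print(f"ratio: {ratio:.2f}x, saved: {saved:.1%}")
```

   So the zip is roughly 4.4x smaller, which is the gap being discussed here.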
   
   >Another question I think you're asking is whether we did experiments to 
upload the raw files into deep storage?
   
   Ah, this was mostly just me wondering about uncompressed sizing, though I 
expect most of the perf numbers would look better without having to do any 
extra compression/decompression. For your segments at least, it seems like some 
additional work would need to happen to make that viable.
   
   >As for v10, I do know that we can support partial download, but HDFS as 
deep storage is different from object storage.
   HDFS can't support as many files as object storage, and it can't provide 
access as highly concurrent as object storage; querying data directly from 
HDFS is much slower than querying data from object storage. For us, HDFS is 
the main storage and I don't see a way for us to migrate from HDFS to object 
storage, so we will stay on HDFS for a very long time.
   
   V10 segments store everything in a single file, `druid.segment`, so in terms 
of file count it should be no different from the single .zip (or whatever) that 
exists today with externally compressed v9 segments. It is fair that partial 
downloads would potentially increase concurrent access, though with smaller 
fetches, so maybe not quite as bad. None of the partial download work exists 
yet though, and virtual storage mode is optional.

