FrankChen021 commented on PR #18982:
URL: https://github.com/apache/druid/pull/18982#issuecomment-3846035169

   @clintropolis thanks for the comments.
   
   > quite a lot of the segment already uses lz4 by default so curious how 
effective running the whole thing through lz4 again would be, did you do any 
experiments to compare with just not 'externally' compressing at all in deep 
storage?
   
   I think you're asking two questions here.
   The first is: after lz4 compression of the columns, does compressing the 
whole segment again still give a gain? 
Here's a table showing the raw segment sizes and the sizes after zip/lz4 
compression is applied again:
   
   <img width="784" height="251" alt="image" 
src="https://github.com/user-attachments/assets/363cb558-481f-4b2a-8d76-dd0c5e04617d"
 />
   You can see that the decompressed segment has a size of 11GB, while in HDFS 
(with the current default, zip) it has a size of 2.48GB.
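
To put the two figures quoted above in relative terms, here is a small 
arithmetic sketch. The function name `compression_summary` is made up for 
illustration and is not part of Druid; the only inputs are the 11GB and 2.48GB 
values from the comment, which work out to roughly a 4.4x reduction.

```python
def compression_summary(raw_gb: float, stored_gb: float) -> tuple[float, float]:
    """Return (ratio, fraction_saved) for a raw size and its stored size."""
    if raw_gb <= 0 or stored_gb <= 0:
        raise ValueError("sizes must be positive")
    ratio = raw_gb / stored_gb          # how many times smaller the stored copy is
    saved = 1.0 - stored_gb / raw_gb    # fraction of the raw size that was saved
    return ratio, saved


if __name__ == "__main__":
    # Figures taken from the comment: 11GB decompressed, 2.48GB zipped in HDFS.
    ratio, saved = compression_summary(11.0, 2.48)
    print(f"{ratio:.2f}x smaller, {saved:.1%} saved")
```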
   
   The other question I think you're asking is whether we ran experiments 
uploading the raw files into deep storage.
   No. 
   First, the table above shows that compressing the whole directory still 
gives a gain. 
   Second, HDFS deep storage currently does not support uploading file by file.
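
For anyone who wants to repeat this kind of comparison on their own segment 
directory, a minimal sketch follows. It is not the code used to produce the 
table: `raw_and_zipped_size` is a hypothetical helper, and it uses the standard 
library's zip (deflate) only. An lz4 comparison would need a third-party 
package, which is left out here.

```python
import os
import tempfile
import zipfile


def raw_and_zipped_size(directory: str) -> tuple[int, int]:
    """Return (raw_bytes, zipped_bytes) for every file under `directory`."""
    raw_bytes = 0
    # The archive goes to a temporary file so large segments are not held in memory.
    with tempfile.TemporaryFile() as out:
        with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as archive:
            for root, _dirs, files in os.walk(directory):
                for name in files:
                    path = os.path.join(root, name)
                    raw_bytes += os.path.getsize(path)
                    # Store paths relative to the directory being measured.
                    archive.write(path, arcname=os.path.relpath(path, directory))
        # The archive is closed at this point; seeking to the end yields its size.
        zipped_bytes = out.seek(0, os.SEEK_END)
    return raw_bytes, zipped_bytes


if __name__ == "__main__":
    import sys

    raw, zipped = raw_and_zipped_size(sys.argv[1])
    print(f"raw={raw} bytes, zipped={zipped} bytes")
```

Run it with the path of an unpacked segment directory as the only argument. If 
the zipped size is close to the raw size, the column-level compression has 
already removed most of the redundancy.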
   
   As for v10, I do know that we can support partial download, but HDFS as deep 
storage is different from object storage. 
   HDFS can't handle as many files as object storage, it can't provide the 
same level of concurrent access, and querying data directly from HDFS is much 
slower than querying it from object storage. For us, HDFS is the main storage 
and I don't see a way for us to migrate from HDFS to object storage, so we 
will stay on HDFS for a very long time.
   
   
   > since it needs to be handled by each implementation of segment 
pusher/puller
   No need to worry about it. The compression configuration is only provided 
for HDFS, and it's implemented inside the HDFS deep storage rather than at a 
higher level that would require every deep storage implementation to support 
it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

