FrankChen021 commented on PR #18982: URL: https://github.com/apache/druid/pull/18982#issuecomment-3846035169
@clintropolis thanks for the comments.

> quite a lot of the segment already uses lz4 by default so curious how effective running the whole thing through lz4 again would be, did you do any experiments to compare with just not 'externally' compressing at all in deep storage?

I think you're asking two questions here.

The first is whether compressing again yields any gain after the columns have already been compressed with lz4. Here's a table showing the raw segments and their sizes after being compressed again with zip/lz4:

<img width="784" height="251" alt="image" src="https://github.com/user-attachments/assets/363cb558-481f-4b2a-8d76-dd0c5e04617d" />

You can see that the decompressed segment has a size of 11GB, while in HDFS (with the current default, zip) it has a size of 2.48GB.

The other question I think you're asking is whether we ran experiments uploading the raw files into deep storage. No. First, the table above shows that compressing the directory does yield a gain. Second, HDFS deep storage currently does not support uploading file by file.

As for v10, I do know that we can support partial download, but HDFS as deep storage is different from object storage: HDFS can't handle as many files as object storage, it can't provide the same highly concurrent access, and querying data directly from HDFS is much slower than querying it from object storage. For us, HDFS is the main storage, and I don't see a way for us to migrate from HDFS to object storage, so we will stay on HDFS for a very long time.

> since it needs to be handled by each implementation of segment pusher/puller

No need to worry about it. The compression configuration is only provided for HDFS, and it's implemented in the HDFS deep storage itself, not at a higher level that would require every deep storage implementation to support it.
