gianm commented on issue #6905:
URL: https://github.com/apache/druid/issues/6905#issuecomment-741045921


   I learned a lot about the ZIP format today 🙂
   
   I did a little testing, and I suspect this is because ZipInputStream, which
CompressionUtils.unzip uses when pulling segment files, does not read the
central directory at the end of zip files. It only reads the local file headers
and local file data. (See
https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html.) It stops as
soon as it encounters the first central directory header, rather than reading
the whole stream.
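   To illustrate the point above, here is a small standalone sketch
(ZipTailDemo and its helpers are hypothetical names, not Druid code). It writes
a zip with many entries, reads every entry with ZipInputStream, and reports how
many bytes of the underlying stream were never consumed. With 200 entries the
central directory should be several kilobytes, so the unread count should be
well above zero; ZipInputStream reads ahead in small chunks, so a little of the
directory may still end up in its internal buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class (not part of Druid): shows that ZipInputStream stops at
// the first central directory header and leaves the rest of the input unread.
final class ZipTailDemo {

    /** Counts how many bytes have been pulled from the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        long count;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                count++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n;
            }
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = super.skip(n);
            count += skipped;
            return skipped;
        }
    }

    /** Builds an in-memory zip; closing ZipOutputStream appends the central directory. */
    static byte[] buildZip(int entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (int i = 0; i < entries; i++) {
                zos.putNextEntry(new ZipEntry("file-" + i + ".txt"));
                zos.write(("contents of file " + i).getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Reads every entry with ZipInputStream and returns how many input bytes were never read. */
    static long unreadBytesAfterStreaming(byte[] zip) throws IOException {
        CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(zip));
        try (ZipInputStream zis = new ZipInputStream(counting)) {
            byte[] buf = new byte[4096];
            // getNextEntry() returns null once it sees a central directory header.
            while (zis.getNextEntry() != null) {
                while (zis.read(buf) != -1) {
                    // discard entry contents
                }
            }
            return zip.length - counting.count;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip(200);
        System.out.println("zip size: " + zip.length + " bytes");
        System.out.println("left unread by ZipInputStream: " + unreadBytesAfterStreaming(zip) + " bytes");
    }
}
```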
   
   I think this is OK, because Druid itself writes these zip files, and it
doesn't use any of the zip features (file replacement, deletion) that would
require reading the central directory. That means these messages are harmless,
but they are annoying. We should be able to get rid of them by modifying
CompressionUtils.unzip to read the rest of the stream after unzipping is done.
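   A minimal sketch of the "read the rest of the stream" idea, assuming Java 11+
(for InputStream.transferTo and OutputStream.nullOutputStream). StreamingUnzip,
unzipAndDrain and sampleZip are hypothetical names; this is not the actual
signature or body of CompressionUtils.unzip. Once getNextEntry() returns null,
it drains the raw input stream rather than the ZipInputStream, because the
ZipInputStream itself just returns -1 after it has stopped at the central
directory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch, not Druid's CompressionUtils.unzip: extract entries with
// ZipInputStream, then drain whatever it left unread in the source stream.
final class StreamingUnzip {

    static int unzipAndDrain(InputStream in, Path outDir) throws IOException {
        Path root = outDir.toAbsolutePath().normalize();
        int files = 0;
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would escape outDir ("zip slip").
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Copies until the end of the current entry; does not close zis.
                    Files.copy(zis, target, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
            }
            // ZipInputStream stopped at the central directory. Read the remaining
            // bytes from the raw stream so the source sees it fully consumed.
            in.transferTo(OutputStream.nullOutputStream());
        }
        return files;
    }

    /** Builds a small in-memory zip for trying the method out. */
    static byte[] sampleZip(int entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (int i = 0; i < entries; i++) {
                zos.putNextEntry(new ZipEntry("dir/file-" + i + ".txt"));
                zos.write(("contents " + i).getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("unzip-demo");
        int files = unzipAndDrain(new ByteArrayInputStream(sampleZip(200)), out);
        System.out.println("wrote " + files + " files to " + out);
    }
}
```

The zip-slip check is not part of the proposal above; it is included only
because any routine that writes entry names to disk should have one.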
   
   We should also implement a non-streaming version of CompressionUtils.unzip
for situations where Druid itself didn't write the zip file. In those
situations, it's important to read the central directory, since it, not the
local file headers, is the authoritative list of entries.
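   For the non-streaming case, a sketch built on java.util.zip.ZipFile, which
locates entries through the central directory instead of scanning local
headers, so entries the central directory does not list are not extracted.
CentralDirectoryUnzip and unzipFromFile are hypothetical names, and Java 11+ is
assumed (for Path.of). ZipFile needs random access, so a zip that arrives as a
stream would first have to be copied to a local file; that extra copy is the
price of reading the central directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical sketch of a non-streaming unzip for zip files that Druid did not write.
// ZipFile reads the central directory, so its entry list is the authoritative one.
final class CentralDirectoryUnzip {

    static int unzipFromFile(Path zipPath, Path outDir) throws IOException {
        Path root = outDir.toAbsolutePath().normalize();
        int files = 0;
        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would escape outDir ("zip slip").
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                try (InputStream is = zipFile.getInputStream(entry)) {
                    Files.copy(is, target, StandardCopyOption.REPLACE_EXISTING);
                }
                files++;
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CentralDirectoryUnzip.java <zip file> <output dir>
        int files = unzipFromFile(Path.of(args[0]), Path.of(args[1]));
        System.out.println("extracted " + files + " files");
    }
}
```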

