gianm commented on issue #6905: URL: https://github.com/apache/druid/issues/6905#issuecomment-741045921
I learned a lot about the ZIP format today 🙂

I did a little testing, and I suspect this happens because ZipInputStream, which CompressionUtils.unzip uses when pulling segment files, does not read the central directory at the end of a zip file. It only reads the local file headers and local file data. (See https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html.) It stops as soon as it encounters the central directory, so the file gets closed without the rest of the stream being read.

I think this is OK, because Druid itself writes these zip files, and it doesn't use any of the zip features (file replacement, deletion) that would require the central directory. That means these messages are harmless, but they are annoying.

We should be able to get rid of them by modifying CompressionUtils.unzip to read the rest of the stream after unzipping is done.

We should also implement a non-streaming version of CompressionUtils.unzip for situations where Druid itself didn't write the zip file. In those situations, it's important to read the central directory.
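
To make the "read the rest of the stream" idea concrete, here is a minimal sketch in plain Java. It is not Druid's CompressionUtils: the class ZipDrainSketch and the methods sampleZip and unzipAndDrain are hypothetical names made up for illustration, and entries are collected in memory rather than written to disk. It assumes that fully consuming the underlying stream after the last entry is enough to avoid the messages.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; this is not Druid's CompressionUtils.
class ZipDrainSketch {

    // Builds a small in-memory zip so the sketch is self-contained.
    static byte[] sampleZip(Map<String, String> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Streams entries out of the zip, then drains whatever is left of the
    // underlying stream (the unread part of the central directory).
    static Map<String, byte[]> unzipAndDrain(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        ZipInputStream zip = new ZipInputStream(in);

        ZipEntry entry;
        // getNextEntry() returns null once it no longer sees a local file header.
        while ((entry = zip.getNextEntry()) != null) {
            ByteArrayOutputStream data = new ByteArrayOutputStream();
            int n;
            while ((n = zip.read(buffer)) != -1) {
                data.write(buffer, 0, n);
            }
            entries.put(entry.getName(), data.toByteArray());
        }

        // Read and discard the remaining bytes so the caller's stream
        // is at end-of-stream before it gets closed.
        while (in.read(buffer) != -1) {
            // intentionally empty: the bytes are not needed
        }
        return entries;
    }
}
```

For the non-streaming case, java.util.zip.ZipFile is the standard-library class that works from the central directory; it needs a file on disk rather than an InputStream, which is why it would be a separate code path.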
