[ https://issues.apache.org/jira/browse/COMPRESS-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Bodewig resolved COMPRESS-376.
-------------------------------------
    Resolution: Not A Problem

Thanks

> decompressConcatenated improvement
> ----------------------------------
>
>                 Key: COMPRESS-376
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-376
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Compressors
>            Reporter: Jeremy Gustie
>            Priority: Major
>
> First, the problem I am seeing: in general I always set {{decompressConcatenated}} to {{true}}, and most of the time this works fine. However, it seems that some versions of Python's tarfile will pad a compressed TAR file with null bytes. The null bytes are recognized as garbage, causing decompression to fail. Unfortunately this failure occurs while filling a buffer with the data needed to read the final entry in the TAR file, causing {{TarArchiveInputStream.getNextEntry}} to fail before the last entry can be returned (a short reproduction sketch follows at the end of this description).
>
> There are a few potential solutions I can see:
>
> 1. The easiest thing to do would be to special-case the null padding and just terminate without failing (in the {{GzipCompressorInputStream.init}} method, this amounts to adding a check for {{magic0 == 0 && (magic1 == 0 || magic1 == -1)}} and returning {{false}}). Perhaps draining the underlying stream to ensure that the remaining bytes are all null could reduce the likelihood of a false positive when recognizing the padding (sketched below).
>
> 2. Change {{decompressConcatenated}} to a tri-state value (maybe add an extra {{ignoreGarbage}} flag) to suppress the failure; basically, concatenated streams would be decompressed only if the appropriate magic is found (sketched below). This has API impact but completely preserves backwards compatibility.
>
> 3. Finally, deferring the failure to the next read attempt may also be a viable solution that nearly preserves backwards compatibility. As I mentioned before, the "Garbage after..." error occurs while reading the final entry in a TAR file: if the current read (which contains all of the final data from the compression stream) were allowed to complete normally, the downstream consumer might also complete normally; the next attempt to read (the garbage past the end of the compression stream) would be the read that fails with the "Garbage after..." error. This gives the downstream code the best opportunity both to process the full compression stream and to receive the unexpected-garbage failure (sketched below).
>
> I was mostly looking at {{GzipCompressorInputStream}}; I suspect similar changes would be needed in the other compressor streams that support decompressing concatenated input.
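>
> To reproduce, a minimal sketch using the standard Commons Compress stream wrapping; the file name is only a placeholder for a .tar.gz whose gzip member is followed by NUL padding:
> {code:java}
> import java.io.InputStream;
> import java.nio.file.Files;
> import java.nio.file.Paths;
>
> import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
> import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
> import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
>
> public class ConcatenatedPaddingRepro {
>     public static void main(String[] args) throws Exception {
>         // "padded.tar.gz" stands in for an archive padded with null bytes
>         // (as some Python tarfile versions produce).
>         try (InputStream in = Files.newInputStream(Paths.get("padded.tar.gz"));
>              // decompressConcatenated = true: keep reading after the first member
>              GzipCompressorInputStream gz = new GzipCompressorInputStream(in, true);
>              TarArchiveInputStream tar = new TarArchiveInputStream(gz)) {
>             TarArchiveEntry entry;
>             while ((entry = tar.getNextTarEntry()) != null) {
>                 System.out.println(entry.getName());
>             }
>             // With the padding present, getNextTarEntry() throws the
>             // "Garbage after..." IOException before the final entry is returned.
>         }
>     }
> }
> {code}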
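>
> For (1), a minimal sketch of the proposed check, pulled out as a standalone helper for illustration; {{in}}, {{magic0}}, {{magic1}} and {{isFirstMember}} mirror the locals in {{GzipCompressorInputStream.init}}, and the helper name is hypothetical:
> {code:java}
> import java.io.IOException;
> import java.io.InputStream;
>
> final class NulPaddingCheck {
>     /**
>      * True when the two bytes read after a complete .gz member look like
>      * NUL padding rather than another member or real garbage.
>      */
>     static boolean isNulPadding(InputStream in, int magic0, int magic1,
>                                 boolean isFirstMember) throws IOException {
>         if (isFirstMember || magic0 != 0 || (magic1 != 0 && magic1 != -1)) {
>             return false; // not the padding case; normal init() handling applies
>         }
>         // Drain the underlying stream and verify the remainder is all zeros,
>         // reducing the likelihood of a false positive.
>         int b;
>         while ((b = in.read()) != -1) {
>             if (b != 0) {
>                 throw new IOException("Garbage after a valid .gz stream");
>             }
>         }
>         return true; // init() would return false here, i.e. terminate cleanly
>     }
> }
> {code}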
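>
> For (2), a hypothetical API shape; none of these names exist in Commons Compress, they only illustrate the tri-state idea:
> {code:java}
> /** Hypothetical replacement for the boolean decompressConcatenated flag. */
> public enum ConcatenationMode {
>     /** Stop after the first member (today's decompressConcatenated = false). */
>     FIRST_MEMBER_ONLY,
>     /** Read every member; fail on trailing garbage (today's decompressConcatenated = true). */
>     ALL_MEMBERS_STRICT,
>     /** Read members for as long as valid magic bytes are found; stop silently otherwise. */
>     ALL_MEMBERS_IGNORE_GARBAGE
> }
>
> // Or, keeping the existing boolean and adding a second (hypothetical) flag:
> // new GzipCompressorInputStream(in, true /* decompressConcatenated */,
> //                               true /* ignoreGarbage */);
> {code}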
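>
> And for (3), a sketch of how the deferral might be structured; {{pendingGarbage}}, {{produced}} and {{decodeFromMembers}} are all hypothetical stand-ins for the real read loop:
> {code:java}
> import java.io.IOException;
>
> class DeferredGarbageSketch {
>     private IOException pendingGarbage; // failure remembered from the previous call
>     private int produced;               // bytes handed to the caller in this call
>
>     int read(byte[] b, int off, int len) throws IOException {
>         if (pendingGarbage != null) {
>             final IOException e = pendingGarbage;
>             pendingGarbage = null;
>             throw e;                    // the read past the end is the one that fails
>         }
>         produced = 0;
>         try {
>             decodeFromMembers(b, off, len); // updates produced as it inflates
>         } catch (IOException garbage) {
>             if (produced > 0) {
>                 pendingGarbage = garbage;   // defer: complete this read normally
>                 return produced;
>             }
>             throw garbage;              // nothing to hand back, so fail immediately
>         }
>         return produced == 0 ? -1 : produced;
>     }
>
>     private void decodeFromMembers(byte[] b, int off, int len) throws IOException {
>         throw new UnsupportedOperationException("stand-in for the real decode loop");
>     }
> }
> {code}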