[ https://issues.apache.org/jira/browse/COMPRESS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212801#comment-13212801 ]
John Kodis commented on COMPRESS-16:
------------------------------------
I believe that only the most significant bit of the first byte is used to
signal a binary length field. This is what GNU tar does, and it matches both
the Wikipedia description of the tar file format
(http://en.wikipedia.org/wiki/Tar_%28file_format%29) and the star man page
referenced previously.
Wikipedia claims:
"To overcome this limitation, star in 2001 introduced a base-256 coding that is
indicated by setting the high-order bit of the leftmost byte of a numeric
field. GNU-tar and BSD-tar followed this idea."
The star man page says:
"Star implements a vendor specific (and thus non-POSIX) extension to put
bigger numbers into the numeric fields. This is done by using a base 256
coding. The top bit of the first character in the appropriate 8 character
or 12 character field is set to flag non octal coding. If base 256 coding
is in use, then all remaining characters are used to code the number. This
results in 7 base 256 digits in 8 character fields and in 11 base 256
digits in 12 character fields. All base 256 numbers are two's complement
numbers. A base 256 number in a 8 character field may hold 56 bits, a base
256 number in a 12 character field may hold 88 bits. This may be extended
to 64 bits for 8 character fields and to 95 bits for 12 character fields.
For a negative number the first character currently is set to a value of
255 (all 8 bits are set)."
Since we don't have to worry about negative values when dealing with file
sizes, I believe the current patch is correct as it stands, at least up to
the sizes representable in Java's 64-bit signed long.
> unable to extract a TAR file that contains an entry which is 10 GB in size
> --------------------------------------------------------------------------
>
> Key: COMPRESS-16
> URL: https://issues.apache.org/jira/browse/COMPRESS-16
> Project: Commons Compress
> Issue Type: Bug
> Components: Archivers
> Environment: I am using win xp sp3, but this should be platform
> independent.
> Reporter: Sam Smith
> Fix For: 1.4
>
> Attachments:
> 0001-Accept-GNU-tar-files-with-entries-over-8GB-in-size.patch,
> 0002-Allow-creating-tar-archives-with-files-over-8GB.patch,
> 0004-Prefer-octal-over-binary-size-representation.patch, ant-8GB-tar.patch,
> patch-for-compress.txt
>
>
> I made a TAR file which contains a file entry where the file is 10 GB in size.
> When I attempt to extract the file using TarInputStream, it fails with the
> following stack trace:
> java.io.IOException: unexpected EOF with 24064 bytes unread
> at org.apache.commons.compress.archivers.tar.TarInputStream.read(TarInputStream.java:348)
> at org.apache.commons.compress.archivers.tar.TarInputStream.copyEntryContents(TarInputStream.java:388)
> So, TarInputStream does not seem to support large (> 8 GB?) files.
> Here is something else to note: I created that TAR file using
> TarOutputStream, which did not complain when asked to write a 10 GB file
> into the TAR file. So does TarOutputStream have no file size limits, or
> does it silently create corrupted TAR files (which would be the worst
> situation of all)?