[ https://issues.apache.org/jira/browse/COMPRESS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212801#comment-13212801 ]

John Kodis commented on COMPRESS-16:
------------------------------------

I believe that only the most significant bit of the first byte is used to 
signal a binary length field.  This is what GNU tar does, and seems to match 
both the Wikipedia description of the tar file format 
(http://en.wikipedia.org/wiki/Tar_%28file_format%29) and the star man page 
referenced previously.  

Wikipedia claims:

"To overcome this limitation, star in 2001 introduced a base-256 coding that is 
indicated by setting the high-order bit of the leftmost byte of a numeric 
field. GNU-tar and BSD-tar followed this idea."

And the star man page says:

    "Star implements  a  vendor  specific  (and  thus  non-POSIX)
     extension  to  put  bigger  numbers into the numeric fields.
     This is done by using a base 256 coding.  The top bit of the
     first character in the appropriate 8 character or 12 charac-
     ter field is set to flag non octal coding.  If base 256 cod-
     ing  is  in  use,  then all remaining characters are used to
     code the number. This results in 7  base  256  digits  in  8
     character  fields  and in 11 base 256 digits in 12 character
     fields.  All base 256 numbers are two's complement  numbers.
     A base 256 number in a 8 character field may hold 56 bits, a
     base 256 number in a 12 character field may  hold  88  bits.
     This  may  extended to 64 bits for 8 character fields and to
     95 bits for 12 character fields. For a negative  number  the
     first  character  currently  is set to a value of 255 (all 8
     bits are set).

Since we don't have to worry about negative values when dealing with file 
sizes, I believe that the current patch is correct as it stands, at least up 
to the sizes representable by Java's 64-bit signed long integers.
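
For reference, here is a minimal sketch, in Java, of how a reader could apply 
the rules quoted above.  This is not the actual Commons Compress code; the 
class and method names are made up for illustration, and negative base 256 
values are ignored since file sizes are never negative:

    // Hypothetical helper, not the Commons Compress API.
    public class TarNumericField {

        /** Parse an 8 or 12 byte numeric header field into a non-negative long. */
        static long parseOctalOrBinary(byte[] header, int offset, int length) {
            // Only the top bit of the first byte flags base 256 (binary) coding.
            if ((header[offset] & 0x80) != 0) {
                return parseBinary(header, offset, length);
            }
            return parseOctal(header, offset, length);
        }

        /** Big-endian base 256; the flag bit itself is not part of the value. */
        static long parseBinary(byte[] header, int offset, int length) {
            long result = header[offset] & 0x7f;            // strip the flag bit
            for (int i = 1; i < length; i++) {
                if (result > (Long.MAX_VALUE >>> 8)) {
                    // More than 63 bits will not fit into a signed Java long.
                    throw new IllegalArgumentException("size exceeds 63 bits");
                }
                result = (result << 8) | (header[offset + i] & 0xff);
            }
            return result;
        }

        /** Classic octal field, space padded and NUL or space terminated. */
        static long parseOctal(byte[] header, int offset, int length) {
            long result = 0;
            boolean seenDigit = false;
            for (int i = 0; i < length; i++) {
                int b = header[offset + i] & 0xff;
                if (b == ' ' && !seenDigit) {
                    continue;                               // leading padding
                }
                if (b == 0 || b == ' ') {
                    break;                                  // terminator
                }
                result = (result << 3) + (b - '0');
                seenDigit = true;
            }
            return result;
        }
    }

A 10 GB size needs only 34 bits, i.e. 5 of the 11 available base 256 digits in 
the 12 character size field, so it is comfortably within what a Java long can 
represent.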
                
> unable to extract a TAR file that contains an entry which is 10 GB in size
> --------------------------------------------------------------------------
>
>                 Key: COMPRESS-16
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-16
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Archivers
>         Environment: I am using win xp sp3, but this should be platform 
> independent.
>            Reporter: Sam Smith
>             Fix For: 1.4
>
>         Attachments: 
> 0001-Accept-GNU-tar-files-with-entries-over-8GB-in-size.patch, 
> 0002-Allow-creating-tar-archives-with-files-over-8GB.patch, 
> 0004-Prefer-octal-over-binary-size-representation.patch, ant-8GB-tar.patch, 
> patch-for-compress.txt
>
>
> I made a TAR file which contains a file entry where the file is 10 GB in size.
> When I attempt to extract the file using TarInputStream, it fails with the 
> following stack trace:
>       java.io.IOException: unexpected EOF with 24064 bytes unread
>               at org.apache.commons.compress.archivers.tar.TarInputStream.read(TarInputStream.java:348)
>               at org.apache.commons.compress.archivers.tar.TarInputStream.copyEntryContents(TarInputStream.java:388)
> So, TarInputStream does not seem to support large (> 8 GB?) files.
> Here is something else to note: I created that TAR file using TarOutputStream, 
> which did not complain when asked to write a 10 GB file into the TAR file, so 
> I assume that TarOutputStream has no file size limits?  That, or does it 
> silently create corrupted TAR files (which would be the worst situation of 
> all...)?


        
