[jira] [Commented] (JENA-744) Error importing from large gzip

Rob Vesse (JIRA) Fri, 11 Jul 2014 04:36:31 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058668#comment-14058668
 ]


Rob Vesse commented on JENA-744:
--------------------------------

Actually, having gone and read the spec, I take back that point about the size 
limit.  The spec (http://tools.ietf.org/html/rfc1952) says the following:

{quote}
ISIZE (Input SIZE)
            This contains the size of the original (uncompressed) input
            data modulo 2^32.
{quote}

So the uncompressed input may be greater than 4GB in size; the size stored is 
simply the size modulo 2^32.  Notably, this field is in the trailer of the file, 
not the header, so a decompressor should not even see the value until the end 
of decompression.
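
As a rough illustration of the modulo behaviour and the trailer position, here 
is a minimal Python sketch using only the standard library.  The helper names 
read_isize and isize_for are made up for this example (they are not part of 
Jena or any gzip tool), and read_isize assumes a single-member gzip stream with 
nothing appended after the trailer.

```python
import gzip
import struct


def read_isize(gz_bytes: bytes) -> int:
    """Return ISIZE: the last 4 bytes of the gzip trailer, little-endian.

    Assumption: gz_bytes holds exactly one gzip member with no trailing data.
    """
    return struct.unpack("<I", gz_bytes[-4:])[0]


def isize_for(uncompressed_size: int) -> int:
    """Value RFC 1952 says is stored for a given uncompressed size."""
    return uncompressed_size % 2**32


# Small in-memory round trip: for inputs under 4GB, ISIZE is the real size.
payload = b"x" * 100_000
compressed = gzip.compress(payload)
small_isize = read_isize(compressed)

# Pure arithmetic for a large input: a 5 GiB file would record only 1 GiB.
wrapped = isize_for(5 * 2**30)
```

The stored value alone therefore cannot distinguish a file of size N from one 
of size N plus any multiple of 4 GiB, which is why `gzip -l` under-reports 
large files.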

Therefore any truncation that happens must be the result of an inaccurate 
length in the actual compressed data blocks, not of the Gzip container itself.
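
To make the point concrete, a streaming reader can simply consume data until 
the decompressor reports end of stream, without ever consulting ISIZE.  The 
sketch below is Python rather than the Java that Jena actually uses, and the 
function name count_uncompressed_bytes is hypothetical; it only demonstrates 
the read-until-EOF pattern, not how tdbloader reads its input.

```python
import gzip
import io


def count_uncompressed_bytes(stream, chunk_size=64 * 1024) -> int:
    """Decompress a gzip stream chunk by chunk and count the output bytes.

    The loop stops only when read() returns no data, i.e. at end of stream;
    the ISIZE field in the trailer is never used to decide when to stop.
    """
    total = 0
    with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Illustrative data only: a repeated N-Triples-style line, compressed in memory.
payload = b"<s> <p> <o> .\n" * 50_000
stream = io.BytesIO(gzip.compress(payload))
streamed_total = count_uncompressed_bytes(stream)
```

A reader written this way sees every byte regardless of what the trailer 
records, so a wrapped ISIZE value on its own would not cut the input short.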

> Error importing from large gzip
> -------------------------------
>
>                 Key: JENA-744
>                 URL: https://issues.apache.org/jira/browse/JENA-744
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB
>            Reporter: Michael Kozakov
>         Attachments: gzip.png
>
>
> gzip has a documented bug: 
> http://www.freebsd.org/cgi/man.cgi?query=gzip#end
> "According to RFC 1952, the recorded file size is stored in a 32-bit integer, 
> therefore, it can not represent files larger than 4GB.  This limitation also 
> applies to -l option of gzip utility."
> As a result, a 28GB compressed gz shows that the uncompressed size is 1.6GB 
> (screenshot attached).
> It seems like tdbloader relies on this information to know when to stop 
> importing, and as a result, the imported database is incomplete. As a 
> workaround, I have to extract the archive before using tdbloader to import 
> the database, otherwise it will be missing the majority of items.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
