[
https://issues.apache.org/jira/browse/JENA-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058578#comment-14058578
]
Rob Vesse commented on JENA-744:
--------------------------------
I'm not sure what you expect us to do here?
As you point out, the gzip format stores the uncompressed size in a 32-bit
field, so the size of files larger than 4GB cannot be determined a priori. We
are just using the standard Java {{GZIPInputStream}} to read gzipped files,
which must logically be the source of the truncation, so files exceeding that
limit are always going to be problematic.
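For reference, a minimal sketch of the kind of read path in question, using the standard {{GZIPInputStream}} (the file name and buffer handling below are illustrative only, not taken from the Jena code base):

{code:java}
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class GzipReadSketch {
    public static void main(String[] args) throws IOException {
        // Wrap the raw file stream in the JDK gzip decompressor; the
        // decompressed length is not known up front, so callers simply
        // read until the stream reports end-of-stream.
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(new FileInputStream("data.nt.gz")))) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            System.out.println("Decompressed bytes read: " + total);
        }
    }
}
{code}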
I am guessing you want us to switch to an alternative gzip implementation that
does not have this problem? If you know of an alternative Java implementation
of gzip decompression that is compatible with Apache licensing policy and does
not exhibit this bug, we can happily look at switching to that.
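Purely as an illustration of what such a switch might look like, and not a concrete proposal from this ticket, one Apache-licensed candidate would be Commons Compress's {{GzipCompressorInputStream}}, which can be dropped in where {{GZIPInputStream}} is used today; whether it actually avoids the truncation reported here would still need to be verified:

{code:java}
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

public class CommonsCompressGzipSketch {
    public static void main(String[] args) throws IOException {
        // decompressConcatenated=true makes the stream keep reading across
        // multiple gzip members instead of stopping after the first one.
        try (InputStream in = new GzipCompressorInputStream(
                new BufferedInputStream(new FileInputStream("data.nt.gz")), true)) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            System.out.println("Decompressed bytes read: " + total);
        }
    }
}
{code}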
> Error importing from large gzip
> -------------------------------
>
> Key: JENA-744
> URL: https://issues.apache.org/jira/browse/JENA-744
> Project: Apache Jena
> Issue Type: Bug
> Components: TDB
> Reporter: Michael Kozakov
> Attachments: gzip.png
>
>
> gzip has a documented bug:
> http://www.freebsd.org/cgi/man.cgi?query=gzip#end
> "According to RFC 1952, the recorded file size is stored in a 32-bit inte-
> ger, therefore, it can not represent files larger than 4GB. This
> limita-
> tion also applies to -l option of gzip utility."
> As a result, a 28 GB compressed .gz file reports an uncompressed size of
> 1.6 GB (screenshot attached).
> It seems like tdbloader relies on this information to know when to stop
> importing, and as a result the imported database is incomplete. As a
> workaround, I have to extract the archive before using tdbloader to import
> the database; otherwise it will be missing the majority of items.
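To make the wraparound described in the quoted man page concrete, here is a small arithmetic sketch (the 100 GB figure is hypothetical and is not the actual size of the data behind this report): RFC 1952's ISIZE field holds the uncompressed size modulo 2^32, so anything at or above 4 GiB is reported wrapped around.

{code:java}
public class IsizeWraparound {
    public static void main(String[] args) {
        // Hypothetical uncompressed size well above the 4 GiB field limit.
        long actualBytes = 100L * 1000 * 1000 * 1000;   // 100 GB (illustrative)
        // ISIZE can only hold the size modulo 2^32, which is what gzip -l reports.
        long reportedBytes = actualBytes % (1L << 32);  // ~1.2 GB for this example
        System.out.println("Actual   : " + actualBytes);
        System.out.println("Reported : " + reportedBytes);
    }
}
{code}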