[
https://issues.apache.org/jira/browse/JENA-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058578#comment-14058578
]
Rob Vesse commented on JENA-744:
--------------------------------
I'm not sure what you expect us to do here?
As you point out, the gzip format stores the uncompressed size in a 32-bit
field, so the size of files larger than 4GB cannot be determined a priori. We
are just using the standard Java {{GZIPInputStream}} to read gzipped files,
which must logically be the source of the truncation, so files exceeding that
limit are always going to be problematic.
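For reference, a minimal sketch of the kind of read path in question, using the standard {{GZIPInputStream}} (the file name and buffer handling below are illustrative only, not taken from the Jena code base):

{code:java}
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class GzipReadSketch {
    public static void main(String[] args) throws IOException {
        // Wrap the raw file stream in the JDK gzip decompressor; the
        // decompressed length is not known up front, so callers simply
        // read until the stream reports end-of-stream.
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(new FileInputStream("data.nt.gz")))) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            System.out.println("Decompressed bytes read: " + total);
        }
    }
}
{code}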
I am guessing you want us to switch to an alternative gzip implementation that
does not have this problem? If you know of an alternative Java implementation
of gzip decompression that is compatible with Apache licensing policy and does
not exhibit this bug, we can happily look at switching to that.
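Purely as an illustration of what such a switch might look like, and not a concrete proposal from this ticket, one Apache-licensed candidate would be Commons Compress's {{GzipCompressorInputStream}}, which can be dropped in where {{GZIPInputStream}} is used today; whether it actually avoids the truncation reported here would still need to be verified:

{code:java}
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

public class CommonsCompressGzipSketch {
    public static void main(String[] args) throws IOException {
        // decompressConcatenated=true makes the stream keep reading across
        // multiple gzip members instead of stopping after the first one.
        try (InputStream in = new GzipCompressorInputStream(
                new BufferedInputStream(new FileInputStream("data.nt.gz")), true)) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            System.out.println("Decompressed bytes read: " + total);
        }
    }
}
{code}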
> Error importing from large gzip
> -------------------------------
>
> Key: JENA-744
> URL: https://issues.apache.org/jira/browse/JENA-744
> Project: Apache Jena
> Issue Type: Bug
> Components: TDB
> Reporter: Michael Kozakov
> Attachments: gzip.png
>
>
> gzip has a documented bug:
> http://www.freebsd.org/cgi/man.cgi?query=gzip#end
> "According to RFC 1952, the recorded file size is stored in a 32-bit inte-
> ger, therefore, it can not represent files larger than 4GB. This
> limita-
> tion also applies to -l option of gzip utility."
> As a result, a 28 GB compressed .gz file reports an uncompressed size of
> 1.6 GB (screenshot attached).
> It seems like tdbloader relies on this information to know when to stop
> importing, and as a result the imported database is incomplete. As a
> workaround, I have to extract the archive before using tdbloader to import
> the database; otherwise it will be missing the majority of items.
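To make the wraparound described in the quoted man page concrete, here is a small arithmetic sketch (the 100 GB figure is hypothetical and is not the actual size of the data behind this report): RFC 1952's ISIZE field holds the uncompressed size modulo 2^32, so anything at or above 4 GiB is reported wrapped around.

{code:java}
public class IsizeWraparound {
    public static void main(String[] args) {
        // Hypothetical uncompressed size well above the 4 GiB field limit.
        long actualBytes = 100L * 1000 * 1000 * 1000;   // 100 GB (illustrative)
        // ISIZE can only hold the size modulo 2^32, which is what gzip -l reports.
        long reportedBytes = actualBytes % (1L << 32);  // ~1.2 GB for this example
        System.out.println("Actual   : " + actualBytes);
        System.out.println("Reported : " + reportedBytes);
    }
}
{code}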