[ 
https://issues.apache.org/jira/browse/PDFBOX-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109001#comment-15109001
 ] 

Andreas Lehmkühler commented on PDFBOX-3201:
--------------------------------------------

I've copied [~Sumit Saha]'s comment from PDFBOX-2976:
{quote}
Hi Everyone,
I have been following the PDFBox-related discussion for a long time, and I 
have a better fix in mind that would avoid any data loss.

The exception "incorrect data check" is raised when the Adler-32 checksum 
verification fails or the stream is corrupted.
The Adler-32 check can fail when the byte order of the last 4 bytes of the 
stream has been changed from big-endian to little-endian.
One fix is to bypass the Adler-32 check, which allows all the data in the 
stream to be extracted using InflaterInputStream.
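
The failure mode described above can be reproduced with a small sketch. It
assumes the only damage is a reversed 4-byte Adler-32 trailer; the class and
method names (TrailerByteOrderDemo, zlibCompress, reverseTrailer,
inflateError) are hypothetical and not part of PDFBox.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;

class TrailerByteOrderDemo {

    /** Builds a zlib stream: 2-byte header, deflate data, 4-byte Adler-32 trailer. */
    static byte[] zlibCompress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Returns a copy whose last 4 bytes (the Adler-32 trailer) are in reverse order. */
    static byte[] reverseTrailer(byte[] zlibData) {
        byte[] copy = zlibData.clone();
        int n = copy.length;
        for (int i = 0; i < 2; i++) {
            byte tmp = copy[n - 4 + i];
            copy[n - 4 + i] = copy[n - 1 - i];
            copy[n - 1 - i] = tmp;
        }
        return copy;
    }

    /** Inflates with the default, checksum-verifying Inflater; returns the error message, or null on success. */
    static String inflateError(byte[] zlibData) {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(zlibData))) {
            byte[] buf = new byte[256];
            while (in.read(buf) != -1) {
                // drain the stream; the checksum is verified once the deflate data ends
            }
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        byte[] good = zlibCompress("hello".getBytes(StandardCharsets.UTF_8));
        byte[] broken = reverseTrailer(good);
        System.out.println("intact stream error:   " + inflateError(good));
        System.out.println("reversed trailer error: " + inflateError(broken));
        // Reversing the trailer a second time restores the original byte order.
        System.out.println("repaired stream error: " + inflateError(reverseTrailer(broken)));
    }
}
```

Because the reversal is its own inverse, the same helper both damages and
repairs the trailer in this sketch.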

In code, before creating the InflaterInputStream object, do one of the 
following:

Option 1: Change the byte order of the last 4 bytes of the raw stream, i.e. 
from little-endian to big-endian, before feeding the stream to 
InflaterInputStream.

Option 2: If Option 1 fails, do the following:

// inStm is the raw stream; skip the 2-byte zlib header
inStm.skip(2);
// true enables the "nowrap" option, which disables the checksum verification
Inflater inf = new Inflater(true);
// then create the InflaterInputStream with that Inflater
InflaterInputStream ifis = new InflaterInputStream(inStm, inf);

Skipping the first two bytes is required: they are the zlib header, which 
must not be fed to the Inflater when the Adler-32 check is bypassed.
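
Option 2 can be sketched end to end as follows. This is only an illustration
under stated assumptions: the names RawInflateDemo, inflateWithoutChecksum and
zlibCompress are hypothetical, the input is assumed to start with a 2-byte
zlib header and no preset dictionary, and the header bytes are read explicitly
because InputStream.skip() may skip fewer bytes than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

class RawInflateDemo {

    /** Inflates a zlib stream while ignoring its header and Adler-32 trailer. */
    static byte[] inflateWithoutChecksum(InputStream raw) throws IOException {
        // Consume the 2-byte zlib header; a "nowrap" Inflater expects raw deflate data.
        if (raw.read() == -1 || raw.read() == -1) {
            throw new EOFException("stream too short to contain a zlib header");
        }
        Inflater inflater = new Inflater(true); // true = nowrap: no header, no checksum
        try (InputStream in = new InflaterInputStream(raw, inflater)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            // The 4 trailer bytes are left unread and never verified.
            return out.toByteArray();
        } finally {
            inflater.end(); // a caller-supplied Inflater is not released by close()
        }
    }

    /** Convenience wrapper for in-memory data; an IOException here means corrupt deflate data. */
    static byte[] inflateWithoutChecksum(byte[] zlibData) {
        try {
            return inflateWithoutChecksum(new ByteArrayInputStream(zlibData));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a regular zlib stream to use as test input. */
    static byte[] zlibCompress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] stream = zlibCompress("hello".getBytes(StandardCharsets.UTF_8));
        stream[stream.length - 1] ^= 0x55; // damage the Adler-32 trailer
        byte[] recovered = inflateWithoutChecksum(stream);
        System.out.println(new String(recovered, StandardCharsets.UTF_8));
    }
}
```

The trade-off is that a stream whose data is corrupted, and not only its
trailer, is no longer detected by the checksum.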

With this approach, even a small data loss can be avoided.

For more information, see the similar question I raised on Stack Overflow:
http://stackoverflow.com/questions/33348192/attached-code-throws-java-util-zip-zipexception-incorrect-data-check-for-given
{quote}


> Skip zlib-header and checksum to avoid DataFormatException
> ----------------------------------------------------------
>
>                 Key: PDFBOX-3201
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3201
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Andreas Lehmkühler
>            Assignee: Andreas Lehmkühler
>             Fix For: 2.0.0
>
>
> This is a follow up to PDFBOX-2976



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
