Yeah, was considering a similar approach, but hoping there might be a less 
hacky way of handling it.

Initial experiments indicate that the poppler pdfunite utility invoked with a 
single input argument just round trips the specified document, cleaning up the 
issue I'm seeing on the way.
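
For anyone wanting to script that round trip, here is a minimal Python sketch. It assumes the poppler-utils `pdfunite` binary is on the PATH and accepts a single source file followed by a destination, as the experiments above suggest. The function names are purely illustrative and are not part of GDAL, poppler or podofo.

```python
import shutil
import subprocess
from pathlib import Path


def build_roundtrip_command(src, dst, tool="pdfunite"):
    """Return the argv list for rewriting src to dst with a single input file."""
    return [tool, str(src), str(dst)]


def roundtrip_pdf(src, dst, tool="pdfunite"):
    """Rewrite src into dst; raise if the tool is missing or exits non-zero."""
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(build_roundtrip_command(src, dst, tool), check=True)
    return Path(dst)
```

The rewritten copy, rather than the original, would then be handed to GDAL/podofo. Whether the rewrite clears the problem for a given document still has to be checked case by case.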

On 21 August 2018 at 18:11, Andreas Oxenstierna wrote:
Hi

I have also encountered similar issues with PDFs from other Windows software.
The workaround I use is to recreate the PDF in any available software that 
tolerates missing EOF markers, endstream keywords, etc.
Programmatically, this can be done as described in 
https://codedprojects.wordpress.com/2017/06/09/how-to-fix-pypdf-error-eof-marker-not-found/
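
Before regenerating a file, it can help to confirm that the end of the file really is damaged. The Python sketch below is a rough diagnostic only, not the method from the linked post. It assumes the usual PDF layout, in which the file ends with a `startxref` keyword, a byte offset and a `%%EOF` marker. The function names and the 1024-byte search window are illustrative choices.

```python
def inspect_pdf_tail(data, window=1024):
    """Report whether the trailer keywords appear in the last `window` bytes."""
    tail = data[-window:]
    return {
        "has_startxref": b"startxref" in tail,
        "has_eof_marker": b"%%EOF" in tail,
    }


def inspect_pdf_file(path, window=1024):
    """Read only the end of the file at `path` and inspect it."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)                      # jump to end to learn the size
        size = handle.tell()
        handle.seek(max(0, size - window))     # back up by at most `window` bytes
        return inspect_pdf_tail(handle.read(), window)
```

A file for which either flag is False is a likely candidate for the recreate-the-PDF workaround. A file for which both are True may still have a bad xref table, so this check cannot prove that a document is valid.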

Hi all,

I'm currently working on a map viewer application that uses GDAL for processing 
geo-referenced map images.  Up till now I've been successfully using the 
poppler library for PDF support, but am currently trying to shift to the 
podofo/poppler hybrid approach (podofo library with poppler pdftoppm utility) 
to work around poppler's GPL licence restrictions.

I have a collection of sample map PDF documents generated by ESRI ArcMap 10 
(different documents from different releases in the 10.x release family), which 
I could successfully process with GDAL/poppler, but most of which fail to load 
with GDAL/podofo.  The document loading also fails with the stand-alone podofo 
pdftoppm utility, both with a version that I've built from podofo 0.9.6 source 
and with the 0.9.3 version installed onto my ubuntu xenial machine from the APT 
package repository.

The typical error message is as follows:


Error: An error 5 ocurred during uncompressing the pdf file.


PoDoFo encounter an error. Error: 5 ePdfError_UnexpectedEOF
    Error Description: End of file was reached unxexpectedly.
    Callstack:
    #0 Error Source: /build/libpodofo-NltoF1/libpodofo-0.9.3/src/base/PdfParser.cpp:226
        Information: Unable to load objects from file.
    #1 Error Source: /build/libpodofo-NltoF1/libpodofo-0.9.3/src/base/PdfParser.cpp:334
        Information: Unable to load xref entries.
    #2 Error Source: /build/libpodofo-NltoF1/libpodofo-0.9.3/src/base/PdfParser.cpp:738
    #3 Error Source: /build/libpodofo-NltoF1/libpodofo-0.9.3/src/base/PdfTokenizer.cpp:339

which seems to indicate an invalid xref table.




I don't think this is a podofo bug as such, as various online PDF validators 
I've tried also flag the documents as problematic, but several other pieces of 
PDF software I've tried (notably the poppler library utilities) seem to treat 
it as a non-fatal, recoverable error.

Has anyone else come across this and come up with a work-around or fix?

An example problem file can be found at 
https://www.dropbox.com/s/khlzgz8o2gxq89y/6090_harvest.pdf?dl=0


thanks

Richard.





_______________________________________________
gdal-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/gdal-dev


--
Best regards

Andreas Oxenstierna
T-Kartor Geospatial AB
mobile: +46 733 206831
mailto: [email protected]
http://www.t-kartor.com