It isn't necessarily a bad PDF, although it could be. Some viewers
handle bad PDFs better than the filter does.
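A cheap first check on whether the file itself is damaged is a minimal Python sketch, not part of MarkLogic or Iceni. It looks for a "%PDF-x.y" header at the very start of the file and a "%%EOF" marker near the end. A missing "%%EOF" is a common sign of truncation. The 1024-byte tail window is an arbitrary choice. Passing both checks does not prove the PDF is well formed, because a lenient viewer and a strict converter can still disagree about the same file. As a data point, the hex prefix quoted in the error message decodes to an ordinary "%PDF-1.6" header.

```python
import re
from typing import Optional

# A PDF file is expected to begin with a header such as "%PDF-1.6".
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version from a leading %PDF-x.y header, or None if absent."""
    match = HEADER_RE.match(data)
    return match.group(1).decode("ascii") if match else None


def has_eof_marker(data: bytes, window: int = 1024) -> bool:
    """True if %%EOF appears in the last `window` bytes (absence hints at truncation)."""
    return b"%%EOF" in data[-window:]


def quick_pdf_check(data: bytes) -> dict:
    """Cheap structural hints only; this is not a full PDF validator."""
    return {
        "version": pdf_header_version(data),
        "has_eof": has_eof_marker(data),
    }


if __name__ == "__main__":
    # The hex prefix shown in the ICN-FAILED error. It is truncated there,
    # so no %%EOF is expected in it.
    prefix = bytes.fromhex(
        "255044462d312e360d25e2e3cfd30d0a3135342030206f626a0d3c3c2f4d61726b"
        "496e666f3c3c2f4d61726b656420747275653e3e2f4d657461646174612031"
    )
    print(quick_pdf_check(prefix))
```

On the real file you would pass the whole file's bytes, e.g. quick_pdf_check(open("path/to/file.pdf", "rb").read()). The path is a placeholder.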
What I would say is that something on page 26 is clearly causing the
Iceni library to fall over stone dead: the log stops at "Processing
page 26" out of 30 pages. You can verify this by using the
page-start-id and page-end-id options to xdmp:pdf-convert to convert
just that page.
I'd look at that page and see whether anything on it looks unusual.
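The failing page can be read straight off the conversion log in the error's format-string. It is the last "Processing page N" entry before the abort, within the "Pages 1 to 30" range. If you are triaging failures across many files, this minimal Python sketch pulls both numbers out. The regular expressions assume the log text looks like the one quoted in the original message, and \s+ absorbs the line wrapping.

```python
import re
from typing import Optional, Tuple

# Patterns for the progress messages as they appear in the quoted log;
# \s+ tolerates line wrapping inside the error text.
PAGE_RE = re.compile(r"Processing\s+page\s+(\d+)")
RANGE_RE = re.compile(r"Pages\s+(\d+)\s+to\s+(\d+)")


def last_page_reached(log: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (last page the converter started, last page of the document).

    Either value is None if the log does not mention it.
    """
    pages = PAGE_RE.findall(log)
    last = int(pages[-1]) if pages else None
    match = RANGE_RE.search(log)
    total = int(match.group(2)) if match else None
    return last, total


if __name__ == "__main__":
    # Abbreviated copy of the format-string from the ICN-FAILED error.
    log = ("Pages 1 to 30 Processing page 1 Processing page 2 "
           "Processing page 25 Processing page\n26")
    last, total = last_page_reached(log)
    print(f"conversion stopped while processing page {last} of {total}")
```

That page number is what you would feed to the page-range options for the isolating run. The log's own macros are named page-start and page-end, so it is worth checking the xdmp:pdf-convert documentation for the exact option element names.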
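Sometimes the errors are due to image processing, and you can
verify that by adjusting the image extraction options:
  - image-output: set to false to skip image extraction entirely
  - illustrations: set to false to avoid extracting certain vector
    drawings as images
(Judging from the macros in your log, this run already had both set
to false, so images may not be the culprit here.)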
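In XQuery you would write the options element literally. If a script drives the batch instead, this minimal Python sketch serializes an options node shaped like the one in the error message. The names config and image-output appear in that message, and illustrations appears in the log. Mapping every setting to a same-named child element is an assumption, and the helper name pdf_convert_options is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Union

# Namespace of the options node shown in the error message.
NS = "xdmp:pdf-convert"


def pdf_convert_options(**settings: Union[str, bool, int]) -> str:
    """Serialize an <options xmlns="xdmp:pdf-convert"> element (hypothetical helper).

    Keyword names use underscores (image_output) and become hyphenated
    element names (image-output); booleans become "true"/"false".
    Assumption: each setting maps to a child element of the same name.
    """
    root = ET.Element(f"{{{NS}}}options")
    for name, value in settings.items():
        child = ET.SubElement(root, f"{{{NS}}}{name.replace('_', '-')}")
        if isinstance(value, bool):  # check bool before int/str formatting
            child.text = "true" if value else "false"
        else:
            child.text = str(value)
    # default_namespace (Python 3.8+) emits xmlns="..." instead of ns0: prefixes.
    return ET.tostring(root, encoding="unicode", default_namespace=NS)


if __name__ == "__main__":
    print(pdf_convert_options(config="PDFtoXHTML_exact.cfg",
                              image_output=False, illustrations=False))
```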
Or it could be related to other features, such as overlapping text
or synthetic bookmarks (probably the remove-overprint and
synth-bookmarks macros in your log). Try changing those settings one
at a time and see whether it makes a difference.
It is also possible that it is a bug in our config file (although
that seems unlikely here), but you can rule that out by trying the
streamlined text configuration instead: set the config option to
PagedText.cfg rather than PDFtoXHTML_exact.cfg.
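The settings above are easiest to interpret if each run changes exactly one thing against a baseline. That way, a run that succeeds points at a single culprit. This minimal Python sketch builds such a plan. It assumes the option names match the macro names in the log (synth-bookmarks, remove-overprint) and the names used in this thread (image-output, illustrations, config). It only lists option sets; running xdmp:pdf-convert with each one is left out.

```python
from typing import Dict, List, Tuple, Union

Value = Union[str, bool]

# Baseline taken from the failing run's log (option names are assumed).
BASELINE: Dict[str, Value] = {
    "config": "PDFtoXHTML_exact.cfg",
    "image-output": False,
    "illustrations": False,
    "synth-bookmarks": True,
    "remove-overprint": False,
}

# One alternative value per setting worth trying.
ALTERNATIVES: Dict[str, Value] = {
    "config": "PagedText.cfg",
    "synth-bookmarks": False,
    "remove-overprint": True,
}


def one_at_a_time(baseline: Dict[str, Value],
                  alternatives: Dict[str, Value]) -> List[Tuple[str, Dict[str, Value]]]:
    """Return (label, options) pairs, each differing from baseline in one key."""
    plan = [("baseline", dict(baseline))]
    for key, value in alternatives.items():
        if baseline.get(key) == value:
            continue  # nothing would change, so skip the redundant run
        trial = dict(baseline)
        trial[key] = value
        plan.append((f"{key}={value}", trial))
    return plan


if __name__ == "__main__":
    for label, options in one_at_a_time(BASELINE, ALTERNATIVES):
        print(label, options)
```

Restricting each trial to the failing page as well keeps the runs short.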
But at the end of the day, my best advice is to file a bug report and
send us the PDF file so we can report the failure to Iceni for fixing.
//Mary
On Fri, 06 Dec 2013 09:31:52 -0800, Chris Hamlin
<[email protected]> wrote:
>
> Hello,
>
> I'm converting about 1000 PDFs to XHTML for some extraction.
>
> One file throws an error:
>
> <error:error xsi:schemaLocation="http://marklogic.com/xdmp/error
> error.xsd"
> xmlns:error="http://marklogic.com/xdmp/error"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
> <error:code>ICN-FAILED</error:code>
> <error:name/>
> <error:xquery-version>1.0-ml</error:xquery-version>
> <error:message>Conversion failed due to abnormal process
> termination</error:message>
> <error:format-string>ICN-FAILED:
> xdmp:pdf-convert(document{binary{"255044462d312e360d25e2e3cfd30d0a3135342030206f626a0d3c3c2f4d61726b496e666f3c3c2f4d61726b656420747275653e3e2f4d657461646174612031..."}},
> "RR:D04-1015", <options xmlns:tidy="xdmp:tidy"
> xmlns="xdmp:pdf-convert"><config>PDFtoXHTML_exact.cfg</config><image-output>false</image-...</options>)
> -- Conversion failed due to abnormal process termination: -1. Loading
> configuration... Parsing macros... Macro synth-bookmarks='true' Macro
> image-output='true' Macro text-output='true' Macro zones='false' Macro
> ignore-text='true' Macro remove-overprint='false' Macro
> illustrations='true' Macro line-breaks='true' Macro image-quality='75'
> Macro page-start='' Macro page-end='' Macro document-start='' Macro
> document-end='' Macro image-output='false' Macro illustrations='false'
> features='160004' Processing... Analysing
> '/var/opt/MarkLogic/Temp/db397b7505bb4bf0/conv.pdf' Pages 1 to 30
> Processing page 1 Processing page 2 Processing page 3 Processing page 4
> Processing page 5 Processing page 6 Processing page 7 Processing page 8
> Processing page 9 Processing page 10 Processing page 11 Processing page 12
> Processing page 13 Processing page 14 Processing page 15 Processing page 16
> Processing page 17 Processing page 18 Processing page 19 Processing page 20
> Processing page 21 Processing page 22 Processing page 23 Processing page 24
> Processing page 25 Processing page 26</error:format-string>
> <error:retryable>false</error:retryable>
> <error:expr>xdmp:pdf-convert(document{binary{"255044462d312e360d25e2e3cfd30d0a3135342030206f626a0d3c3c2f4d61726b496e666f3c3c2f4d61726b656420747275653e3e2f4d657461646174612031..."}},
> "RR:D04-1015", <options xmlns:tidy="xdmp:tidy"
> xmlns="xdmp:pdf-convert"><config>PDFtoXHTML_exact.cfg</config><image-output>false</image-...</options>)</error:expr>
>
> I can open it in Adobe Reader and Preview, and scroll through all the
> pages.
>
> Is there some way to check if the PDF is bad, or if this is a conversion
> bug?
>
> Thanks,
>
> Chris Hamlin
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general