Hi All.

I answered my own question: I disabled pdftoolkit and re-enabled XPDF. It works with files that are set to prevent copying, and it seems to do a better job with the text extraction.

Kind Regards.

Shaun.

On 2020/04/24 13:39, Shaun donovan wrote:

Hi all.

I am receiving the following error when trying to run filter-media on certain pdf files:

java.io.IOException
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:108)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:379)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
        at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
        at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:252)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:236)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:216)
        at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:471)
        at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:395)
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:354)
        at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:101)
        at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:734)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:550)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:500)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:468)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:360)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
Caused by: java.util.zip.DataFormatException: invalid block type
        at java.util.zip.Inflater.inflateBytes(Native Method)
        at java.util.zip.Inflater.inflate(Inflater.java:259)
        at java.util.zip.Inflater.inflate(Inflater.java:280)
        at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:134)
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:100)

I have realised that this is due to the file being encrypted to disallow high-resolution printing. If I remove the encryption, it works perfectly. If I encrypt it with the only setting being --accessibility=y, it fails.
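
One plausible reading of the trace (an assumption, not something confirmed in this thread) is that the stream bytes reach the Flate decoder while still encrypted, so the inflater is handed data that is not a valid zlib stream. The sketch below uses only java.util.zip to show that effect. The class FlateDemo and its methods are hypothetical names, and scramble() is a stand-in for encryption (a plain bit flip), not the cipher a PDF actually uses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateDemo {

    // Compress bytes with zlib/deflate, the format behind a PDF FlateDecode stream.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Stand-in for encryption (NOT a real cipher): flip every bit.
    static byte[] scramble(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ 0xFF);
        }
        return out;
    }

    // True if the bytes inflate cleanly; false on a DataFormatException
    // or if the inflater stalls waiting for more input or a dictionary.
    static boolean canInflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    return false;
                }
            }
            return true;
        } catch (DataFormatException e) {
            return false;
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] plain = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] compressed = deflate(plain);
        System.out.println("plain stream inflates:     " + canInflate(compressed));
        System.out.println("scrambled stream inflates: " + canInflate(scramble(compressed)));
    }
}
```

The exact exception message depends on which bytes the inflater sees first, so the sketch only checks whether inflation succeeds and does not try to reproduce "invalid block type".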

So, is there a way that I can allow filter-media to run on encrypted files? By passing it a password for example?

Kind Regards.

Shaun

--
All messages to this mailing list should adhere to the DuraSpace Code of Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group. To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscr...@googlegroups.com <mailto:dspace-tech+unsubscr...@googlegroups.com>. To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/4ccdb042-a79c-4a38-681f-914c58416ff1%40teqcle.co.za <https://groups.google.com/d/msgid/dspace-tech/4ccdb042-a79c-4a38-681f-914c58416ff1%40teqcle.co.za?utm_medium=email&utm_source=footer>.

