Hi All.
Answered my own question. Disabled pdftoolkit and re-enabled XPDF. Works
with files that are set to prevent copying and seems to do a better job
with the text extraction.
Kind Regards.
Shaun.
On 2020/04/24 13:39, Shaun donovan wrote:
Hi all.
I am receiving the following error when trying to run filter-media on
certain pdf files:
java.io.IOException
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:108)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:379)
    at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
    at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
    at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:252)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:236)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:216)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:471)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:395)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:354)
    at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:101)
    at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:734)
    at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:550)
    at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:500)
    at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:468)
    at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:360)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
Caused by: java.util.zip.DataFormatException: invalid block type
    at java.util.zip.Inflater.inflateBytes(Native Method)
    at java.util.zip.Inflater.inflate(Inflater.java:259)
    at java.util.zip.Inflater.inflate(Inflater.java:280)
    at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:134)
    at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:100)
I have realised that this is due to the file being encrypted to
disallow high resolution printing. If I remove the encryption, it
works perfectly. If I encrypt it with the only setting being
--accessibility=y, it fails.
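
One plausible reading of the "Caused by" above is that the page's content
stream was handed to the Flate decoder while still encrypted, so the inflater
saw bytes that are not a valid zlib stream. The sketch below reproduces that
kind of failure with only java.util.zip. It is an illustration under
assumptions, not what PDFBox or DSpace actually does: the class name
EncryptedFlateDemo and its helpers are hypothetical, and a single-byte XOR
stands in for real PDF encryption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class EncryptedFlateDemo {

    // Compress bytes into a zlib (Flate) stream, as a PDF writer would
    // for a content stream. Intended for small inputs only.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[input.length + 64];
        int length = deflater.deflate(buffer);
        deflater.end();
        return Arrays.copyOf(buffer, length);
    }

    // Stand-in for encryption (assumption): XOR every byte with a key.
    static byte[] scramble(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    // Returns true if the bytes inflate cleanly to the end of the stream,
    // false if the inflater rejects them with a DataFormatException.
    static boolean inflates(byte[] stream) {
        Inflater inflater = new Inflater();
        inflater.setInput(stream);
        try {
            inflater.inflate(new byte[1024]);
            return inflater.finished();
        } catch (DataFormatException e) {
            return false;
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] plain = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(plain);
        System.out.println("plain stream inflates:     " + inflates(packed));
        System.out.println("scrambled stream inflates: " + inflates(scramble(packed, (byte) 0x5A)));
    }
}
```

The exact exception message depends on which bytes the inflater meets first,
so the sketch only checks whether decoding fails, not for "invalid block type"
specifically.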
So, is there a way that I can allow filter-media to run on encrypted
files? By passing it a password for example?
Kind Regards.
Shaun
--
All messages to this mailing list should adhere to the DuraSpace Code
of Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google
Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to dspace-tech+unsubscr...@googlegroups.com
<mailto:dspace-tech+unsubscr...@googlegroups.com>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/dspace-tech/4ccdb042-a79c-4a38-681f-914c58416ff1%40teqcle.co.za
<https://groups.google.com/d/msgid/dspace-tech/4ccdb042-a79c-4a38-681f-914c58416ff1%40teqcle.co.za?utm_medium=email&utm_source=footer>.