Can you check that your Media Filter config isn't doing something strange,
such as trying to feed Word docs to the PDFFilter?
https://github.com/DSpace/DSpace/blob/master/dspace/config/dspace.cfg#L428
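One quick way to see how filters are mapped to formats is to grep the config for the filter settings. This is only a sketch: it assumes DSpace is installed under /dspace (DSPACE_DIR is a variable I made up for the example, not a DSpace setting), and the values shown in the comments are what I would expect from a stock dspace.cfg, so yours may differ.

```shell
# Assumption: DSpace lives under /dspace; set DSPACE_DIR to your install directory.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"
CFG="$DSPACE_DIR/config/dspace.cfg"

if [ -r "$CFG" ]; then
    # Show which filters are enabled and which input formats each filter accepts.
    grep -n -E '^[[:space:]]*filter\.(plugins|.*\.inputFormats)' "$CFG" \
        || echo "no filter settings found in $CFG"
    # In a stock config I would expect entries along these lines:
    #   filter.org.dspace.app.mediafilter.PDFFilter.inputFormats = Adobe PDF
    #   filter.org.dspace.app.mediafilter.WordFilter.inputFormats = Microsoft Word
else
    echo "cannot read $CFG" >&2
fi
```

If the PDF and Word entries look swapped, or a filter class is listed against the wrong format, that would explain a Word message showing up for a PDF.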
I think the text "Invalid Word Format" only comes from the Word Filter:
https://github.com/LongsightGroup/DSpace/blob/bookreader/dspace-api/src/main/java/org/dspace/app/mediafilter/WordFilter.java#L92
You can try running the media filter on just this single item and check
the output. Also see whether anything additional gets written to dspace.log,
since what goes to System.out and what goes to log.error may differ.
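Something like the following should do it. Treat it as a sketch: the /dspace install path and the log file location are assumptions about your setup (DSPACE_DIR and HANDLE are just variable names for the example), and the handle is the one from your report.

```shell
# Assumption: DSpace lives under /dspace; set DSPACE_DIR to your install directory.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"
HANDLE="88435/dsp01x920fw884"   # item handle from the report

if [ -x "$DSPACE_DIR/bin/dspace" ]; then
    # -i restricts the run to a single handle, -v prints verbose output.
    "$DSPACE_DIR/bin/dspace" filter-media -v -i "$HANDLE"
    # Then check the most recent errors in the log
    # (log path and file naming are assumptions; adjust to your log4j config).
    grep -n "ERROR" "$DSPACE_DIR"/log/dspace.log* | tail -n 20
else
    echo "dspace launcher not found under $DSPACE_DIR" >&2
fi
```

Comparing the console output with the log entries for the same run should show whether the two report different errors.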
________________
Peter Dietz
Longsight
www.longsight.com
[email protected]
p: 740-599-5005 x809
On Wed, Aug 6, 2014 at 2:14 PM, Monika Mevenkamp <[email protected]>
wrote:
> I get an exception from filter-media, see below.
>
> What puzzles me is that it complains about 'Invalid Word Format'
> although the system lists the format as PDF and the stack shows that
> DSpace tries to parse it as PDF. I can view the file without issue. When
> I download the file and look at its stated format on the command line I
> get:
>
> > wget http://dataspace.princeton.edu/jspui/bitstream/88435/dsp01x920fw884/1/8ers.pdf
>
> > file 8ers.pdf
>
> 8ers.pdf: PDF document, version 1.5
>
>
> I am not sure what to look at next.
>
> Monika
>
> Item Handle: 88435/dsp01x920fw884
> Bundle Name: ORIGINAL
> Bitstream: 3597
> Name: 8ers.pdf
> File Size: 370978
> Checksum: cc6054581d069e06cf72f2749cd7163b (MD5)
> Asset Store: 0
> java.lang.IllegalArgumentException
> java.lang.IllegalArgumentException
> at org.apache.fontbox.cff.CFFParser.readEntry(CFFParser.java:150)
> at org.apache.fontbox.cff.CFFParser.readDictData(CFFParser.java:117)
> at org.apache.fontbox.cff.CFFParser.parseFont(CFFParser.java:461)
> at org.apache.fontbox.cff.CFFParser.parse(CFFParser.java:71)
> at org.apache.pdfbox.pdmodel.font.PDType1CFont.load(PDType1CFont.java:313)
> at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:104)
> at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:162)
> at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
> at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:75)
> at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
> at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
> at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
> at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
> at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
> at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:101)
> at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:715)
> at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:537)
> at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:487)
> at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:455)
> at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersCollection(MediaFilterManager.java:433)
> at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersCommunity(MediaFilterManager.java:417)
> at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:379)
> at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:309)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:183)
> Invalid Word Format
> ERROR filtering, skipping bitstream:
> PDF at
>
>
> --
> Monika Mevenkamp
> phone: 609-258-4161
> 123 693 Alexander Street, Princeton University, Princeton, NJ 08544
>
>
>
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
> List Etiquette:
> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>