As suggested by DSpace, filter-media is set to run nightly. As our instance
has grown, so has the number of files that filter-media is not able to index.
We have 60,738 items in our repository, and as of today, filter-media is not
able to index 892 of them. I'm trying to determine whether anything can be
done so that as many of these 892 items as possible can be indexed. I have
copied a portion of the filter-media output below. Could someone who
understands filter-media better let me know if there is something that can
be done?

Many thanks!
Jose

Applying Media Filters
ERROR filtering, skipping bitstream:
Item Handle: 2027.42/62012
Bundle Name: ORIGINAL
File Size: 58
Checksum: a500810f390e82e2aead21d5220e7325 (MD5)
Asset Store: 1
java.lang.IllegalArgumentException: Width (80) and height (0) cannot be <= 0
java.lang.IllegalArgumentException: Width (80) and height (0) cannot be <= 0
at java.awt.image.DirectColorModel.createCompatibleWritableRaster(DirectColorModel.java:999)
at java.awt.image.BufferedImage.<init>(BufferedImage.java:312)
at org.dspace.app.mediafilter.JPEGFilter.getDestinationStream(JPEGFilter.java:161)
at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:674)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:575)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:525)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:493)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:432)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:363)
ERROR filtering, skipping bitstream:
Item Handle: 2027.42/69214
Bundle Name: ORIGINAL
File Size: 268039
Checksum: 4e64d97f5a151819da52b095b1fef5d3 (MD5)
Asset Store: 1
java.lang.NullPointerException
java.lang.NullPointerException
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:139)
at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:674)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:575)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:525)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:493)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:432)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:363)
ERROR filtering, skipping bitstream:
Item Handle: 2027.42/55391
Bundle Name: ORIGINAL
File Size: 473660
Checksum: 3686c4d66884a89d81ddfe420a1b661b (MD5)
Asset Store: 1
java.io.IOException: Unknown encoding for 'Identity-V'
java.io.IOException: Unknown encoding for 'Identity-V'
at org.pdfbox.encoding.EncodingManager.getEncoding(EncodingManager.java:82)
at org.pdfbox.pdmodel.font.PDFont.getEncoding(PDFont.java:612)
at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:466)
at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:325)
at org.pdfbox.util.operator.ShowText.process(ShowText.java:64)
at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:139)
at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:674)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:575)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:525)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:493)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:432)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:363)
ERROR filtering, skipping bitstream:
Item Handle: 2027.42/61991
Bundle Name: ORIGINAL
File Size: 58
Checksum: a500810f390e82e2aead21d5220e7325 (MD5)
Asset Store: 1
java.lang.IllegalArgumentException: Width (80) and height (0) cannot be <= 0
java.lang.IllegalArgumentException: Width (80) and height (0) cannot be <= 0
at java.awt.image.DirectColorModel.createCompatibleWritableRaster(DirectColorModel.java:999)
at java.awt.image.BufferedImage.<init>(BufferedImage.java:312)
at org.dspace.app.mediafilter.JPEGFilter.getDestinationStream(JPEGFilter.java:161)
at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:674)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:575)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:525)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:493)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:432)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:363)
ERROR filtering, skipping bitstream:
Item Handle: 2027.42/50480
Bundle Name: ORIGINAL
File Size: 177152
Checksum: af66e3bb52ebe7f1b4c9cc06fa9a6257 (MD5)
Asset Store: 1
java.util.NoSuchElementException
java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at org.textmining.text.extraction.WordExtractor.extractText(WordExtractor.java:150)
at org.dspace.app.mediafilter.WordFilter.getDestinationStream(WordFilter.java:95)
at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:674)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:575)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:525)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:493)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:432)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:363)
ERROR filtering, skipping bitstream:
Item Handle: 2027.42/40280
Bundle Name: ORIGINAL
File Size: 90205
Checksum: 2437377db51a8e3b9347c784b61906f9 (MD5)
Asset Store: 1
java.io.IOException: expected='endobj' firstReadAttempt='endobj154' secondReadAttempt='0' org.pdfbox.io.pushbackinputstr...@122c9df
java.io.IOException: expected='endobj' firstReadAttempt='endobj154' secondReadAttempt='0' org.pdfbox.io.pushbackinputstr...@122c9df
at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:502)
at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:176)
at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:707)
at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:691)
at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:138)
at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:674)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:575)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:525)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:493)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:432)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:363)
ERROR filtering, skipping bitstream:
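
With 892 failures, it may help to group them by exception type before looking at individual files, since the excerpt above already shows several distinct causes (zero-height images, PDF parse errors, an unknown PDF font encoding, a Word extraction failure). The following is only a rough sketch, assuming the filter-media output has been captured to a text file and that every error block has the same shape as the ones above; the function name tally_errors is made up for illustration and is not part of DSpace.

```python
import re
from collections import Counter

# First line of each error block in the filter-media output shown above.
BLOCK_START = "ERROR filtering, skipping bitstream:"
# A fully qualified Java exception class at the start of a line,
# e.g. java.io.IOException or java.lang.NullPointerException.
EXCEPTION_RE = re.compile(
    r"^((?:[a-z_][\w$]*\.)+[A-Z][\w$]*(?:Exception|Error))\b"
)
HANDLE_RE = re.compile(r"^Item Handle:\s*(\S+)")


def tally_errors(lines):
    """Group error blocks by exception class.

    Returns (counts, handles): a Counter mapping exception class to the
    number of failing bitstreams, and a dict mapping exception class to
    the list of item handles that raised it.
    """
    counts = Counter()
    handles = {}
    current_handle = None
    seen_exception = False  # the message is printed twice; count it once

    for raw in lines:
        line = raw.strip()
        if line.startswith(BLOCK_START):
            # A new block begins: forget the previous item.
            current_handle = None
            seen_exception = False
            continue
        match = HANDLE_RE.match(line)
        if match:
            current_handle = match.group(1)
            continue
        match = EXCEPTION_RE.match(line)
        if match and current_handle is not None and not seen_exception:
            seen_exception = True
            name = match.group(1)
            counts[name] += 1
            handles.setdefault(name, []).append(current_handle)
    return counts, handles
```

Calling tally_errors on the open log file (the file name is up to you) and then printing counts.most_common() should show which failure types account for most of the 892 items, and the handles dict lists the affected items for each type.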