Hello,
The built-in PDF filter (PDFBox) has a few limitations and can't process
every PDF, but it is easy to ship with DSpace, which keeps a basic
installation as simple as possible.
With one extra step, you can install XPDF, which DSpace can use as a plugin
for filtering PDFs. It works much better than the default and has helped us
filter more PDFs.
The documentation linked below explains how to install it. It's not
difficult to configure once you've installed xpdf on your system.
1.5: http://www.dspace.org/1_5_2Documentation/ch05.html#N12768
1.6: http://www.dspace.org/1_6_2Documentation/ch05.html#N14F32
1.7: https://wiki.duraspace.org/display/DSDOC/Configuration#Configuration-XPDFFilter
You could also try updating your pom.xml to fetch a more recent version of
PDFBox, but there are no guarantees that this will work. The PDFBox filter
also has some improvements in DSpace 1.7.
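
If it helps, here is a rough Python sketch (not part of DSpace) for checking
whether the XPDF command-line tools are installed and whether pdftotext can
read the PDF that PDFBox fails on, before you change any DSpace
configuration. The function names are made up for illustration, and the
sketch assumes the tools are called pdftotext, pdfinfo and pdftoppm and are
on your PATH.

```python
import shutil
import subprocess


def find_tools(names=("pdftotext", "pdfinfo", "pdftoppm")):
    """Map each tool name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


def pdftotext_command(pdf_path, binary="pdftotext"):
    """Build a pdftotext command that writes the extracted text to stdout ('-')."""
    return [binary, pdf_path, "-"]


def can_extract_text(pdf_path, binary="pdftotext"):
    """Return True if pdftotext exits cleanly and produces non-empty output."""
    try:
        result = subprocess.run(
            pdftotext_command(pdf_path, binary),
            capture_output=True,  # keep output as bytes; the encoding may vary
        )
    except FileNotFoundError:
        # The binary itself is missing, so nothing can be extracted.
        return False
    return result.returncode == 0 and bool(result.stdout.strip())
```

For example, print(find_tools()) shows which tools were found, and
can_extract_text("/path/to/your.pdf") tells you whether pdftotext gets any
text out of the file. If it does, the XPDF filter is probably worth setting
up.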
--
Peter Dietz
Systems Developer/Engineer
Ohio State University Libraries
On Tue, Nov 2, 2010 at 12:08 PM, simharaju meher <[email protected]> wrote:
> Hi There
>
> I was trying to run 'filter-media'. I uploaded only one PDF document to
> an item, and I am using DSpace 1.5. I tried to run
>
> /dspace/bin$ ./filter-media
>
> I got the exception below. Can anyone help?
>
>
> ERROR filtering, skipping bitstream:
>
> Item Handle: 123456789/12183
> Bundle Name: ORIGINAL
> File Size: 798549
> Checksum: f5139ae7c6270e76403cb578ad5ecade (MD5)
> Asset Store: 0
> java.lang.NullPointerException
> java.lang.NullPointerException
> at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
> at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
> at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
> at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
> at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:139)
> at org.dspace.app.mediafilter.MediaFilterManager.processBitstream(MediaFilterManager.java:674)
> at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:575)
> at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:525)
> at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:493)
> at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:432)
> at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:363)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:212)
>
>
>
> Rgds
> Meher
>
>
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>
>