Thanks guys,

(I'm on v1.4.1 here for our main repository).

You're right - I only tried index-all from the command line earlier when I
was trying to figure out why this wasn't working - apologies, an example
of brain freeze!! I had a quiet "D'oh" moment when someone mentioned
filter-media :-)

I tried filter-media from the command line and it did indeed bomb out
fairly early on due to a protected-PDF/Bouncy Castle type error, which is
presumably why the cron filter-media wasn't doing its job.

I dropped the Bouncy Castle PDF jars into the lib directory (copied over
from a v1.4.2 repo I'm also running), re-ran filter-media and that seems
to have done the trick - my PDF has now been filtered and indexed and
can be searched from within DSpace :-).

Interestingly I did still get a couple of errors, but these didn't stop
the filter-media process as was the case previously (I don't know if
this is because of the new jars or if these are less serious errors than
the one that previously caused filter-media to bomb out) - just for
reference, these are the errors I'm seeing:

ERROR filtering, skipping bitstream #364 java.util.NoSuchElementException
java.util.NoSuchElementException
        at java.util.AbstractList$Itr.next(AbstractList.java:426)
        at org.textmining.text.extraction.WordExtractor.extractText(WordExtractor.java:150)
        at org.dspace.app.mediafilter.WordFilter.getDestinationStream(WordFilter.java:97)
        at org.dspace.app.mediafilter.MediaFilter.processBitstream(MediaFilter.java:155)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:327)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:296)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:266)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:234)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:185)


ERROR filtering, skipping bitstream #169 java.io.IOException: Error decrypting document, details: Error: The supplied password does not match either the owner or user password in the document.
java.io.IOException: Error decrypting document, details: Error: The supplied password does not match either the owner or user password in the document.
        at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:208)
        at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
        at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:110)
        at org.dspace.app.mediafilter.MediaFilter.processBitstream(MediaFilter.java:155)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:327)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:296)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:266)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersAllItems(MediaFilterManager.java:234)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:185)
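
For anyone reading along, the "report the error, skip the bitstream, carry
on" behaviour in the two traces above can be pictured as a try/catch around
each bitstream. This is only a minimal sketch of that general pattern: the
class name SkipOnErrorSketch and the BitstreamStep interface are made up for
illustration and are not DSpace classes, and this is not the actual
MediaFilterManager code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipOnErrorSketch {

    /** Hypothetical stand-in for one filtering step; not a DSpace interface. */
    public interface BitstreamStep {
        void apply(int bitstreamId) throws Exception;
    }

    /** Applies the step to every ID; failures are logged and skipped, and the skipped IDs are returned. */
    public static List<Integer> processAll(List<Integer> ids, BitstreamStep step) {
        List<Integer> skipped = new ArrayList<>();
        for (int id : ids) {
            try {
                step.apply(id);
            } catch (Exception e) {
                // One bad bitstream is reported and skipped; the loop continues with the next one.
                System.err.println("ERROR filtering, skipping bitstream #" + id + " " + e);
                skipped.add(id);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Made-up IDs; the middle one simulates a password-protected PDF.
        List<Integer> skipped = processAll(Arrays.asList(168, 169, 170), id -> {
            if (id == 169) {
                throw new IOException("Error decrypting document");
            }
        });
        System.out.println("Skipped bitstreams: " + skipped);
    }
}
```

With a loop shaped like this, a single unreadable file costs one log entry
rather than the whole run, which matches what the traces above show.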


Thanks again for all the useful advice and pointers, and for helping me
to sort this out (and getting me past my brain freeze!).

Cheers,

Mike

Michael White 
eLearning Developer
Centre for eLearning Development (CeLD) 
S7, The Library 
University of Stirling 
Stirling SCOTLAND 
FK9 4LA 

Email: [EMAIL PROTECTED] 
Tel: +44 (0) 1786 466877 
Fax: +44 (0) 1786 466880 

http://www.is.stir.ac.uk/celd/


_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech