I think the link you're missing here is that you need to be running
FilterMedia to generate the text extracts that end up in the full-text
indexes of Lucene. Look into setting that up in a cron job.

http://www.dspace.org/1_5_1Documentation/ch09.html#N1386F
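
As a rough illustration, here is a minimal sketch of such a cron job. It
runs filter-media and then index-update, in the order Stuart suggests
below. The script name (dspace-fulltext.sh), the install directory
(/dspace) and the schedule are hypothetical. index-init is left out on the
assumption that it is only needed when the index is first built, so check
the documentation above for your version before relying on that.

```shell
#!/bin/sh
# dspace-fulltext.sh (hypothetical name): periodic full-text refresh for DSpace.
# Usage: dspace-fulltext.sh /path/to/dspace   (your [dspace] install directory)

run_fulltext_update() {
  dspace_dir="$1"
  # Generate text extracts from HTML, PDF and .DOC bitstreams.
  "$dspace_dir/bin/filter-media" || return 1
  # Add the new extracts to the Lucene full-text index.
  "$dspace_dir/bin/index-update" || return 1
}

# Run only when an install directory is given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  run_fulltext_update "$1"
fi
```

Installed as, say, /usr/local/bin/dspace-fulltext.sh, a crontab entry like
the following (again hypothetical) would run it every night at 02:00, and
the script stops before index-update if filter-media fails:

    0 2 * * * /usr/local/bin/dspace-fulltext.sh /dspace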

-Mark

On Dec 14, 2008, at 6:05 AM, Stuart Lewis wrote:

>
> On 13/12/2008 00:23, "Andrew Marlow" <marlow.and...@googlemail.com> wrote:
>
>> I saw your note at http://www.loc.gov/standards/mets/mets-registry.html
>> about adding METS support to DSpace. I am new to DSpace and to digital
>> libraries generally, so I could be wrong, but AFAIK DSpace does not
>> support METS yet.
>
> When importing or exporting to METS, DSpace 'crosswalks' the metadata
> in and out of METS.
>
> See: http://wiki.dspace.org/index.php/DSpaceMETSSIPProfile
>
>> The reason I ask is that I am prototyping a digital library using
>> DSpace as a starting point and I have a number of non-searchable PDFs
>> that I want to import. Because they are non-searchable (I wish they
>> were searchable, they just happen not to be), I want to get METS
>> metadata for them. This would enable full-text search and, hopefully,
>> rendering as HTML as well.
>
> I'm not sure how using METS could make PDF files made up of images
> searchable. Could you explain? METS files are used for encoding and
> transferring metadata.
>
>> Also, there might be some indexing operation of DSpace that I am not
>> aware of. I tried uploading an HTML version of a PDF where I used GMail
>> to generate the HTML. Once in DSpace it seemed like it was not
>> searchable either, just like the PDF. I checked the HTML code and the
>> words were there. So maybe I have to tell DSpace to index new material
>> manually? I thought it indexed it as it was uploaded.
>
> Have you run [dspace]/bin/index-init, [dspace]/bin/filter-media and
> [dspace]/bin/index-update? Running these will extract text where
> possible (from HTML, PDF, and .DOC files) and index it.
>
> Thanks,
>
>
> Stuart
> _________________________________________________________________
>
> Gwasanaethau Gwybodaeth                      Information Services
> Prifysgol Aberystwyth                      Aberystwyth University
>
>            E-bost / E-mail: stuart.le...@aber.ac.uk
>                 Ffon / Tel: (01970) 622860
> _________________________________________________________________
>
>
> ------------------------------------------------------------------------------
> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas,  
> Nevada.
> The future of the web can't happen without you.  Join us at MIX09 to  
> help
> pave the way to the Next Web now. Learn more and register at
> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech

~~~~~~~~~~~~~
Mark R. Diggory
http://purl.org/net/mdiggory/homepage



