Quoth Shaun Burriss on Mon, Sep 06, 2010 at 02:52:29AM -0400:
> Does the Fedora web administrator provide support for ingesting a record
> with, for example, a PDF document and doing the conversion to a Full Text
> (plain text) datastream which can then be indexed?

At Emory University Libraries we ingest PDFs and then use Fedora GSearch for
full-text indexing. With appropriate configuration, GSearch can extract text
from PDFs. This hasn’t been free of problems for us, but it’s mostly served
us well.

We use our own front-end to ingest the PDFs, but I believe the Fedora web
admin should work just as well for ingesting them and should not cause
problems for GSearch.

Note that we store the PDF in the repository but not the generated full
text. GSearch generates the text on the fly for indexing but doesn’t write it
back to Fedora. If you need the generated full text stored in the repository,
the solution we use probably won’t be sufficient for you.
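
If the full text does need to live in Fedora, one possible approach is to
extract it outside GSearch and add it as its own datastream. What follows is
a minimal sketch of that idea, not what we run at Emory. It assumes the
pdftotext command-line tool (from poppler) is installed and that the
repository exposes the Fedora 3.x REST API; the base URL, the PID, the
FULLTEXT datastream ID, and the function names are all hypothetical, and
authentication is left out.

```python
import subprocess
import urllib.parse
import urllib.request


def extract_text(pdf_path):
    """Return the plain text of a PDF using the pdftotext CLI (assumed installed)."""
    # A "-" output argument makes pdftotext write to stdout instead of a file.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def datastream_url(base_url, pid, ds_id, label):
    """Build the REST URL for adding a managed text/plain datastream."""
    path = "/objects/{}/datastreams/{}".format(
        urllib.parse.quote(pid, safe=":"),
        urllib.parse.quote(ds_id, safe=""),
    )
    query = urllib.parse.urlencode(
        {"controlGroup": "M", "dsLabel": label, "mimeType": "text/plain"}
    )
    return base_url.rstrip("/") + path + "?" + query


def build_request(base_url, pid, ds_id, label, text):
    """Prepare (but do not send) the POST that carries the extracted text."""
    return urllib.request.Request(
        datastream_url(base_url, pid, ds_id, label),
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values; a real repository would also require credentials.
    text = extract_text("article.pdf")
    request = build_request(
        "http://localhost:8080/fedora", "demo:1", "FULLTEXT", "Full text", text
    )
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would send it once authentication is set up.
```

With the text stored this way, GSearch could index the plain-text datastream
directly rather than extracting from the PDF at index time, at the cost of
keeping the two datastreams in sync.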

-- 
Ben Ranker <[email protected]>
Software Engineer, Sr.
Emory University Libraries


_______________________________________________
Fedora-commons-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users