Quoth Shaun Burriss on Mon, Sep 06, 2010 at 02:52:29AM -0400:
> Does the Fedora web administrator provide support for ingesting a record
> with, for example, a PDF document and doing the conversion to a Full Text
> (plain text) datastream which can then be indexed?

At Emory University Libraries we ingest PDFs and then use Fedora GSearch
for full-text indexing. With appropriate configuration, GSearch can
extract text from PDFs. This hasn’t been free of problems for us, but
it’s mostly served us well. We use our own front-end to ingest the PDFs,
but I believe the Fedora web admin should work just as well for
ingesting them and should not cause problems for GSearch.

It might be worth noting that we reposit the PDF but not the generated
full text. GSearch generates this on the fly for indexing but doesn’t
stash it back in Fedora. If you need to reposit the generated full text
then the solution we use probably won’t be sufficient for you.

--
Ben Ranker <[email protected]>
Software Engineer, Sr.
Emory University Libraries
_______________________________________________
Fedora-commons-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
