Hi,

We're running DSpace 1.8.2 and are considering implementing full-text
indexing for our PDF content. I see the discussion on configuring media
filters at:

https://wiki.duraspace.org/display/DSDOC18/Configuration#Configuration-ConfiguringMediaFilters

Can someone tell me if I need to prep these documents first by running
them through some kind of OCR software? The documentation tells me "the
PDF Media Filter will extract textual content from PDF bitstream" which
makes me think the OCR step isn't necessary . . . or maybe I'm dreaming?

Thanks,

Dan

_______________________________________________
Dspace-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-general
