Re: [dspace-community] Re: OCR Engine

Mark H. Wood Thu, 17 Aug 2023 06:29:06 -0700

On Wed, Aug 16, 2023 at 12:44:38PM -0700, DSpace Community wrote:
> DSpace does not have an OCR engine.  It is only able to index PDFs (or 
> other electronic files) if they have been previously OCR'ed by a different 
> system.


Or if they contained machine-readable text to begin with.

So:  a PDF that was rendered from a word-processing document (for
example) probably contains text that can be extracted and indexed.  A
PDF that contains only images of paper documents does not, unless the
imaging software or some other tool has OCRed the images and added a
text layer to the PDF.
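
One rough way to tell the two kinds of PDF apart before depositing them
is to try extracting text and see whether anything comes back.  The
sketch below is only an illustration, not how DSpace does its own
indexing: the function names and the 20-characters-per-page threshold
are assumptions made up for this example, and the extraction helper
assumes the third-party pypdf package is installed.

```python
def looks_indexable(page_texts, min_chars_per_page=20):
    """Guess whether a PDF has a usable text layer.

    page_texts is a list with one extracted-text string per page.
    The threshold is an arbitrary assumption for this sketch: a scanned
    PDF with no text layer usually yields empty or near-empty strings.
    """
    if not page_texts:
        return False
    # Count non-whitespace-trimmed characters across all pages.
    total = sum(len((text or "").strip()) for text in page_texts)
    return total >= min_chars_per_page * len(page_texts)


def extract_page_texts(pdf_path):
    """Return the extracted text of each page (hypothetical helper).

    Assumes the third-party pypdf package is available; it is imported
    here so the heuristic above can be used without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

For a file on disk this would be called as
looks_indexable(extract_page_texts("some-file.pdf")), where the file
name is a placeholder.  A False result suggests the PDF needs to go
through an OCR tool that adds a text layer before its full text can be
indexed.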

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
