Hi, I thought I'd chime in to say that everyone who has responded is
correct: there is currently no OCR functionality within DSpace. However,
DSpace does use Apache Tika to feed the full-text search index, and Tika
also supports OCR (via Tesseract OCR). To be clear, there's no OCR
capability within DSpace... yet... but someone could build it, if they
were keen to do so.

One word of caution to developers who want to tackle this job: I've seen
Tesseract OCR severely impact another application's throughput. You'd have
to engineer carefully to avoid running into the same problem.
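
As a rough illustration of the kind of guardrail I mean, one option is to
cap how many OCR jobs run at once and to stop waiting on slow ones. This is
only a sketch using the standard Java concurrency classes, not DSpace or
Tika code: BoundedOcr, ocrAll, and the ocrEngine callback are made-up names,
and the callback stands in for whatever actually invokes the OCR engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical helper: runs OCR over a batch with a fixed concurrency cap.
final class BoundedOcr {

    // ocrEngine is an assumed callback (file name -> extracted text).
    // maxWorkers limits how many OCR jobs can run at the same time.
    static List<String> ocrAll(List<String> files,
                               Function<String, String> ocrEngine,
                               int maxWorkers,
                               long timeoutSeconds) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxWorkers);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String file : files) {
                futures.add(pool.submit(() -> ocrEngine.apply(file)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    // Wait at most timeoutSeconds for each result in turn.
                    results.add(future.get(timeoutSeconds, TimeUnit.SECONDS));
                } catch (TimeoutException | ExecutionException e) {
                    // Give up on this file and record no text for it.
                    // cancel(true) only interrupts the worker thread; a real
                    // external OCR process may need its own timeout as well.
                    future.cancel(true);
                    results.add("");
                }
            }
            return results;
        } finally {
            // Interrupt anything still running so the pool does not linger.
            pool.shutdownNow();
        }
    }
}
```

A small maxWorkers leaves CPU for everything else on the server, and a file
that fails or times out simply contributes no text instead of stalling the
whole batch.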

--Hardy

On Wed, Aug 16, 2023 at 5:57 PM Yvonne <[email protected]> wrote:

> Thank you both. I found this helpful to know!
>
> Best regards,
> Yvonne
>
> On Wed, Aug 16, 2023 at 3:44 PM DSpace Community <
> [email protected]> wrote:
>
>> Hi Chandrika,
>>
>> DSpace does not have an OCR engine.  It is only able to index PDFs (or
>> other electronic files) if they have been previously OCR'ed by a different
>> system.
>>
>> Tim
>>
>> On Tuesday, August 8, 2023 at 11:03:27 AM UTC-5 [email protected]
>> wrote:
>>
>>> Hi Team,
>>>
>>> Would like to know about the underlying OCR engine used in DSpace.
>>> Please share if there is any documentation around the same.
>>>
>>> Regards,
>>> Chandrika Hebbar
>>>
>> --
>> All messages to this mailing list should adhere to the Code of Conduct:
>> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "DSpace Community" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/dspace-community/177ae4c4-a59f-4bdb-af87-0e2ce03e1582n%40googlegroups.com
>> <https://groups.google.com/d/msgid/dspace-community/177ae4c4-a59f-4bdb-af87-0e2ce03e1582n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
