Hi Oscar,
If I'm not wrong, full-text search on PDFs should be enabled by default
if you have configured your DSpace instance to run the media filters
regularly (see
https://wiki.lyrasis.org/display/DSDOC8x/Scheduled+Tasks+via+Cron, which is
referenced in step 15 of the backend Installation guide):
https://wiki.lyrasis.org/display/DSDOC8x/Mediafilters+for+Transforming+DSpace+Content
The documentation says explicitly that OCRed documents should work with
the "PDF Text Extractor".
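In case it helps, here is a minimal sketch of what "running the media
filters regularly" could look like when driven from a small script that
cron calls. It assumes the standard DSpace command-line launcher at
[dspace]/bin/dspace and its filter-media and index-discovery commands;
the install path /dspace, the script name, and the function names are
only placeholders of mine, and the reindex step is an assumption that
may not be needed on every setup.

```python
import subprocess
from pathlib import Path


def build_commands(dspace_dir: str) -> list:
    """Return the DSpace CLI calls to run, in order.

    dspace_dir is the DSpace installation directory (hypothetical
    example: "/dspace"); adjust it to your own install.
    """
    launcher = str(Path(dspace_dir) / "bin" / "dspace")
    return [
        # Run the media filters so text is extracted from uploaded PDFs.
        [launcher, "filter-media"],
        # Assumption: refresh the Discovery (Solr) index afterwards so the
        # extracted text becomes searchable.
        [launcher, "index-discovery"],
    ]


def run_all(dspace_dir: str) -> None:
    """Run each command and stop at the first one that fails."""
    for cmd in build_commands(dspace_dir):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


# Example call (commented out because it needs a real DSpace install):
# run_all("/dspace")
```

A cron entry would then simply call this script once a night as the
DSpace user; the Scheduled Tasks page linked above has the recommended
schedule.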
Cheers,
Abel
On 14/10/2024 at 15:56, Oscar Orrego wrote:
Hello Diogenes,
Thank you very much for answering. I already have the files uploaded
as PDFs with OCR already applied. What I need is to be able to search
for words within the uploaded OCR files (items). For example, if the
name "JUAN" appears inside a file, a search should find it even though
it is not in the metadata previously entered, that is, it should be
found within the content of the uploaded OCR file.
Thank you so much,
Oscar
On Sat, 12 Oct 2024 at 11:29 a.m., Job Diogenes Ribeiro Borges
([email protected]) wrote:
Hello Oscar,
I don't know if there are specific DSpace settings for this.
But since DSpace uses Apache Solr for indexing, this should be
achievable.
Search Google for "Solr OCR PDF indexing":
https://opensemanticsearch.org/doc/admin/config/ocr/
Cheers
On Friday, 4 October 2024 at 11:59:34 UTC-3, Oscar
Orrego wrote:
Hello all,
We have DSpace 9 installed on a data server and want to bring it
up to digitize the library of the institution where I work. In
the basic tests we ran, we can search by metadata, but not by
the CONTENT of the documents, where users will need to search
for words, exclude others, and so on.
Is there any configuration that indexes the content of each
uploaded PDF document with OCR, so that full-text search
works?
Thank you very much,
Oscar Orrego
--
All messages to this mailing list should adhere to the Code of
Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
---
You received this message because you are subscribed to the Google
Groups "DSpace Community" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/dspace-community/b026517e-f77c-4386-a138-799328f08b29n%40googlegroups.com
<https://groups.google.com/d/msgid/dspace-community/b026517e-f77c-4386-a138-799328f08b29n%40googlegroups.com?utm_medium=email&utm_source=footer>.
--
Abel Gómez Llana, PhD
[email protected]
https://abel.gomez.llana.me