Hi Oscar,

If I'm not wrong, full-text search on PDFs should be enabled by default, provided you have configured your DSpace instance to run the media filters regularly (see https://wiki.lyrasis.org/display/DSDOC8x/Scheduled+Tasks+via+Cron, which is referenced in step 15 of the backend Installation guide):

https://wiki.lyrasis.org/display/DSDOC8x/Mediafilters+for+Transforming+DSpace+Content

The documentation says explicitly that OCRed documents should work using the "PDF Text Extractor".
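
As a rough sketch of what such a scheduled task boils down to, the snippet below builds and runs the two DSpace command-line calls involved. The install path /dspace and the helper names build_commands and run_all are assumptions for illustration only; running index-discovery after filter-media is a cautious extra step and may not be strictly necessary on your version.

```python
import shlex
import subprocess

# Assumption: DSpace is installed under /dspace; replace with your own [dspace] directory.
DSPACE_DIR = "/dspace"


def build_commands(dspace_dir: str = DSPACE_DIR) -> list[list[str]]:
    """Return the CLI calls: run the media filters, then refresh the search index."""
    launcher = f"{dspace_dir}/bin/dspace"
    return [
        [launcher, "filter-media"],     # runs the enabled media filters, text extraction included
        [launcher, "index-discovery"],  # updates the Discovery (Solr) index
    ]


def run_all(dspace_dir: str = DSPACE_DIR) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in build_commands(dspace_dir):
        print("Running:", shlex.join(argv))
        subprocess.run(argv, check=True)
```

Calling run_all() on the DSpace server (as the user that owns the installation) would execute both steps once; in practice the same commands would normally be scheduled via cron as described in the documentation linked above.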

Cheers,

Abel


On 14/10/2024 at 15:56, Oscar Orrego wrote:
Hello Diogenes:
Thank you very much for answering. The files I have uploaded are PDFs with OCR already applied. What I need is to be able to search for words inside the uploaded OCR files (items). For example, if a certain name such as "JUAN" appears inside a file, it should be findable beyond the previously entered metadata, that is, within the content of the uploaded OCR file.
Thank you so much,
Oscar

On Sat, 12 Oct 2024 at 11:29 a.m., Job Diogenes Ribeiro Borges ([email protected]) wrote:

    Hello Oscar,

    I don't know whether there are specific DSpace settings for this.
    But since DSpace uses Apache Solr for indexing, it should be
    achievable.
    Search Google for "Solr OCR PDF indexing".

    https://opensemanticsearch.org/doc/admin/config/ocr/

    Cheers
    On Friday, 4 October 2024 at 11:59:34 UTC-3, Oscar
    Orrego wrote:

        Hello everyone,
        We have DSpace 9 installed on a data server and want to
        bring it up to digitize the library of the institution where
        I work. In the basic tests we ran, we can search by
        metadata, but not by the CONTENT of the documents; users
        need to search for words, exclude others, and so on.
        Is there a configuration that indexes the content of each
        uploaded PDF with OCR, so that full-text search works?
        Many thanks,
        Oscar Orrego

-- All messages to this mailing list should adhere to the Code of
    Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
    ---
    You received this message because you are subscribed to the Google
    Groups "DSpace Community" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected].
    To view this discussion on the web visit
    
https://groups.google.com/d/msgid/dspace-community/b026517e-f77c-4386-a138-799328f08b29n%40googlegroups.com
    
<https://groups.google.com/d/msgid/dspace-community/b026517e-f77c-4386-a138-799328f08b29n%40googlegroups.com?utm_medium=email&utm_source=footer>.


--
Abel Gómez Llana, PhD

[email protected]
https://abel.gomez.llana.me
