Hi Tim,
Thank you for your answer.
All of the PDFs are (I'm 99 % sure) born-digital files, so this isn't the problem.
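One way to spot-check that claim on a sample of downloaded bitstreams: a page whose text can be indexed normally draws it inside `BT ... ET` text objects in its content stream (OCRed scans usually carry such a layer too, often as invisible text), while an image-only scan does not. The Python sketch below is a rough, stdlib-only heuristic built on that assumption, not a DSpace or Google tool. It inflates Flate-compressed streams, skips streams that look like binary image or font data, and reports whether any text object is present. It ignores other filters such as LZW or ASCII85, so poppler's `pdffonts` or `pdftotext` will give a more reliable answer.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords; the lookbehind
# keeps the tail of "endstream" from being read as the start of a new stream.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

# In a page content stream, text is drawn inside BT ... ET (begin/end text).
TEXT_OBJECT_RE = re.compile(rb"\bBT\b.*?\bET\b", re.DOTALL)

# Printable ASCII plus tab, LF and CR.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def inflate_or_raw(raw):
    """Inflate a Flate-compressed stream; return other streams unchanged."""
    try:
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        return raw  # uncompressed, or another filter such as DCTDecode (JPEG)


def looks_like_content_stream(data):
    """Content streams are mostly ASCII operators; image data is mostly binary."""
    sample = data[:4096]
    if not sample:
        return False
    return sum(b in PRINTABLE for b in sample) / len(sample) >= 0.7


def has_text_layer(pdf_bytes):
    """Rough heuristic: True if any content-like stream holds a BT ... ET object."""
    for match in STREAM_RE.finditer(pdf_bytes):
        data = inflate_or_raw(match.group(1))
        if looks_like_content_stream(data) and TEXT_OBJECT_RE.search(data):
            return True
    return False
```

For example, `has_text_layer(open("thesis.pdf", "rb").read())` (the file name is hypothetical) should return `True` for a born-digital file and `False` for a scan without OCR. The 0.7 printable-byte threshold is an arbitrary cut-off chosen for this sketch to keep random JPEG bytes from looking like text operators.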
I know the SEO page, but it didn't help...
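Since the records themselves are indexed but the PDF text is not, one thing worth ruling out is a robots.txt rule that blocks the bitstream URLs while leaving the item pages open. The sketch below uses Python's standard `urllib.robotparser`; the host name, handle and file name in the example URLs are hypothetical, and the `/bitstream/handle/...` path only assumes an XMLUI-style URL layout. Note that Python's parser applies the first matching rule in file order, whereas Google uses the most specific (longest) match, so results can differ for files that mix `Allow` and `Disallow` lines.

```python
from urllib.robotparser import RobotFileParser


def googlebot_may_fetch(robots_txt, urls, agent="Googlebot"):
    """Return {url: allowed} for the given crawler under a robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {url: parser.can_fetch(agent, url) for url in urls}


# Hypothetical repository URLs, for illustration only.
ROBOTS = "User-agent: *\nDisallow: /bitstream/\n"
PDF_URL = "https://repo.example.org/bitstream/handle/123456789/1/thesis.pdf"
ITEM_URL = "https://repo.example.org/handle/123456789/1"

for url, allowed in googlebot_may_fetch(ROBOTS, [PDF_URL, ITEM_URL]).items():
    print("allowed" if allowed else "BLOCKED", url)
```

With these example rules the PDF is reported as blocked and the item page as allowed, which is exactly the pattern of "record indexed, full text not found".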

On Friday, September 24, 2021 at 17:09:05 UTC+2, Tim Donohue wrote:

> Hi Jan,
>
> If the record is being indexed by Google already, then they should be 
> aware of the PDF already, and there's not much DSpace can do to force 
> Google to full-text index the PDF. That said, it's worth noting that there are 
> two main types of PDFs, and only one of them is easily indexed:
>
>    - PDFs created from digital files or OCRed images.  These PDFs have 
>    embedded text and are more easily full text indexed.
>    - PDFs created from scanned files (without OCR). These are image-based 
>    PDFs with no embedded text, and they are often *not able to be full 
>    text indexed*, unless the system which grabs the PDF is able to OCR 
>    it reliably in an automatic fashion.
>
> So, if the PDFs you are talking about were created from scanned images, *then 
> make sure to OCR them so that they are easier to index.*
>
> DSpace provides some other hints/tips about Search Engine Optimization 
> here which you may want to review for your repository: 
> https://wiki.lyrasis.org/display/DSDOC5x/Search+Engine+Optimization
>
> If you have other questions let us know on this list.
>
> Tim
>
> ------------------------------
> *From:* [email protected] <[email protected]> on 
> behalf of Jan Skůpa <[email protected]>
> *Sent:* Friday, September 24, 2021 2:53 AM
> *To:* DSpace Community <[email protected]>
> *Subject:* [dspace-community] fulltext indexing PDF files in Google search 
>  
> Hi,
> I found that most of the PDFs in our DSpace (5.3) are not fully searchable 
> via Google. The records are indexed, but the phrases from the PDF are not 
> found. Is it possible that there is a bug in the settings somewhere? Should 
> this work? Thanks!
>
> -- 
> All messages to this mailing list should adhere to the Code of Conduct: 
> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
> --- 
> You received this message because you are subscribed to the Google Groups 
> "DSpace Community" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/dspace-community/c3b24342-a0ef-4946-9576-6ae2b32c55ffn%40googlegroups.com.
>
