Hi Jan,

If the record is already being indexed by Google, then Google should already be aware of the PDF, and there's not much DSpace can do to force Google to full-text index it. That said, it's worth noting there are two main types of PDFs, and only one of them is easily indexed:
* PDFs created from digital files or OCRed images. These PDFs have embedded text and are more easily full-text indexed.
* PDFs created from scanned files (without OCR). These are image-based PDFs with no embedded text, and they often cannot be full-text indexed unless the system that fetches the PDF is able to OCR it reliably and automatically.

So, if the PDFs you are talking about were created from scanned images, make sure to OCR them so that they are easier to index.

DSpace provides some other hints and tips about Search Engine Optimization, which you may want to review for your repository:
https://wiki.lyrasis.org/display/DSDOC5x/Search+Engine+Optimization

If you have other questions, let us know on this list.

Tim

________________________________
From: [email protected] <[email protected]> on behalf of Jan Skůpa <[email protected]>
Sent: Friday, September 24, 2021 2:53 AM
To: DSpace Community <[email protected]>
Subject: [dspace-community] fulltext indexing PDF files in Google search

Hi,

I found that most of the PDFs in our dspace (5.3) are not fully searchable via Google. The records are indexed, but the phrases from the PDF are not found. Is it possible that there is a bug in the settings somewhere? Should this work?

Thanks!

--
All messages to this mailing list should adhere to the Code of Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
---
You received this message because you are subscribed to the Google Groups "DSpace Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-community/c3b24342-a0ef-4946-9576-6ae2b32c55ffn%40googlegroups.com.
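
P.S. In case it helps with triage: a rough way to tell the two kinds of PDF apart is to look for text-showing operators (`Tj` / `TJ`) in the file's content streams. The sketch below is only a heuristic using the Python standard library, not a real PDF parser. The function name `has_embedded_text` is made up for this example, and it will likely misjudge encrypted files, streams using filters other than Flate, or image data that happens to contain matching bytes. A proper PDF library would be more reliable for anything beyond a first pass.

```python
import re
import zlib

# Body of each stream object: the bytes between "stream" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

# The end of a string or array operand followed by a text-showing
# operator (Tj or TJ), which is how PDF content streams draw text.
TEXT_OP_RE = re.compile(rb"[)>\]]\s*(?:Tj|TJ)\b")


def has_embedded_text(pdf_bytes: bytes) -> bool:
    """Guess whether a PDF has a text layer (heuristic, hypothetical helper)."""
    for match in STREAM_RE.finditer(pdf_bytes):
        body = match.group(1)
        try:
            # Many content streams are Flate-compressed; decompressobj
            # tolerates a stream that was cut slightly short.
            body = zlib.decompressobj().decompress(body)
        except zlib.error:
            # Not Flate data (e.g. an uncompressed stream or a JPEG image);
            # fall through and inspect the raw bytes instead.
            pass
        if TEXT_OP_RE.search(body):
            return True
    return False
```

To use it, read a file in binary mode and pass the bytes, e.g. `has_embedded_text(open("thesis.pdf", "rb").read())` (the file name is just a placeholder). PDFs for which this returns `False` are the ones I would check first as candidates for OCR.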
