Hi Tim, thank you for your answer. Almost all of the PDFs (99% for sure) are born-digital files, so that isn't the problem. I already know the Search Engine Optimization page, but it didn't help...

On Friday, September 24, 2021 at 17:09:05 UTC+2, Tim Donohue wrote:

> Hi Jan,
>
> If the record is being indexed by Google already, then they should be
> aware of the PDF already, and there's not much DSpace can do to force
> Google to full text index the PDF. That said, it's worth noting there are
> two main types of PDFs, and only one of which is easily indexed:
>
> - PDFs created from digital files or OCRed images. These PDFs have
>   embedded text and are more easily full text indexed.
> - PDFs created from scanned files (without OCR). These are image-based
>   PDFs with no embedded text, and they are often *not able to be full
>   text indexed*, unless the system which grabs the PDF is able to OCR
>   it reliably in an automatic fashion.
>
> So, if the PDFs you are talking about were created from scanned images, *then
> make sure to OCR them so that they are easier to index.*
>
> DSpace provides some other hints/tips about Search Engine Optimization
> here which you may want to review for your repository:
> https://wiki.lyrasis.org/display/DSDOC5x/Search+Engine+Optimization
>
> If you have other questions let us know on this list.
>
> Tim
>
> ------------------------------
> *From:* [email protected] <[email protected]> on
> behalf of Jan Skůpa <[email protected]>
> *Sent:* Friday, September 24, 2021 2:53 AM
> *To:* DSpace Community <[email protected]>
> *Subject:* [dspace-community] fulltext indexing PDF files in Google search
>
> Hi,
> I found that most of the PDFs in our dspace (5.3) are not fully searchable
> via Google. The records are indexed, but the phrases from the PDF are not
> found. Is it possible that there is a bug in the settings somewhere? Should
> this work? Thanks!
>
> --
> All messages to this mailing list should adhere to the Code of Conduct:
> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
> ---
> You received this message because you are subscribed to the Google Groups
> "DSpace Community" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dspace-community/c3b24342-a0ef-4946-9576-6ae2b32c55ffn%40googlegroups.com.

--
To view this discussion on the web visit
https://groups.google.com/d/msgid/dspace-community/356532f9-c321-423f-81f9-23e18fc67f5dn%40googlegroups.com.
