https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=33317
M <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                CC|                            |[email protected]

--- Comment #27 from M <[email protected]> ---

Wait, so this simply adds the same robots meta tag to the entirety of the OPAC? I was redirected here from Bug 35812 due to conflicts, and I think this bug is questionable; I'm not sure it should be merged, at least as-is.

> Websites must have a robots meta tag

That is not true. The tag is very much optional and is meant for granular, page-level steering of crawling bots. The way this preference is implemented, the rules apply to ALL OPAC pages, and I'm not sure there is any reasonable use case for that. The author mentions "noindex,nofollow" as an example, to prevent all OPAC pages from being indexed. If a library wants that, it would be better off using the more widely used and known robots.txt file, which is more likely to be supported by various crawlers and which prevents them from downloading the pages in the first place (instead of downloading each page and then discarding it upon discovering the meta tag on that particular page).

I think the better direction is to specify manually which pages should be crawlable by default and which shouldn't, as in Bug 35812. For example, search results and other dynamic pages shouldn't be indexed, to decrease the amount of junk (though they should still be crawled so links can be extracted from them), while the main page, info subpages, user-created lists and biblio records should probably be indexed by default; that is likely what most libraries would want.

With that said, there is currently no "obvious"/"easy" way of specifying a custom robots.txt file, apart from doing something like `Alias /robots.txt /var/www/html/robots.txt` in the Apache config for the OPAC (it works well enough, btw).
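To illustrate the kind of granular steering robots.txt allows: the path below is the standard Koha OPAC search script, but treat the exact rule set as a hypothetical sketch, not a recommended default. Note also that robots.txt `Disallow` blocks crawling entirely, so "crawl for links but don't index" (as suggested for search results above) can only be expressed with a per-page `noindex,follow` meta tag, not with robots.txt alone.

```
# Hypothetical OPAC robots.txt sketch
User-agent: *
# Keep crawlers out of dynamic search-result pages to reduce index junk
Disallow: /cgi-bin/koha/opac-search.pl
# Everything else (main page, biblio detail pages, lists) remains crawlable
```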
So, in the spirit of what the OP originally wanted, I believe it would be better to instead consider adding a system preference with a textarea for the robots.txt file contents, in place of a site-wide robots meta tag. This would allow libraries to set more granular rules, while someone who wants to block everything could still just do:

User-agent: *
Disallow: /

Btw, this is already documented in the README.robots file in Koha's main git directory (last edited 13 years ago; the last paragraph there is probably outdated).

So I believe my idea above could solve the conflict between our two patches. robots.txt usage is more widely documented on the Internet, I believe, and it would override any rules that Koha devs could specify manually on a per-template basis in the robots meta tag on pages, as I did in Bug 35812...

-- 
You are receiving this mail because:
You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[email protected]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
