It -should- be using HEAD requests to check whether the timestamps have changed, not downloading everything repeatedly!
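For anyone wondering what I mean by that, here is a rough sketch of a polite re-check in Python. It is only an illustration, not what any real crawler does: the function names are made up, and it assumes the server sends a Last-Modified header, which not all do. The idea is to ask for the headers only and fetch the body only if the timestamp has moved.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def fetch_last_modified(url, timeout=10):
    """Send a HEAD request and return the Last-Modified header (or None).

    Hypothetical helper: only the headers are transferred, not the body.
    """
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("Last-Modified")


def needs_download(last_modified, previous):
    """Return True if the resource looks newer than what we saw last time.

    Both arguments are HTTP date strings; None means "unknown".
    """
    if last_modified is None or previous is None:
        # No timestamp to compare, so be conservative and fetch.
        return True
    return parsedate_to_datetime(last_modified) > parsedate_to_datetime(previous)
```

A crawler would remember the Last-Modified value from its previous visit, call fetch_last_modified() on the next one, and issue a full GET only when needs_download() says so. A conditional GET with If-Modified-Since gets the same effect in a single request.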

I found that Baidu was so badly behaved on a site I run that I just barred it completely, by telling Apache not to serve anything to that user-agent string. Good riddance. (The site is for a makerspace in the US, which requires physical presence to use, so I frankly don't care if a few Chinese searchers don't know we exist or have to use someone else's search engine to figure it out. If they care, they should fix their damned search engine not to be so obnoxious.)

I don't know if Baidu actually pays any attention to robots.txt. (I can't remember if ignoring it was part of the reason I barred it, besides its ridiculously high load.) But if it does, you can disallow just that one search engine from scanning your large content.

_______________________________________________
Dirvish mailing list
[email protected]
http://www.dirvish.org/mailman/listinfo/dirvish
