Re: [Dspace-tech] Google crawler repeatedly requests non-existent handles...
Dear Panyarak,

Check your sitemaps file, find the non-existent pages, and delete those URLs. You also have to give the location of the sitemap file in robots.txt.

Regards,
Vinit Kumar
Senior Research Fellow
Documentation Research and Training Centre, Bangalore
MLISc (BHU), Varanasi, India
Alt email: vi...@drtc.isibang.ac.in

On 21 September 2010 07:04, Panyarak Ngamsritragul pa...@me.psu.ac.th wrote:

> Hi,
>
> There are two points here:
>
> 1. In our repository, we have allowed crawlers to browse our site by putting up a robots.txt with only one line:
>
>        User-agent: *
>
>    Webmaster Tools reports that crawler access succeeds, though I am not quite sure that is enough. The problem is that internal error messages are sent to me every day saying the crawler cannot access certain pages. I have checked the handles in those messages and found that they are non-existent pages. Can anyone suggest what I should do to get rid of these errors?
>
> 2. I also submitted sitemaps to Google. The latest result reported in Webmaster Tools is:
>
>        Sitemap: http://kb.psu.ac.th/psukb/sitemap
>        Status: OK
>        Type: Index
>        Submitted: 17/7/2010
>        Downloaded: 17/9/2010
>        URLs submitted: 4,545
>        URLs in web index: 3,785
>
>    Should I stop the crawler as mentioned in 1? And what happened to the URLs reported as not in the web index?
>
> Thanks.
>
> Panyarak Ngamsritragul
> Khunying Long Athakravisunthorn Learning Resources Center
> Prince of Songkla University, Hat Yai, Songkhla, Thailand
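For reference, the robots.txt change Vinit describes is a single extra `Sitemap:` line giving the absolute URL of the sitemap (or sitemap index). Using the sitemap index URL reported elsewhere in this thread, the whole file would look something like:

```
User-agent: *
Sitemap: http://kb.psu.ac.th/psukb/sitemap
```

Note the file must be named robots.txt (not robot.txt) and live at the site root for crawlers to find it.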
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] Google crawler repeatedly requests non-existent handles...
Thanks Vinit for the information.

I checked the sitemap files under DSPACE/sitemaps and found that the handles the Google crawler keeps accessing do not exist in any of the files there. Are there sitemap files elsewhere? And how do I include the sitemap files in robots.txt? Sorry if this is a stupid question.

Panyarak Ngamsritragul
Khunying Long Athakravisunthorn Learning Resources Center
Prince of Songkla University, Hat Yai, Songkhla, Thailand

> Dear Panyarak,
>
> Check your sitemaps file, find the non-existent pages, and delete those URLs. You also have to give the location of the sitemap file in robots.txt.
>
> Regards,
> Vinit Kumar
> Senior Research Fellow
> Documentation Research and Training Centre, Bangalore
> MLISc (BHU), Varanasi, India
> Alt email: vi...@drtc.isibang.ac.in
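Checking the sitemap files by eye is error-prone; a small script can extract every `<loc>` URL so they can be compared against the handles Google keeps requesting. This is a minimal sketch in Python; the sample XML and the handle URLs in it are illustrative, not taken from the real PSU sitemaps:

```python
# Sketch: pull every <loc> URL out of a sitemap (or sitemap index) file
# so the list can be diffed against the handles in the crawler errors.
# Point ET.parse() at real files under DSPACE/sitemaps instead of the
# sample string used here.
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Return every <loc> URL found in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

# Illustrative sample; handle numbers are made up.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://kb.psu.ac.th/psukb/handle/123456789/1</loc></url>
  <url><loc>http://kb.psu.ac.th/psukb/handle/123456789/2</loc></url>
</urlset>"""

print(sitemap_urls(sample))
```

If a requested handle does not appear in this list, Google most likely learned the URL from an old crawl or an external link rather than from the sitemaps, in which case the repository should answer it with a proper 404 so it eventually drops out of the index.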
[Dspace-tech] Google crawler repeatedly requests non-existent handles...
Hi,

There are two points here:

1. In our repository, we have allowed crawlers to browse our site by putting up a robots.txt with only one line:

       User-agent: *

   Webmaster Tools reports that crawler access succeeds, though I am not quite sure that is enough. The problem is that internal error messages are sent to me every day saying the crawler cannot access certain pages. I have checked the handles in those messages and found that they are non-existent pages. Can anyone suggest what I should do to get rid of these errors?

2. I also submitted sitemaps to Google. The latest result reported in Webmaster Tools is:

       Sitemap: http://kb.psu.ac.th/psukb/sitemap
       Status: OK
       Type: Index
       Submitted: 17/7/2010
       Downloaded: 17/9/2010
       URLs submitted: 4,545
       URLs in web index: 3,785

   Should I stop the crawler as mentioned in 1? And what happened to the URLs reported as not in the web index?

Thanks.

Panyarak Ngamsritragul
Khunying Long Athakravisunthorn Learning Resources Center
Prince of Songkla University, Hat Yai, Songkhla, Thailand