Re: [Dspace-tech] Google crawler repeatedly requests non-existent handles...

2010-09-23 Thread Vinit
Dear Panyarak,

Check your sitemap files. Find the non-existent pages and delete those URLs.


Also, in the robots.txt file you have to give the location of the sitemap
file.
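For example, a minimal robots.txt that allows all crawlers and points them at the sitemap (using the sitemap URL you reported from Webmaster Tools) could look like this:

```
User-agent: *
Sitemap: http://kb.psu.ac.th/psukb/sitemap
```

Note that the Sitemap directive takes the full absolute URL, and robots.txt must sit at the root of the site (e.g. http://kb.psu.ac.th/robots.txt), not under the DSpace path.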

Regards
Vinit Kumar
Senior Research Fellow
Documentation Research and Training Centre
Bangalore
MLISc   (BHU)
Varanasi, India
Alt email: vi...@drtc.isibang.ac.in






--
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Google crawler repeatedly requests non-existent handles...

2010-09-23 Thread Panyarak Ngamsritragul


Thanks Vinit for the information.
I checked the sitemap files under DSPACE/sitemaps and found that the 
handles the Google crawler keeps accessing do not exist in any of the files 
there.  Or are there sitemap files elsewhere?


How do I include the sitemap files in robots.txt?  Sorry if this is a 
stupid question.


Panyarak Ngamsritragul
Khunying Long Athakravisunthorn Learning Resources Center
Prince of Songkla University
Hat Yai, Songkhla, Thailand




--
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


[Dspace-tech] Google crawler repeatedly requests non-existent handles...

2010-09-20 Thread Panyarak Ngamsritragul

Hi,

There are 2 points here:
1. In our repository, we have configured things to allow crawlers to browse 
our site by putting up a robots.txt with only one line:
User-agent: *
I have checked with Webmaster Tools and it reports that the crawler access 
was successful.  Still, I am not quite sure that setup is OK.  The problem 
is that internal error messages are being sent to me every day saying that 
the crawler cannot access certain pages.  I have checked the handles 
attached and found that those are non-existent pages...  Can any of you 
please suggest what I should do to get rid of this kind of error?

2. I also submitted sitemaps to Google; the latest result reported in 
Webmaster Tools is:
   Sitemap: http://kb.psu.ac.th/psukb/sitemap
   Status: OK
   Type: Index
   Submitted: 17/7/2010
   Downloaded:17/9/2010
   URLs submitted: 4,545
   URLs in web index: 3,785

Should I stop the crawler as mentioned in point 1?  And what happened to the
URLs reported as not in the web index?
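In case it helps, here is a rough sketch of how the sitemap URLs could be cross-checked against the server, using only the Python standard library (the live-fetch part is commented out, since it needs network access to our repository):

```python
# Sketch: parse a sitemap (or sitemap index) and report URLs that return 404,
# which would reveal non-existent handles still being advertised to crawlers.
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_urls(sitemap_xml):
    """Return every <loc> value found in a sitemap or sitemap index document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

def find_dead_urls(urls):
    """Return the URLs the server rejects with HTTP 404."""
    dead = []
    for url in urls:
        try:
            urllib.request.urlopen(url)
        except urllib.error.HTTPError as err:
            if err.code == 404:
                dead.append(url)
    return dead

# Live check (network required), e.g.:
#   with urllib.request.urlopen("http://kb.psu.ac.th/psukb/sitemap") as resp:
#       for url in find_dead_urls(extract_urls(resp.read())):
#           print(url)
```

Any URL it prints would be a candidate to remove from the sitemaps.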

Thanks.

Panyarak Ngamsritragul
Khunying Long Athakravisunthorn Learning Resources Center
Prince of Songkla University
Hat Yai, Songkhla, Thailand

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


--
___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech