I should point out that my robots.txt suggestions assume you don't want any
stats pages crawled at all. If that's not the case, it's probably best to
apply the patch for DS-689 and wait for Google to de-index the broken URLs
(and, if only a few invalid handles are being requested, to make the
robots.txt entries specific to those handles).
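
For illustration, here is a rough sketch of the two variants. It assumes the
webapp is served under /psukb, as in the URL from the error report quoted
below, and that robots.txt sits at the root of the host the crawler is
actually requesting (kb.psu.ac.th:8080 in that report). Adjust both for your
own setup.

```
# robots.txt (sketch only; the /psukb prefix is an assumption taken from
# the URL in the quoted error report)
User-agent: *

# Blanket rule: Disallow values are prefix matches, so this also covers
# /psukb/displaystats?handle=...
Disallow: /psukb/displaystats

# Narrower alternative, to be used instead of the rule above: block only a
# handle known to be invalid and leave other stats pages crawlable.
# Disallow: /psukb/displaystats?handle=2553/929
```

Bear in mind that the narrower rule is still a prefix match, so it would
also catch any longer handle that starts with the same characters.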

Cheers,

Kim

On 6 October 2010 00:30, Kim Shepherd <kim.sheph...@gmail.com> wrote:

> Hi Panyarak,
>
> It might be an idea to add /displaystats to your JSPUI's robots.txt and to
> any Google Webmaster Tools robots.txt files or Page Removal Requests.
> For Google to de-index pages, it generally likes to see a 404 (not found)
> or a 410 (gone).
>
> Unfortunately, the servlet that handles statistics display for JSPUI throws
> a NullPointerException when a handle is passed to it that doesn't turn into
> a valid DSpace object. It *should* throw a friendly 404 to help crawlers
> like Google realise the page is gone.
>
> I've opened a JIRA issue for the NPE bug -
> http://jira.dspace.org/jira/browse/DS-689 - and attached a patch for 1.6.2
> (and trunk, and probably other 1.6.x versions) that will make sure that when
> anyone (including Google) visits those pages, it sees a 404 instead of
> "Internal Server Error".
>
> Hopefully this, along with /displaystats (and/or /displaystats* ?) in your
> robots.txt files or removal requests, will help convince Google to stop
> crawling those URLs.
>
> Cheers,
>
> Kim
>
> On 4 October 2010 13:52, Panyarak Ngamsritragul <pa...@me.psu.ac.th> wrote:
>
>>
>> Dear all,
>>
>> A couple of weeks ago I posted questions about the Google crawler and
>> sitemaps.  There was a response from Vinit, but I still have not found
>> a solution to the problem I am experiencing.
>>
>> I am running 1.6.2 and have registered the site (kb.psu.ac.th) with
>> Google's Webmaster Tools.  As far as I can tell, I have submitted the
>> sitemaps.  After some time, I began repeatedly receiving Internal Server
>> Error reports caused by the Google crawler trying to access non-existent
>> records.  Some of these records have been requested repeatedly by the
>> crawler for more than a month now.
>>
>> Can anyone help me pinpoint the root cause of the problem?
>> I have attached one of the error messages below.
>>
>> Thanks.
>>
>> Panyarak Ngamsritragul
>> Prince of Songkla University.
>> Thailand.
>>
>> ---------- Forwarded message ----------
>> Date: Sun, 3 Oct 2010 18:50:06 +0700 (ICT)
>> From: psukb-nore...@psu.ac.th
>> To: psukb-h...@me.psu.ac.th
>> Subject: PSUKB: Internal Server Error
>>
>> An internal server error occurred on http://kb.psu.ac.th/psukb:
>>
>> Date:       10/3/10 6:50 PM
>> Session ID: D5E58233D9F2093B248C4CC5C65D96D1
>> User:       Anonymous
>> IP address: 66.249.69.1
>>
>> -- URL Was: http://kb.psu.ac.th:8080/psukb/displaystats?handle=2553/929
>> -- Method: GET
>> -- Parameters were:
>> -- handle: "2553/929"
>>
>> Exception:
>> java.lang.NullPointerException
>>        at org.dspace.app.webui.servlet.DisplayStatisticsServlet.displayStatistics(DisplayStatisticsServlet.java:182)
>>        at org.dspace.app.webui.servlet.DisplayStatisticsServlet.doDSGet(DisplayStatisticsServlet.java:123)
>>        at org.dspace.app.webui.servlet.DSpaceServlet.processRequest(DSpaceServlet.java:151)
>>        at org.dspace.app.webui.servlet.DSpaceServlet.doGet(DSpaceServlet.java:99)
>>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>        at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:112)
>>        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:465)
>>        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
>>        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>>        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>>        at java.lang.Thread.run(Thread.java:619)
>>
>> _______________________________________________
>> DSpace-tech mailing list
>> DSpace-tech@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>>
>
>