On Mon, 18 May 2020 15:44:42 +0100
Rory O'Farrell <ofarr...@iol.ie> wrote:

> On Tue, 12 May 2020 17:41:09 +0200
> Peter Kovacs <pe...@apache.org> wrote:
> 
> > Okay, I had a short debug session with Dave and Humbedooh.
> > 
> > We are now sure that the crawlers are not blocked. The 301 Response 
> > comes from the fact that Yandex still defaults to http and not https.
> 
> 
> This post on User Forum might be relevant
> https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021#p492756
> 
> Rory

A more detailed examination today shows that results seem to have dropped out
of Google search in French six days ago, in Italian five days ago, and in
English around 23rd April. Try a search for openoffice and the site 
specifier.

See the above URL for details.

Rory


> > 
> > After I added https to the URL, everything worked fine.
> > 
> > Wave also did a curl request, which worked fine as well.
> > 
> > 
> > We have now agreed that I will pass the ball back to Google, with the 
> > feedback that this looks like a Google-internal issue.
> > 
> > The robots.txt has not been changed for 11 years. Yandex can crawl the 
> > URL and we can curl the web page. So we think it is a Google issue.
> > 
> > 
> > I very much appreciated the quick session. Thanks.
> > 
> > 
> > all the Best
> > 
> > Peter
> > 
> > On 12.05.20 at 17:24, Dave Fisher wrote:
> > > It’s not an IP Ban. Infra tells me that would not be a 301.
> > >
> > > Ah-ha - here is the 301:
> > >
> > > % curl -D headers http://forum.openoffice.org/
> > > <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> > > <html><head>
> > > <title>301 Moved Permanently</title>
> > > </head><body>
> > > <h1>Moved Permanently</h1>
> > > <p>The document has moved <a 
> > > href="https://forum.openoffice.org/">here</a>.</p>
> > > </body></html>
> > >
> > > Surprising that they cannot shift from HTTP to HTTPS via a 301!
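
As a rough illustration of the check discussed here, the following Python sketch tests whether a response is nothing more than a redirect from http to https on the same host and path. The function name `is_https_upgrade` and the set of redirect codes are assumptions made for this example; the URL, status code and Location value are the ones quoted in this thread.

```python
from urllib.parse import urlsplit

# Status codes treated as redirects in this sketch (an assumption for the example).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_https_upgrade(request_url: str, status: int, location: str) -> bool:
    """Return True if the response only redirects http:// to https:// on the same host and path."""
    if status not in REDIRECT_CODES or not location:
        return False
    src = urlsplit(request_url)
    dst = urlsplit(location)
    return (
        src.scheme == "http"
        and dst.scheme == "https"
        and src.hostname == dst.hostname
        and (src.path or "/") == (dst.path or "/")
    )


# Values taken from the curl output and the Yandex report quoted in this thread.
print(is_https_upgrade("http://forum.openoffice.org/", 301, "https://forum.openoffice.org/"))
```

If this returns True for a request, the 301 is most likely just the usual http-to-https upgrade rather than a sign of a block, which matches what the curl test showed.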
> > >
> > > Regards,
> > > Dave
> > >
> > >> On May 12, 2020, at 8:04 AM, Dave Fisher <w...@apache.org> wrote:
> > >>
> > >> Information about Infra IP Bans is here: 
> > >> https://infra.apache.org/infra-ban.html
> > >>
> > >> Please direct the Google engineer to that resource.
> > >>
> > >> Regards,
> > >> Dave
> > >>
> > >>> On May 12, 2020, at 7:55 AM, Dave Fisher <w...@apache.org> wrote:
> > >>>
> > >>> Are you sure you weren’t using forums.openoffice.org instead of 
> > >>> forum.openoffice.org?
> > >>>
> > >>> curl -D headers https://forum.openoffice.org/ does return the correct 
> > >>> page.
> > >>>
> > >>> The robots.txt is this:
> > >>>
> > >>> curl -D headers https://forum.openoffice.org/robots.txt
> > >>> User-agent: *
> > >>> Crawl-delay: 1
> > >>> Disallow: /en/forum/common.php
> > >>> Disallow: /en/forum/config.php
> > >>> Disallow: /en/forum/con.php
> > >>> Disallow: /en/forum/faq.php
> > >>> Disallow: /en/forum/mcp.php
> > >>> Disallow: /en/forum/memberlist.php
> > >>> Disallow: /en/forum/posting.php
> > >>> Disallow: /en/forum/report.php
> > >>> Disallow: /en/forum/search.php
> > >>> Disallow: /en/forum/style.php
> > >>> Disallow: /en/forum/ucp.php
> > >>> Disallow: /en/forum/viewonline.php
> > >>> Disallow: /en/forum/adm
> > >>> Disallow: /en/forum/cache
> > >>> Disallow: /en/forum/docs
> > >>> Disallow: /en/forum/files
> > >>> Disallow: /en/forum/images
> > >>> Disallow: /en/forum/includes
> > >>> Disallow: /en/forum/language
> > >>> Disallow: /en/forum/store
> > >>> Disallow: /en/forum/styles
> > >>> Disallow: /es/forum/common.php
> > >>> Disallow: /es/forum/config.php
> > >>> Disallow: /es/forum/con.php
> > >>> Disallow: /es/forum/faq.php
> > >>> Disallow: /es/forum/mcp.php
> > >>> Disallow: /es/forum/memberlist.php
> > >>> Disallow: /es/forum/posting.php
> > >>> Disallow: /es/forum/report.php
> > >>> Disallow: /es/forum/search.php
> > >>> Disallow: /es/forum/style.php
> > >>> Disallow: /es/forum/ucp.php
> > >>> Disallow: /es/forum/viewonline.php
> > >>> Disallow: /es/forum/adm
> > >>> Disallow: /es/forum/cache
> > >>> Disallow: /es/forum/docs
> > >>> Disallow: /es/forum/files
> > >>> Disallow: /es/forum/images
> > >>> Disallow: /es/forum/includes
> > >>> Disallow: /es/forum/language
> > >>> Disallow: /es/forum/store
> > >>> Disallow: /es/forum/styles
> > >>> Disallow: /fr/forum/common.php
> > >>> Disallow: /fr/forum/config.php
> > >>> Disallow: /fr/forum/con.php
> > >>> Disallow: /fr/forum/faq.php
> > >>> Disallow: /fr/forum/mcp.php
> > >>> Disallow: /fr/forum/memberlist.php
> > >>> Disallow: /fr/forum/posting.php
> > >>> Disallow: /fr/forum/report.php
> > >>> Disallow: /fr/forum/search.php
> > >>> Disallow: /fr/forum/style.php
> > >>> Disallow: /fr/forum/ucp.php
> > >>> Disallow: /fr/forum/viewonline.php
> > >>> Disallow: /fr/forum/adm
> > >>> Disallow: /fr/forum/cache
> > >>> Disallow: /fr/forum/docs
> > >>> Disallow: /fr/forum/files
> > >>> Disallow: /fr/forum/images
> > >>> Disallow: /fr/forum/includes
> > >>> Disallow: /fr/forum/language
> > >>> Disallow: /fr/forum/store
> > >>> Disallow: /fr/forum/styles
> > >>> Disallow: /fr/ci-joint
> > >>> Disallow: /hu/forum/common.php
> > >>> Disallow: /hu/forum/config.php
> > >>> Disallow: /hu/forum/con.php
> > >>> Disallow: /hu/forum/faq.php
> > >>> Disallow: /hu/forum/mcp.php
> > >>> Disallow: /hu/forum/memberlist.php
> > >>> Disallow: /hu/forum/posting.php
> > >>> Disallow: /hu/forum/report.php
> > >>> Disallow: /hu/forum/search.php
> > >>> Disallow: /hu/forum/style.php
> > >>> Disallow: /hu/forum/ucp.php
> > >>> Disallow: /hu/forum/viewonline.php
> > >>> Disallow: /hu/forum/adm
> > >>> Disallow: /hu/forum/cache
> > >>> Disallow: /hu/forum/docs
> > >>> Disallow: /hu/forum/files
> > >>> Disallow: /hu/forum/images
> > >>> Disallow: /hu/forum/includes
> > >>> Disallow: /hu/forum/language
> > >>> Disallow: /hu/forum/store
> > >>> Disallow: /hu/forum/styles
> > >>> Disallow: /ja/forum/common.php
> > >>> Disallow: /ja/forum/config.php
> > >>> Disallow: /ja/forum/con.php
> > >>> Disallow: /ja/forum/faq.php
> > >>> Disallow: /ja/forum/mcp.php
> > >>> Disallow: /ja/forum/memberlist.php
> > >>> Disallow: /ja/forum/posting.php
> > >>> Disallow: /ja/forum/report.php
> > >>> Disallow: /ja/forum/search.php
> > >>> Disallow: /ja/forum/style.php
> > >>> Disallow: /ja/forum/ucp.php
> > >>> Disallow: /ja/forum/viewonline.php
> > >>> Disallow: /ja/forum/adm
> > >>> Disallow: /ja/forum/cache
> > >>> Disallow: /ja/forum/docs
> > >>> Disallow: /ja/forum/files
> > >>> Disallow: /ja/forum/images
> > >>> Disallow: /ja/forum/includes
> > >>> Disallow: /ja/forum/language
> > >>> Disallow: /ja/forum/store
> > >>> Disallow: /ja/forum/styles
> > >>> Disallow: /test
> > >>> Disallow: /nl/forum/common.php
> > >>> Disallow: /nl/forum/config.php
> > >>> Disallow: /nl/forum/con.php
> > >>> Disallow: /nl/forum/faq.php
> > >>> Disallow: /nl/forum/mcp.php
> > >>> Disallow: /nl/forum/memberlist.php
> > >>> Disallow: /nl/forum/posting.php
> > >>> Disallow: /nl/forum/report.php
> > >>> Disallow: /nl/forum/search.php
> > >>> Disallow: /nl/forum/style.php
> > >>> Disallow: /nl/forum/ucp.php
> > >>> Disallow: /nl/forum/viewonline.php
> > >>> Disallow: /nl/forum/adm
> > >>> Disallow: /nl/forum/cache
> > >>> Disallow: /nl/forum/docs
> > >>> Disallow: /nl/forum/files
> > >>> Disallow: /nl/forum/images
> > >>> Disallow: /nl/forum/includes
> > >>> Disallow: /nl/forum/language
> > >>> Disallow: /nl/forum/store
> > >>> Disallow: /nl/forum/styles
> > >>> Disallow: /vi/forum/common.php
> > >>> Disallow: /vi/forum/config.php
> > >>> Disallow: /vi/forum/con.php
> > >>> Disallow: /vi/forum/faq.php
> > >>> Disallow: /vi/forum/mcp.php
> > >>> Disallow: /vi/forum/memberlist.php
> > >>> Disallow: /vi/forum/posting.php
> > >>> Disallow: /vi/forum/report.php
> > >>> Disallow: /vi/forum/search.php
> > >>> Disallow: /vi/forum/style.php
> > >>> Disallow: /vi/forum/ucp.php
> > >>> Disallow: /vi/forum/viewonline.php
> > >>> Disallow: /vi/forum/adm
> > >>> Disallow: /vi/forum/cache
> > >>> Disallow: /vi/forum/docs
> > >>> Disallow: /vi/forum/files
> > >>> Disallow: /vi/forum/images
> > >>> Disallow: /vi/forum/includes
> > >>> Disallow: /vi/forum/language
> > >>> Disallow: /vi/forum/store
> > >>> Disallow: /vi/forum/styles
> > >>> Disallow: /zh/forum/common.php
> > >>> Disallow: /zh/forum/config.php
> > >>> Disallow: /zh/forum/con.php
> > >>> Disallow: /zh/forum/faq.php
> > >>> Disallow: /zh/forum/mcp.php
> > >>> Disallow: /zh/forum/memberlist.php
> > >>> Disallow: /zh/forum/posting.php
> > >>> Disallow: /zh/forum/report.php
> > >>> Disallow: /zh/forum/search.php
> > >>> Disallow: /zh/forum/style.php
> > >>> Disallow: /zh/forum/ucp.php
> > >>> Disallow: /zh/forum/viewonline.php
> > >>> Disallow: /zh/forum/adm
> > >>> Disallow: /zh/forum/cache
> > >>> Disallow: /zh/forum/docs
> > >>> Disallow: /zh/forum/files
> > >>> Disallow: /zh/forum/images
> > >>> Disallow: /zh/forum/includes
> > >>> Disallow: /zh/forum/language
> > >>> Disallow: /zh/forum/store
> > >>> Disallow: /zh/forum/styles
> > >>>
> > >>> This has been the robots.txt file since: Last-Modified: Sat, 06 Jun 
> > >>> 2009 23:40:14 GMT
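
To see how a crawler might interpret these rules, here is a minimal sketch using Python's standard `urllib.robotparser`. It parses only a short excerpt of the file above (not the full list), and the topic URL is the one mentioned elsewhere in this discussion. The helper name `build_parser` is made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A short excerpt of the rules listed above, not the full file.
ROBOTS_EXCERPT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/search.php
Disallow: /en/forum/posting.php
Disallow: /en/forum/adm
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)
base = "https://forum.openoffice.org"

# Topic pages are not listed under Disallow, so they should be fetchable.
print(parser.can_fetch("Googlebot", base + "/en/forum/viewtopic.php?f=50&t=102021"))
# The forum search page is explicitly disallowed.
print(parser.can_fetch("Googlebot", base + "/en/forum/search.php"))
```

Under these rules the topic page is allowed and the search page is not, which suggests the robots.txt by itself would not keep forum threads out of a search index.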
> > >>>
> > >>> Forum search uses phpBB
> > >>>
> > >>> We haven’t allowed search engines to crawl forum.openoffice.org since 
> > >>> before the Oracle donation to the ASF.
> > >>>
> > >>> Crawlers' IP addresses might be blocked by ASF Infra if their use is 
> > >>> excessive. That could give the 301.
> > >>>
> > >>> Regards,
> > >>> Dave
> > >>>
> > >>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <leg...@posteo.de> wrote:
> > >>>>
> > >>>> Hello all,
> > >>>>
> > >>>>
> > >>>> What I found is that the URL forum.openoffice.org is not reachable 
> > >>>> from the Google search tool.
> > >>>>
> > >>>> So I checked with DuckDuckGo (my preferred search engine). They don't 
> > >>>> use a crawler and point to the infra of Google, Bing and Yandex.
> > >>>>
> > >>>> I then checked with Bing, but could not figure out how to check the 
> > >>>> bot's feedback on a URL, so I moved on.
> > >>>>
> > >>>> I checked with Yandex. They have a search URL test page. I entered 
> > >>>> forum.openoffice.org there.
> > >>>>
> > >>>> The Response is:
> > >>>>
> > >>>> ------------------------------------------------------------------------
> > >>>>
> > >>>> * Date: Tue, 12 May 2020 10:37:47 GMT
> > >>>> * Server: Apache/2.4.18 (Ubuntu)
> > >>>> * Location: https://forum.openoffice.org/
> > >>>> * Content-Length: 237
> > >>>> * Keep-Alive: timeout=15, max=100
> > >>>> * Connection: Keep-Alive
> > >>>> * Content-Type: text/html; charset=iso-8859-1
> > >>>>
> > >>>> ------------------------------------------------------------------------
> > >>>>
> > >>>>
> > >>>> HTTP status code       301 Moved Permanently
> > >>>> Server response time   133 ms
> > >>>> IP address     54.84.201.130
> > >>>> Encoding       UTF-8(unicode-1-1-utf-8, UTF8)
> > >>>> Page size      237 B
> > >>>>
> > >>>>
> > >>>> I am not sure what that means. The HTTP status code "Moved Permanently" 
> > >>>> reads wrong. I just don't know whether this is the return code from our 
> > >>>> web server or a response code from the crawler.
> > >>>> I will try to get hold of someone from Infra, or I'll open a ticket.
> > >>>>
> > >>>>
> > >>>> All the best
> > >>>> Peter
> > >>>>
> > >>>> On 12.05.20 at 10:39, Matthias Seidel wrote:
> > >>>>> Hi Kay,
> > >>>>>
> > >>>>> On 12.05.20 at 01:21, Kay Schenk wrote:
> > >>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
> > >>>>>>> Hi Kay,
> > >>>>>>>
> > >>>>>>> On 11.05.20 at 21:23, Kay Schenk wrote:
> > >>>>>>>> Hi Peter...
> > >>>>>>>>
> > >>>>>>>> Since I am a Google Search admin for www.openoffice.org, and
> > >>>>>>>> openoffice.apache.org, I got this also. Disclaimer: I have not done
> > >>>>>>>> ANY work with the Google Search apis on these sites in quite some 
> > >>>>>>>> time.
> > >>>>>>>>
> > >>>>>>>> I actually was NOT aware forum.openoffice.org was set up to use 
> > >>>>>>>> Google
> > >>>>>>>> Search until I saw this.
> > >>>>>>> I think, I added it to the list when we had a discussion about 
> > >>>>>>> outdated
> > >>>>>>> information regarding SourceForge found by Google Search.
> > >>>>>>>
> > >>>>>>> But I don't have access to forum.openoffice.org, so I could never
> > >>>>>>> complete the step.
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>>
> > >>>>>>>    Matthias
> > >>>>>> OK. In the top level of the website source, there is a file called
> > >>>>>> "skeleton.html" which references the following bit of code --
> > >>>>>>
> > >>>>>> <!--#include virtual="/scripts/google-analytics.js" -->
> > >>>>>>
> > >>>>>> I didn't dig far enough to find out how "skeleton.html" is used (I
> > >>>>>> forgot), but this is an example of the google-analytics code snippet
> > >>>>>> that is used. Basically, this needs to be included in the site you
> > >>>>>> want analytics to be used on by putting it in the (header) files that
> > >>>>>> generate the site. And you might take a look at recent instructions
> > >>>>>> from Google. Things change.
> > >>>>>>
> > >>>>>> https://support.google.com/analytics/answer/1008080
> > >>>>> Yes, but this is for Google Analytics. I wouldn't want to "analyze" 
> > >>>>> the
> > >>>>> forum...
> > >>>>> The procedure for the Google Search Console is the same, it needs 
> > >>>>> access
> > >>>>> to the root directory.
> > >>>>>
> > >>>>> Maybe Andrea can help if he is available again?
> > >>>>>
> > >>>>> Regards,
> > >>>>>
> > >>>>>   Matthias
> > >>>>>
> > >>>>>> Regards,
> > >>>>>>
> > >>>>>> Kay
> > >>>>>>
> > >>>>>>>> One of the Google Search admins for forum.openoffice.org could 
> > >>>>>>>> check
> > >>>>>>>> the current Google search apis that are in use on that site. 
> > >>>>>>>> Changes
> > >>>>>>>> are occasionally made to the calls, and maybe that is the issue, 
> > >>>>>>>> or a
> > >>>>>>>> robots.txt for that site is causing this. I don't think it 
> > >>>>>>>> requires a
> > >>>>>>>> response, but maybe some investigation.
> > >>>>>>>>
> > >>>>>>>> Just some ideas...
> > >>>>>>>>
> > >>>>>>>> Regards,
> > >>>>>>>>
> > >>>>>>>> Kay
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
> > >>>>>>>>> Hi all,
> > >>>>>>>>>
> > >>>>>>>>> I have received the following mail, probably because I am listed 
> > >>>>>>>>> on the Google Analytics page.
> > >>>>>>>>>
> > >>>>>>>>> Does this have any action items? What can we answer Mr John 
> > >>>>>>>>> Mueller?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> All the Best
> > >>>>>>>>>
> > >>>>>>>>> Peter
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> -------- Forwarded Message --------
> > >>>>>>>>> Subject:     Critical issue on forum.openoffice.org and Google 
> > >>>>>>>>> Search
> > >>>>>>>>> Date:     Mon, 11 May 2020 13:37:27 +0200
> > >>>>>>>>> From:     John Mueller <joh...@google.com>
> > >>>>>>>>> To:     morsei...@gmail.com, kay.sch...@gmail.com, 
> > >>>>>>>>> legi...@gmail.com
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Dear webmaster of forum.openoffice.org 
> > >>>>>>>>> <http://forum.openoffice.org>
> > >>>>>>>>>
> > >>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to bring your
> > >>>>>>>>> attention to a critical issue with your website, and how it's
> > >>>>>>>>> available for Google's web search.
> > >>>>>>>>>
> > >>>>>>>>> In particular, Googlebot has been unable to crawl URLs from
> > >>>>>>>>> https://forum.openoffice.org/ . This will cause those pages to 
> > >>>>>>>>> drop
> > >>>>>>>>> out of Google's search results, and will prevent new pages from 
> > >>>>>>>>> being
> > >>>>>>>>> picked up for Search. If you're not aware of this issue, you may 
> > >>>>>>>>> be
> > >>>>>>>>> accidentally blocking these pages from Google Search due to a 
> > >>>>>>>>> server
> > >>>>>>>>> issue. If you need to block Googlebot from crawling pages on your
> > >>>>>>>>> website, we'd recommend using the robots.txt file instead.
> > >>>>>>>>>
> > >>>>>>>>> Should you need to recognize IP addresses of Googlebot requests, 
> > >>>>>>>>> you
> > >>>>>>>>> can use a reverse IP lookup to do so:
> > >>>>>>>>> https://support.google.com/webmasters/answer/80553
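
The reverse lookup mentioned here can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the accepted host suffixes (`googlebot.com`, `google.com`) follow my reading of the linked Google page, the function names are made up for this example, and the demo at the end uses stub resolvers with a hypothetical IP address and hostname so that it runs without network access.

```python
import socket
from typing import Callable, List

# Host suffixes accepted in this sketch (assumption based on the linked Google page).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Reverse DNS: return the primary hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Forward DNS: return the IPv4 addresses for a hostname."""
    return socket.gethostbyname_ex(hostname)[2]


def is_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Reverse-resolve the IP, check the domain, then confirm with a forward lookup."""
    try:
        hostname = reverse(ip)
    except OSError:
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False


# Offline demo with stub resolvers; the IP and hostname below are hypothetical.
def fake_reverse(ip: str) -> str:
    return "crawl-192-0-2-10.googlebot.com"


def fake_forward(hostname: str) -> List[str]:
    return ["192.0.2.10"]


print(is_googlebot("192.0.2.10", reverse=fake_reverse, forward=fake_forward))
```

Calling `is_googlebot(ip)` without the stub arguments would perform real DNS lookups, which could be used against addresses found in the server's access log.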
> > >>>>>>>>>
> > >>>>>>>>> Should you have any questions, feel free to contact me directly. 
> > >>>>>>>>> For
> > >>>>>>>>> verification purposes, we are sending a copy of this message to 
> > >>>>>>>>> your
> > >>>>>>>>> site's Search Console account.
> > >>>>>>>>>
> > >>>>>>>>> Thank you,
> > >>>>>>>>> John Mueller (joh...@google.com <mailto:joh...@google.com>)
> > >>>>>>>>> Webmaster Trends Analyst
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>> ---------------------------------------------------------------------
> > >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
> > >>>>>>>> For additional commands, e-mail: dev-h...@openoffice.apache.org
> > >>>>>>>>
> > >>>>>>
> > >>>
> > >>>
> > >>
> > >>
> > >
> > >
> > 
> > 
> 
> 
> -- 
> Rory O'Farrell <ofarr...@iol.ie>
> 
> 


-- 
Rory O'Farrell <ofarr...@iol.ie>
