On Mon, 18 May 2020 15:44:42 +0100 Rory O'Farrell <ofarr...@iol.ie> wrote:
> On Tue, 12 May 2020 17:41:09 +0200 > Peter Kovacs <pe...@apache.org> wrote: > > > Okay, I had a short debug session with Dave and Humbedooh. > > > > We are now sure that the crawlers are not blocked. The 301 response > > comes from the fact that Yandex still defaults to http and not https. > > > This post on the User Forum might be relevant: > https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021#p492756 > > Rory More detailed examination today shows that Google search in French seems to have dropped out six days ago, in Italian five days ago, and in English about 23rd April - try a search for openoffice and the site specifier See the above URL for details. Rory > > > > After I added https to the URL, everything worked fine. > > > > Dave also did a curl request, which also worked fine. > > > > > > We have now agreed that I pass the ball back to Google, with the > > feedback that this looks like a Google-internal issue. > > > > The robots.txt has not been changed in 11 years. Yandex can crawl the > > URL and we can curl the web page, so we think it is a Google issue. > > > > > > I very much appreciated the quick session. Thanks. > > > > > > All the best > > > > Peter > > > > On 12.05.20 at 17:24, Dave Fisher wrote: > > > It’s not an IP ban. Infra tells me that would not be a 301. > > > > > > Ah-ha - here is the 301: > > > > > > % curl -D headers http://forum.openoffice.org/ > > > <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> > > > <html><head> > > > <title>301 Moved Permanently</title> > > > </head><body> > > > <h1>Moved Permanently</h1> > > > <p>The document has moved <a > > > href="https://forum.openoffice.org/">here</a>.</p> > > > </body></html> > > > > > > Surprising that they cannot shift from HTTP to HTTPS via a 301!
> > > > > > Regards, > > > Dave > > > > > >> On May 12, 2020, at 8:04 AM, Dave Fisher <w...@apache.org> wrote: > > >> > > >> Information about Infra IP Bans is here: > > >> https://infra.apache.org/infra-ban.html > > >> > > >> Please direct the Google engineer to that resource. > > >> > > >> Regards, > > >> Dave > > >> > > >>> On May 12, 2020, at 7:55 AM, Dave Fisher <w...@apache.org> wrote: > > >>> > > >>> Are you sure you weren’t using forums.openoffice.org instead of > > >>> forum.openoffice.org? > > >>> > > >>> curl -D headers https://forum.openoffice.org/ does return the correct > > >>> page. > > >>> > > >>> The robots.txt is this: > > >>> > > >>> curl -D headers https://forum.openoffice.org/robots.txt > > >>> User-agent: * > > >>> Crawl-delay: 1 > > >>> Disallow: /en/forum/common.php > > >>> Disallow: /en/forum/config.php > > >>> Disallow: /en/forum/con.php > > >>> Disallow: /en/forum/faq.php > > >>> Disallow: /en/forum/mcp.php > > >>> Disallow: /en/forum/memberlist.php > > >>> Disallow: /en/forum/posting.php > > >>> Disallow: /en/forum/report.php > > >>> Disallow: /en/forum/search.php > > >>> Disallow: /en/forum/style.php > > >>> Disallow: /en/forum/ucp.php > > >>> Disallow: /en/forum/viewonline.php > > >>> Disallow: /en/forum/adm > > >>> Disallow: /en/forum/cache > > >>> Disallow: /en/forum/docs > > >>> Disallow: /en/forum/files > > >>> Disallow: /en/forum/images > > >>> Disallow: /en/forum/includes > > >>> Disallow: /en/forum/language > > >>> Disallow: /en/forum/store > > >>> Disallow: /en/forum/styles > > >>> Disallow: /es/forum/common.php > > >>> Disallow: /es/forum/config.php > > >>> Disallow: /es/forum/con.php > > >>> Disallow: /es/forum/faq.php > > >>> Disallow: /es/forum/mcp.php > > >>> Disallow: /es/forum/memberlist.php > > >>> Disallow: /es/forum/posting.php > > >>> Disallow: /es/forum/report.php > > >>> Disallow: /es/forum/search.php > > >>> Disallow: /es/forum/style.php > > >>> Disallow: /es/forum/ucp.php > > >>> Disallow: 
/es/forum/viewonline.php > > >>> Disallow: /es/forum/adm > > >>> Disallow: /es/forum/cache > > >>> Disallow: /es/forum/docs > > >>> Disallow: /es/forum/files > > >>> Disallow: /es/forum/images > > >>> Disallow: /es/forum/includes > > >>> Disallow: /es/forum/language > > >>> Disallow: /es/forum/store > > >>> Disallow: /es/forum/styles > > >>> Disallow: /fr/forum/common.php > > >>> Disallow: /fr/forum/config.php > > >>> Disallow: /fr/forum/con.php > > >>> Disallow: /fr/forum/faq.php > > >>> Disallow: /fr/forum/mcp.php > > >>> Disallow: /fr/forum/memberlist.php > > >>> Disallow: /fr/forum/posting.php > > >>> Disallow: /fr/forum/report.php > > >>> Disallow: /fr/forum/search.php > > >>> Disallow: /fr/forum/style.php > > >>> Disallow: /fr/forum/ucp.php > > >>> Disallow: /fr/forum/viewonline.php > > >>> Disallow: /fr/forum/adm > > >>> Disallow: /fr/forum/cache > > >>> Disallow: /fr/forum/docs > > >>> Disallow: /fr/forum/files > > >>> Disallow: /fr/forum/images > > >>> Disallow: /fr/forum/includes > > >>> Disallow: /fr/forum/language > > >>> Disallow: /fr/forum/store > > >>> Disallow: /fr/forum/styles > > >>> Disallow: /fr/ci-joint > > >>> Disallow: /hu/forum/common.php > > >>> Disallow: /hu/forum/config.php > > >>> Disallow: /hu/forum/con.php > > >>> Disallow: /hu/forum/faq.php > > >>> Disallow: /hu/forum/mcp.php > > >>> Disallow: /hu/forum/memberlist.php > > >>> Disallow: /hu/forum/posting.php > > >>> Disallow: /hu/forum/report.php > > >>> Disallow: /hu/forum/search.php > > >>> Disallow: /hu/forum/style.php > > >>> Disallow: /hu/forum/ucp.php > > >>> Disallow: /hu/forum/viewonline.php > > >>> Disallow: /hu/forum/adm > > >>> Disallow: /hu/forum/cache > > >>> Disallow: /hu/forum/docs > > >>> Disallow: /hu/forum/files > > >>> Disallow: /hu/forum/images > > >>> Disallow: /hu/forum/includes > > >>> Disallow: /hu/forum/language > > >>> Disallow: /hu/forum/store > > >>> Disallow: /hu/forum/styles > > >>> Disallow: /ja/forum/common.php > > >>> Disallow: /ja/forum/config.php > > 
>>> Disallow: /ja/forum/con.php > > >>> Disallow: /ja/forum/faq.php > > >>> Disallow: /ja/forum/mcp.php > > >>> Disallow: /ja/forum/memberlist.php > > >>> Disallow: /ja/forum/posting.php > > >>> Disallow: /ja/forum/report.php > > >>> Disallow: /ja/forum/search.php > > >>> Disallow: /ja/forum/style.php > > >>> Disallow: /ja/forum/ucp.php > > >>> Disallow: /ja/forum/viewonline.php > > >>> Disallow: /ja/forum/adm > > >>> Disallow: /ja/forum/cache > > >>> Disallow: /ja/forum/docs > > >>> Disallow: /ja/forum/files > > >>> Disallow: /ja/forum/images > > >>> Disallow: /ja/forum/includes > > >>> Disallow: /ja/forum/language > > >>> Disallow: /ja/forum/store > > >>> Disallow: /ja/forum/styles > > >>> Disallow: /test > > >>> Disallow: /nl/forum/common.php > > >>> Disallow: /nl/forum/config.php > > >>> Disallow: /nl/forum/con.php > > >>> Disallow: /nl/forum/faq.php > > >>> Disallow: /nl/forum/mcp.php > > >>> Disallow: /nl/forum/memberlist.php > > >>> Disallow: /nl/forum/posting.php > > >>> Disallow: /nl/forum/report.php > > >>> Disallow: /nl/forum/search.php > > >>> Disallow: /nl/forum/style.php > > >>> Disallow: /nl/forum/ucp.php > > >>> Disallow: /nl/forum/viewonline.php > > >>> Disallow: /nl/forum/adm > > >>> Disallow: /nl/forum/cache > > >>> Disallow: /nl/forum/docs > > >>> Disallow: /nl/forum/files > > >>> Disallow: /nl/forum/images > > >>> Disallow: /nl/forum/includes > > >>> Disallow: /nl/forum/language > > >>> Disallow: /nl/forum/store > > >>> Disallow: /nl/forum/styles > > >>> Disallow: /vi/forum/common.php > > >>> Disallow: /vi/forum/config.php > > >>> Disallow: /vi/forum/con.php > > >>> Disallow: /vi/forum/faq.php > > >>> Disallow: /vi/forum/mcp.php > > >>> Disallow: /vi/forum/memberlist.php > > >>> Disallow: /vi/forum/posting.php > > >>> Disallow: /vi/forum/report.php > > >>> Disallow: /vi/forum/search.php > > >>> Disallow: /vi/forum/style.php > > >>> Disallow: /vi/forum/ucp.php > > >>> Disallow: /vi/forum/viewonline.php > > >>> Disallow: /vi/forum/adm > > >>> 
Disallow: /vi/forum/cache > > >>> Disallow: /vi/forum/docs > > >>> Disallow: /vi/forum/files > > >>> Disallow: /vi/forum/images > > >>> Disallow: /vi/forum/includes > > >>> Disallow: /vi/forum/language > > >>> Disallow: /vi/forum/store > > >>> Disallow: /vi/forum/styles > > >>> Disallow: /zh/forum/common.php > > >>> Disallow: /zh/forum/config.php > > >>> Disallow: /zh/forum/con.php > > >>> Disallow: /zh/forum/faq.php > > >>> Disallow: /zh/forum/mcp.php > > >>> Disallow: /zh/forum/memberlist.php > > >>> Disallow: /zh/forum/posting.php > > >>> Disallow: /zh/forum/report.php > > >>> Disallow: /zh/forum/search.php > > >>> Disallow: /zh/forum/style.php > > >>> Disallow: /zh/forum/ucp.php > > >>> Disallow: /zh/forum/viewonline.php > > >>> Disallow: /zh/forum/adm > > >>> Disallow: /zh/forum/cache > > >>> Disallow: /zh/forum/docs > > >>> Disallow: /zh/forum/files > > >>> Disallow: /zh/forum/images > > >>> Disallow: /zh/forum/includes > > >>> Disallow: /zh/forum/language > > >>> Disallow: /zh/forum/store > > >>> Disallow: /zh/forum/styles > > >>> > > >>> This has been the robots.txt file since: Last-Modified: Sat, 06 Jun > > >>> 2009 23:40:14 GMT > > >>> > > >>> Forum search uses phpBB. > > >>> > > >>> We haven’t allowed search engines to crawl forum.openoffice.org since > > >>> before the Oracle donation to the ASF. > > >>> > > >>> Crawlers' IP addresses might be blocked by ASF Infra if their use is > > >>> excessive. That could give the 301. > > >>> > > >>> Regards, > > >>> Dave > > >>> > > >>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <leg...@posteo.de> wrote: > > >>>> > > >>>> Hello all, > > >>>> > > >>>> > > >>>> What I found is that the URL forum.openoffice.org is not reachable > > >>>> from the Google search tool. > > >>>> > > >>>> So I checked with DuckDuckGo (my preferred search engine); they don't > > >>>> use their own crawler and rely on the infrastructure of Google, Bing and Yandex.
> > >>>> > > >>>> I then checked with Bing, but could not figure out how to check the bot's > > >>>> feedback on a URL, so I moved on. > > >>>> > > >>>> I checked with Yandex. They have a search URL test page, where I > > >>>> entered forum.openoffice.org. > > >>>> > > >>>> The response is: > > >>>> > > >>>> ------------------------------------------------------------------------ > > >>>> > > >>>> * Date: Tue, 12 May 2020 10:37:47 GMT > > >>>> * Server: Apache/2.4.18 (Ubuntu) > > >>>> * Location: https://forum.openoffice.org/ > > >>>> * Content-Length: 237 > > >>>> * Keep-Alive: timeout=15, max=100 > > >>>> * Connection: Keep-Alive > > >>>> * Content-Type: text/html; charset=iso-8859-1 > > >>>> > > >>>> ------------------------------------------------------------------------ > > >>>> > > >>>> > > >>>> HTTP status code 301 Moved Permanently > > >>>> Server response time 133 ms > > >>>> IP address 54.84.201.130 > > >>>> Encoding UTF-8(unicode-1-1-utf-8, UTF8) > > >>>> Page size 237 B > > >>>> > > >>>> > > >>>> I am not sure what that means. The HTTP status code "Moved Permanently" > > >>>> looks wrong. I just don't know if this is the return code from our > > >>>> webserver or a response code from the crawler. > > >>>> I will try to get someone from Infra, or I'll open a ticket. > > >>>> > > >>>> > > >>>> All the best > > >>>> Peter > > >>>> > > >>>> On 12.05.20 at 10:39, Matthias Seidel wrote: > > >>>>> Hi Kay, > > >>>>> > > >>>>> On 12.05.20 at 01:21, Kay Schenk wrote: > > >>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote: > > >>>>>>> Hi Kay, > > >>>>>>> > > >>>>>>> On 11.05.20 at 21:23, Kay Schenk wrote: > > >>>>>>>> Hi Peter... > > >>>>>>>> > > >>>>>>>> Since I am a Google Search admin for www.openoffice.org and > > >>>>>>>> openoffice.apache.org, I got this too. Disclaimer: I have not done > > >>>>>>>> ANY work with the Google Search APIs on these sites in quite some > > >>>>>>>> time.
> > >>>>>>>> > > >>>>>>>> I actually was NOT aware that forum.openoffice.org was set up to use > > >>>>>>>> Google > > >>>>>>>> Search until I saw this. > > >>>>>>> I think I added it to the list when we had a discussion about > > >>>>>>> outdated > > >>>>>>> information regarding SourceForge found by Google Search. > > >>>>>>> > > >>>>>>> But I don't have access to forum.openoffice.org, so I could never > > >>>>>>> complete the step. > > >>>>>>> > > >>>>>>> Regards, > > >>>>>>> > > >>>>>>> Matthias > > >>>>>> OK. In the top level of the website source, there is a file called > > >>>>>> "skeleton.html" which references the following bit of code: > > >>>>>> > > >>>>>> <!--#include virtual="/scripts/google-analytics.js" --> > > >>>>>> > > >>>>>> I didn't dig far enough to find out how "skeleton.html" is used (I > > >>>>>> forgot), but this is an example of the google-analytics code > > >>>>>> snippet > > >>>>>> that is used. Basically, this needs to be included in the site you > > >>>>>> want analytics to be used on, by putting it in the (header) files that > > >>>>>> generate the site. Also, you might take a look at recent instructions > > >>>>>> from Google. Things change. > > >>>>>> > > >>>>>> https://support.google.com/analytics/answer/1008080 > > >>>>> Yes, but this is for Google Analytics. I wouldn't want to "analyze" > > >>>>> the > > >>>>> forum... > > >>>>> The procedure for the Google Search Console is the same; it needs > > >>>>> access > > >>>>> to the root directory. > > >>>>> > > >>>>> Maybe Andrea can help if he is available again? > > >>>>> > > >>>>> Regards, > > >>>>> > > >>>>> Matthias > > >>>>> > > >>>>>> Regards, > > >>>>>> > > >>>>>> Kay > > >>>>>> > > >>>>>>>> One of the Google Search admins for forum.openoffice.org could > > >>>>>>>> check > > >>>>>>>> the current Google Search APIs that are in use on that site.
> > >>>>>>>> Changes > > >>>>>>>> are occasionally made to the calls, and maybe that is the issue, > > >>>>>>>> or a > > >>>>>>>> robots.txt for that site is causing this. I don't think it > > >>>>>>>> requires a > > >>>>>>>> response, but maybe some investigation. > > >>>>>>>> > > >>>>>>>> Just some ideas... > > >>>>>>>> > > >>>>>>>> Regards, > > >>>>>>>> > > >>>>>>>> Kay > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote: > > >>>>>>>>> Hi all, > > >>>>>>>>> > > >>>>>>>>> I have received the following mail, probably because I am listed in > > >>>>>>>>> the > > >>>>>>>>> Google Analytics page. > > >>>>>>>>> > > >>>>>>>>> Does this have any action items? What should we answer Mr John > > >>>>>>>>> Mueller? > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> All the Best > > >>>>>>>>> > > >>>>>>>>> Peter > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> -------- Forwarded Message -------- > > >>>>>>>>> Subject: Critical issue on forum.openoffice.org and Google > > >>>>>>>>> Search > > >>>>>>>>> Date: Mon, 11 May 2020 13:37:27 +0200 > > >>>>>>>>> From: John Mueller <joh...@google.com> > > >>>>>>>>> To: morsei...@gmail.com, kay.sch...@gmail.com, > > >>>>>>>>> legi...@gmail.com > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Dear webmaster of forum.openoffice.org > > >>>>>>>>> <http://forum.openoffice.org> > > >>>>>>>>> > > >>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to bring your > > >>>>>>>>> attention to a critical issue with your website, and how it's > > >>>>>>>>> available for Google's web search. > > >>>>>>>>> > > >>>>>>>>> In particular, Googlebot has been unable to crawl URLs from > > >>>>>>>>> https://forum.openoffice.org/ . This will cause those pages to > > >>>>>>>>> drop > > >>>>>>>>> out of Google's search results, and will prevent new pages from > > >>>>>>>>> being > > >>>>>>>>> picked up for Search.
If you're not aware of this issue, you may > > >>>>>>>>> be > > >>>>>>>>> accidentally blocking these pages from Google Search due to a > > >>>>>>>>> server > > >>>>>>>>> issue. If you need to block Googlebot from crawling pages on your > > >>>>>>>>> website, we'd recommend using the robots.txt file instead. > > >>>>>>>>> > > >>>>>>>>> Should you need to recognize IP addresses of Googlebot requests, > > >>>>>>>>> you > > >>>>>>>>> can use a reverse IP lookup to do so: > > >>>>>>>>> https://support.google.com/webmasters/answer/80553 > > >>>>>>>>> > > >>>>>>>>> Should you have any questions, feel free to contact me directly. > > >>>>>>>>> For > > >>>>>>>>> verification purposes, we are sending a copy of this message to > > >>>>>>>>> your > > >>>>>>>>> site's Search Console account. > > >>>>>>>>> > > >>>>>>>>> Thank you, > > >>>>>>>>> John Mueller (joh...@google.com <mailto:joh...@google.com>) > > >>>>>>>>> Webmaster Trends Analyst > > >>>>>>>>> > > >>>>>>>> --------------------------------------------------------------------- > > >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org > > >>>>>>>> For additional commands, e-mail: dev-h...@openoffice.apache.org
-- Rory O'Farrell <ofarr...@iol.ie>
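To confirm that the robots.txt quoted above is not what keeps Googlebot out, you can evaluate it offline. Below is a minimal sketch using Python's standard urllib.robotparser module (Python 3.9+ assumed). ROBOTS_EXCERPT copies only a few of the rules from the file Dave posted; the full file repeats the same pattern for each language tree. crawl_decisions is a hypothetical helper name. Note that robotparser uses prefix matching and stops at the first matching rule, which in general differs from Google's longest-match handling. For a file like this one, which contains only plain Disallow prefixes, the two give the same answers.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the forum.openoffice.org robots.txt quoted in this thread.
# Only a few rules are copied; the real file repeats the same pattern
# for /es/, /fr/, /hu/, /ja/, /nl/, /vi/ and /zh/.
ROBOTS_EXCERPT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/posting.php
Disallow: /en/forum/search.php
Disallow: /en/forum/ucp.php
Disallow: /en/forum/adm
Disallow: /test
"""


def crawl_decisions(robots_txt: str, agent: str, paths: list[str]) -> dict[str, bool]:
    """Map each path to True (may be fetched) or False (disallowed) for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # load rules directly, no network fetch
    return {
        path: parser.can_fetch(agent, "https://forum.openoffice.org" + path)
        for path in paths
    }


if __name__ == "__main__":
    sample_paths = [
        "/",
        "/en/forum/viewtopic.php?f=50&t=102021",
        "/en/forum/search.php",
    ]
    for path, allowed in crawl_decisions(ROBOTS_EXCERPT, "Googlebot", sample_paths).items():
        print("allowed" if allowed else "blocked", path)
```

Run as a script, it reports "/" and the viewtopic.php URL as allowed and search.php as blocked. In other words, the file only hides phpBB's utility scripts and internal directories, not the topic pages themselves.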
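John Mueller's mail suggests using a reverse IP lookup to recognize Googlebot requests. The procedure described on the linked support page has three steps:

1. Do a reverse DNS lookup on the client address.
2. Check that the resulting name is under googlebot.com or google.com.
3. Forward-resolve that name and confirm it returns the original address.

Below is a minimal Python sketch of that check. Some assumptions apply. is_googlebot is a hypothetical helper. It only handles IPv4, because socket.gethostbyname_ex returns IPv4 addresses only. The domain list is taken from that procedure, and Google's current documentation may list more domains. The resolvers are parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List, Tuple

# Result shape shared by socket.gethostbyaddr and socket.gethostbyname_ex:
# (hostname, aliases, list_of_ipv4_addresses)
HostInfo = Tuple[str, List[str], List[str]]

# Domains a genuine Googlebot reverse-DNS name is expected to end in
# (assumption: check Google's current documentation for the full list).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward_lookup: Callable[[str], HostInfo] = socket.gethostbyname_ex,
) -> bool:
    """Return True if `ip` passes the reverse-then-forward DNS check."""
    # Step 1: reverse DNS (PTR) lookup of the requesting address.
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    hostname = hostname.rstrip(".").lower()

    # Step 2: the PTR name must belong to one of Google's domains.
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False

    # Step 3: forward-resolve that name; it must map back to the same IP,
    # otherwise the PTR record could simply be spoofed.
    try:
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addresses


if __name__ == "__main__":
    import sys

    for address in sys.argv[1:]:
        print(address, "verified" if is_googlebot(address) else "not verified")
```

The forward-confirmation step is what defeats a spoofed PTR record. Anyone can point the reverse record of their own address at a googlebot.com name, but only Google controls the forward records under googlebot.com.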