On Mon, 18 May 2020 18:48:07 +0200 Peter Kovacs <pe...@apache.org> wrote:
> I am already at it. It worked for me so far. I get search results. Maybe
> it has to do with the cache.
>
> Not sure.

We were testing on recent results; the figures I gave were for finding
"openoffice", which would be used daily in many postings.

Rory

>
> On 18.05.20 at 18:22, Rory O'Farrell wrote:
> > On Mon, 18 May 2020 15:44:42 +0100
> > Rory O'Farrell <ofarr...@iol.ie> wrote:
> >
> >> On Tue, 12 May 2020 17:41:09 +0200
> >> Peter Kovacs <pe...@apache.org> wrote:
> >>
> >>> Okay, I had a short debug session with Dave and Humbedooh.
> >>>
> >>> We are now sure that the crawlers are not blocked. The 301 response
> >>> comes from the fact that Yandex still defaults to http and not https.
> >>
> >> This post on User Forum might be relevant
> >> https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021#p492756
> >>
> >> Rory
> > More detailed examination today shows that
> > Google search in French seems to have dropped out six days ago, in
> > Italian five days ago, and in English about 23rd April - try a search
> > for openoffice and the site specifier
> >
> > See the above URL for details.
> >
> > Rory
> >
> >
> >>> After I added https to the URL all worked fine.
> >>>
> >>> Wave did also do a curl request which also worked fine.
> >>>
> >>>
> >>> We have agreed now that I play the ball back to Google, with the
> >>> feedback that this looks like a Google internal issue.
> >>>
> >>> The robots.txt has not been changed for 11 years. Yandex can crawl the
> >>> URL and we can curl the webpage. So we think it is a Google issue.
> >>>
> >>>
> >>> I very much appreciated the quick session. Thanks.
> >>>
> >>>
> >>> All the best
> >>>
> >>> Peter
> >>>
> >>> On 12.05.20 at 17:24, Dave Fisher wrote:
> >>>> It’s not an IP Ban. Infra tells me that would not be a 301.
> >>>>
> >>>> Ah-ha - here is the 301:
> >>>>
> >>>> % curl -D headers http://forum.openoffice.org/
> >>>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> >>>> <html><head>
> >>>> <title>301 Moved Permanently</title>
> >>>> </head><body>
> >>>> <h1>Moved Permanently</h1>
> >>>> <p>The document has moved <a href="https://forum.openoffice.org/">here</a>.</p>
> >>>> </body></html>
> >>>>
> >>>> Surprising that they cannot shift from HTTP to HTTPS via a 301!
> >>>>
> >>>> Regards,
> >>>> Dave
> >>>>
> >>>>> On May 12, 2020, at 8:04 AM, Dave Fisher <w...@apache.org> wrote:
> >>>>>
> >>>>> Information about Infra IP Bans is here:
> >>>>> https://infra.apache.org/infra-ban.html
> >>>>>
> >>>>> Please direct the Google engineer to that resource.
> >>>>>
> >>>>> Regards,
> >>>>> Dave
> >>>>>
> >>>>>> On May 12, 2020, at 7:55 AM, Dave Fisher <w...@apache.org> wrote:
> >>>>>>
> >>>>>> Are you sure you weren’t using forums.openoffice.org instead of
> >>>>>> forum.openoffice.org?
> >>>>>>
> >>>>>> curl -D headers https://forum.openoffice.org/ does return the correct
> >>>>>> page.
> >>>>>>
> >>>>>> The robots.txt is this:
> >>>>>>
> >>>>>> curl -D headers https://forum.openoffice.org/robots.txt
> >>>>>> User-agent: *
> >>>>>> Crawl-delay: 1
> >>>>>> Disallow: /en/forum/common.php
> >>>>>> Disallow: /en/forum/config.php
> >>>>>> Disallow: /en/forum/con.php
> >>>>>> Disallow: /en/forum/faq.php
> >>>>>> Disallow: /en/forum/mcp.php
> >>>>>> Disallow: /en/forum/memberlist.php
> >>>>>> Disallow: /en/forum/posting.php
> >>>>>> Disallow: /en/forum/report.php
> >>>>>> Disallow: /en/forum/search.php
> >>>>>> Disallow: /en/forum/style.php
> >>>>>> Disallow: /en/forum/ucp.php
> >>>>>> Disallow: /en/forum/viewonline.php
> >>>>>> Disallow: /en/forum/adm
> >>>>>> Disallow: /en/forum/cache
> >>>>>> Disallow: /en/forum/docs
> >>>>>> Disallow: /en/forum/files
> >>>>>> Disallow: /en/forum/images
> >>>>>> Disallow: /en/forum/includes
> >>>>>> Disallow: /en/forum/language
> >>>>>> Disallow: /en/forum/store
> >>>>>> Disallow: /en/forum/styles
> >>>>>> Disallow: /es/forum/common.php
> >>>>>> Disallow: /es/forum/config.php
> >>>>>> Disallow: /es/forum/con.php
> >>>>>> Disallow: /es/forum/faq.php
> >>>>>> Disallow: /es/forum/mcp.php
> >>>>>> Disallow: /es/forum/memberlist.php
> >>>>>> Disallow: /es/forum/posting.php
> >>>>>> Disallow: /es/forum/report.php
> >>>>>> Disallow: /es/forum/search.php
> >>>>>> Disallow: /es/forum/style.php
> >>>>>> Disallow: /es/forum/ucp.php
> >>>>>> Disallow: /es/forum/viewonline.php
> >>>>>> Disallow: /es/forum/adm
> >>>>>> Disallow: /es/forum/cache
> >>>>>> Disallow: /es/forum/docs
> >>>>>> Disallow: /es/forum/files
> >>>>>> Disallow: /es/forum/images
> >>>>>> Disallow: /es/forum/includes
> >>>>>> Disallow: /es/forum/language
> >>>>>> Disallow: /es/forum/store
> >>>>>> Disallow: /es/forum/styles
> >>>>>> Disallow: /fr/forum/common.php
> >>>>>> Disallow: /fr/forum/config.php
> >>>>>> Disallow: /fr/forum/con.php
> >>>>>> Disallow: /fr/forum/faq.php
> >>>>>> Disallow: /fr/forum/mcp.php
> >>>>>> Disallow: /fr/forum/memberlist.php
> >>>>>> Disallow: /fr/forum/posting.php
> >>>>>> Disallow: /fr/forum/report.php
> >>>>>> Disallow: /fr/forum/search.php
> >>>>>> Disallow: /fr/forum/style.php
> >>>>>> Disallow: /fr/forum/ucp.php
> >>>>>> Disallow: /fr/forum/viewonline.php
> >>>>>> Disallow: /fr/forum/adm
> >>>>>> Disallow: /fr/forum/cache
> >>>>>> Disallow: /fr/forum/docs
> >>>>>> Disallow: /fr/forum/files
> >>>>>> Disallow: /fr/forum/images
> >>>>>> Disallow: /fr/forum/includes
> >>>>>> Disallow: /fr/forum/language
> >>>>>> Disallow: /fr/forum/store
> >>>>>> Disallow: /fr/forum/styles
> >>>>>> Disallow: /fr/ci-joint
> >>>>>> Disallow: /hu/forum/common.php
> >>>>>> Disallow: /hu/forum/config.php
> >>>>>> Disallow: /hu/forum/con.php
> >>>>>> Disallow: /hu/forum/faq.php
> >>>>>> Disallow: /hu/forum/mcp.php
> >>>>>> Disallow: /hu/forum/memberlist.php
> >>>>>> Disallow: /hu/forum/posting.php
> >>>>>> Disallow: /hu/forum/report.php
> >>>>>> Disallow: /hu/forum/search.php
> >>>>>> Disallow: /hu/forum/style.php
> >>>>>> Disallow: /hu/forum/ucp.php
> >>>>>> Disallow: /hu/forum/viewonline.php
> >>>>>> Disallow: /hu/forum/adm
> >>>>>> Disallow: /hu/forum/cache
> >>>>>> Disallow: /hu/forum/docs
> >>>>>> Disallow: /hu/forum/files
> >>>>>> Disallow: /hu/forum/images
> >>>>>> Disallow: /hu/forum/includes
> >>>>>> Disallow: /hu/forum/language
> >>>>>> Disallow: /hu/forum/store
> >>>>>> Disallow: /hu/forum/styles
> >>>>>> Disallow: /ja/forum/common.php
> >>>>>> Disallow: /ja/forum/config.php
> >>>>>> Disallow: /ja/forum/con.php
> >>>>>> Disallow: /ja/forum/faq.php
> >>>>>> Disallow: /ja/forum/mcp.php
> >>>>>> Disallow: /ja/forum/memberlist.php
> >>>>>> Disallow: /ja/forum/posting.php
> >>>>>> Disallow: /ja/forum/report.php
> >>>>>> Disallow: /ja/forum/search.php
> >>>>>> Disallow: /ja/forum/style.php
> >>>>>> Disallow: /ja/forum/ucp.php
> >>>>>> Disallow: /ja/forum/viewonline.php
> >>>>>> Disallow: /ja/forum/adm
> >>>>>> Disallow: /ja/forum/cache
> >>>>>> Disallow: /ja/forum/docs
> >>>>>> Disallow: /ja/forum/files
> >>>>>> Disallow: /ja/forum/images
> >>>>>> Disallow: /ja/forum/includes
> >>>>>> Disallow: /ja/forum/language
> >>>>>> Disallow: /ja/forum/store
> >>>>>> Disallow: /ja/forum/styles
> >>>>>> Disallow: /test
> >>>>>> Disallow: /nl/forum/common.php
> >>>>>> Disallow: /nl/forum/config.php
> >>>>>> Disallow: /nl/forum/con.php
> >>>>>> Disallow: /nl/forum/faq.php
> >>>>>> Disallow: /nl/forum/mcp.php
> >>>>>> Disallow: /nl/forum/memberlist.php
> >>>>>> Disallow: /nl/forum/posting.php
> >>>>>> Disallow: /nl/forum/report.php
> >>>>>> Disallow: /nl/forum/search.php
> >>>>>> Disallow: /nl/forum/style.php
> >>>>>> Disallow: /nl/forum/ucp.php
> >>>>>> Disallow: /nl/forum/viewonline.php
> >>>>>> Disallow: /nl/forum/adm
> >>>>>> Disallow: /nl/forum/cache
> >>>>>> Disallow: /nl/forum/docs
> >>>>>> Disallow: /nl/forum/files
> >>>>>> Disallow: /nl/forum/images
> >>>>>> Disallow: /nl/forum/includes
> >>>>>> Disallow: /nl/forum/language
> >>>>>> Disallow: /nl/forum/store
> >>>>>> Disallow: /nl/forum/styles
> >>>>>> Disallow: /vi/forum/common.php
> >>>>>> Disallow: /vi/forum/config.php
> >>>>>> Disallow: /vi/forum/con.php
> >>>>>> Disallow: /vi/forum/faq.php
> >>>>>> Disallow: /vi/forum/mcp.php
> >>>>>> Disallow: /vi/forum/memberlist.php
> >>>>>> Disallow: /vi/forum/posting.php
> >>>>>> Disallow: /vi/forum/report.php
> >>>>>> Disallow: /vi/forum/search.php
> >>>>>> Disallow: /vi/forum/style.php
> >>>>>> Disallow: /vi/forum/ucp.php
> >>>>>> Disallow: /vi/forum/viewonline.php
> >>>>>> Disallow: /vi/forum/adm
> >>>>>> Disallow: /vi/forum/cache
> >>>>>> Disallow: /vi/forum/docs
> >>>>>> Disallow: /vi/forum/files
> >>>>>> Disallow: /vi/forum/images
> >>>>>> Disallow: /vi/forum/includes
> >>>>>> Disallow: /vi/forum/language
> >>>>>> Disallow: /vi/forum/store
> >>>>>> Disallow: /vi/forum/styles
> >>>>>> Disallow: /zh/forum/common.php
> >>>>>> Disallow: /zh/forum/config.php
> >>>>>> Disallow: /zh/forum/con.php
> >>>>>> Disallow: /zh/forum/faq.php
> >>>>>> Disallow: /zh/forum/mcp.php
> >>>>>> Disallow: /zh/forum/memberlist.php
> >>>>>> Disallow: /zh/forum/posting.php
> >>>>>> Disallow: /zh/forum/report.php
> >>>>>> Disallow: /zh/forum/search.php
> >>>>>> Disallow: /zh/forum/style.php
> >>>>>> Disallow: /zh/forum/ucp.php
> >>>>>> Disallow: /zh/forum/viewonline.php
> >>>>>> Disallow: /zh/forum/adm
> >>>>>> Disallow: /zh/forum/cache
> >>>>>> Disallow: /zh/forum/docs
> >>>>>> Disallow: /zh/forum/files
> >>>>>> Disallow: /zh/forum/images
> >>>>>> Disallow: /zh/forum/includes
> >>>>>> Disallow: /zh/forum/language
> >>>>>> Disallow: /zh/forum/store
> >>>>>> Disallow: /zh/forum/styles
> >>>>>>
> >>>>>> This has been the robots.txt file since: Last-Modified: Sat, 06 Jun
> >>>>>> 2009 23:40:14 GMT
> >>>>>>
> >>>>>> Forum search uses phpBB
> >>>>>>
> >>>>>> We haven’t allowed search engines to crawl forum.openoffice.org since
> >>>>>> before the Oracle donation to the ASF.
> >>>>>>
> >>>>>> Crawlers' IP addresses might be blocked by ASF Infra if their use is
> >>>>>> excessive. That could give the 301.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Dave
> >>>>>>
> >>>>>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <leg...@posteo.de> wrote:
> >>>>>>>
> >>>>>>> Hello all,
> >>>>>>>
> >>>>>>>
> >>>>>>> What I figured is that from the Google search tool the URL
> >>>>>>> forum.openoffice.org is not reachable.
> >>>>>>>
> >>>>>>> So I checked with DuckDuckGo (my preferred search engine); they don't
> >>>>>>> use a crawler and point at the infra of Google, Bing and Yandex.
> >>>>>>>
> >>>>>>> I then checked with Bing, but could not figure out how to check bot
> >>>>>>> feedback on a URL, so I moved on.
> >>>>>>>
> >>>>>>> I checked with Yandex. They have a search URL test page. I entered
> >>>>>>> forum.openoffice.org there.
> >>>>>>>
> >>>>>>> The response is:
> >>>>>>>
> >>>>>>> ------------------------------------------------------------------------
> >>>>>>>
> >>>>>>> * Date: Tue, 12 May 2020 10:37:47 GMT
> >>>>>>> * Server: Apache/2.4.18 (Ubuntu)
> >>>>>>> * Location: https://forum.openoffice.org/
> >>>>>>> * Content-Length: 237
> >>>>>>> * Keep-Alive: timeout=15, max=100
> >>>>>>> * Connection: Keep-Alive
> >>>>>>> * Content-Type: text/html; charset=iso-8859-1
> >>>>>>>
> >>>>>>> ------------------------------------------------------------------------
> >>>>>>>
> >>>>>>>
> >>>>>>> HTTP status code 301 Moved Permanently
> >>>>>>> Server response time 133 ms
> >>>>>>> IP address 54.84.201.130
> >>>>>>> Encoding UTF-8(unicode-1-1-utf-8, UTF8)
> >>>>>>> Page size 237 B
> >>>>>>>
> >>>>>>>
> >>>>>>> I am not sure what that means. HTTP status code Moved Permanently
> >>>>>>> reads wrong. I just don't know if this is the return code from our
> >>>>>>> web server or a response code from the crawler.
> >>>>>>> I'll try to get someone from Infra, or I'll open a ticket.
> >>>>>>>
> >>>>>>>
> >>>>>>> All the best
> >>>>>>> Peter
> >>>>>>>
> >>>>>>> On 12.05.20 at 10:39, Matthias Seidel wrote:
> >>>>>>>> Hi Kay,
> >>>>>>>>
> >>>>>>>> On 12.05.20 at 01:21, Kay Schenk wrote:
> >>>>>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
> >>>>>>>>>> Hi Kay,
> >>>>>>>>>>
> >>>>>>>>>> On 11.05.20 at 21:23, Kay Schenk wrote:
> >>>>>>>>>>> Hi Peter...
> >>>>>>>>>>>
> >>>>>>>>>>> Since I am a Google Search admin for www.openoffice.org, and
> >>>>>>>>>>> openoffice.apache.org, I got this also. Disclaimer: I have not
> >>>>>>>>>>> done ANY work with the Google Search APIs on these sites in
> >>>>>>>>>>> quite some time.
> >>>>>>>>>>>
> >>>>>>>>>>> I actually was NOT aware forum.openoffice.org was set up to use
> >>>>>>>>>>> Google Search until I saw this.
> >>>>>>>>>> I think I added it to the list when we had a discussion about
> >>>>>>>>>> outdated information regarding SourceForge found by Google Search.
> >>>>>>>>>>
> >>>>>>>>>> But I don't have access to forum.openoffice.org, so I could never
> >>>>>>>>>> complete the step.
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>>
> >>>>>>>>>> Matthias
> >>>>>>>>> OK. In the top level of the website source, there is a file called
> >>>>>>>>> "skeleton.html" which references the following bit of code:
> >>>>>>>>>
> >>>>>>>>> <!--#include virtual="/scripts/google-analytics.js" -->
> >>>>>>>>>
> >>>>>>>>> I didn't dig far enough to find how "skeleton.html" is used (I
> >>>>>>>>> forgot), but this is an example of the google-analytics code
> >>>>>>>>> snippet that is used. Basically, this needs to be included in the
> >>>>>>>>> site you want analytics to be used on by putting it in the
> >>>>>>>>> (header) files that generate the site. And you might take a look
> >>>>>>>>> at recent instructions from Google. Things change.
> >>>>>>>>>
> >>>>>>>>> https://support.google.com/analytics/answer/1008080
> >>>>>>>> Yes, but this is for Google Analytics. I wouldn't want to "analyze"
> >>>>>>>> the forum...
> >>>>>>>> The procedure for the Google Search Console is the same; it needs
> >>>>>>>> access to the root directory.
> >>>>>>>>
> >>>>>>>> Maybe Andrea can help if he is available again?
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>>
> >>>>>>>> Matthias
> >>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>>
> >>>>>>>>> Kay
> >>>>>>>>>
> >>>>>>>>>>> One of the Google Search admins for forum.openoffice.org could
> >>>>>>>>>>> check the current Google Search APIs that are in use on that
> >>>>>>>>>>> site. Changes are occasionally made to the calls, and maybe that
> >>>>>>>>>>> is the issue, or a robots.txt for that site is causing this. I
> >>>>>>>>>>> don't think it requires a response, but maybe some investigation.
> >>>>>>>>>>>
> >>>>>>>>>>> Just some ideas...
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>>
> >>>>>>>>>>> Kay
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I have received the following mail, probably because I am
> >>>>>>>>>>>> listed on the Google Analytics page.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Does this have any action items? What can we answer Mr John
> >>>>>>>>>>>> Mueller?
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> All the best
> >>>>>>>>>>>>
> >>>>>>>>>>>> Peter
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> -------- Forwarded Message --------
> >>>>>>>>>>>> Subject: Critical issue on forum.openoffice.org and Google
> >>>>>>>>>>>> Search
> >>>>>>>>>>>> Date: Mon, 11 May 2020 13:37:27 +0200
> >>>>>>>>>>>> From: John Mueller <joh...@google.com>
> >>>>>>>>>>>> To: morsei...@gmail.com, kay.sch...@gmail.com,
> >>>>>>>>>>>> legi...@gmail.com
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Dear webmaster of forum.openoffice.org
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to bring your
> >>>>>>>>>>>> attention to a critical issue with your website, and how it's
> >>>>>>>>>>>> available for Google's web search.
> >>>>>>>>>>>>
> >>>>>>>>>>>> In particular, Googlebot has been unable to crawl URLs from
> >>>>>>>>>>>> https://forum.openoffice.org/ . This will cause those pages to
> >>>>>>>>>>>> drop out of Google's search results, and will prevent new pages
> >>>>>>>>>>>> from being picked up for Search. If you're not aware of this
> >>>>>>>>>>>> issue, you may be accidentally blocking these pages from Google
> >>>>>>>>>>>> Search due to a server issue. If you need to block Googlebot
> >>>>>>>>>>>> from crawling pages on your website, we'd recommend using the
> >>>>>>>>>>>> robots.txt file instead.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Should you need to recognize IP addresses of Googlebot requests,
> >>>>>>>>>>>> you can use a reverse IP lookup to do so:
> >>>>>>>>>>>> https://support.google.com/webmasters/answer/80553
> >>>>>>>>>>>>
> >>>>>>>>>>>> Should you have any questions, feel free to contact me directly.
> >>>>>>>>>>>> For verification purposes, we are sending a copy of this message
> >>>>>>>>>>>> to your site's Search Console account.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thank you,
> >>>>>>>>>>>> John Mueller (joh...@google.com)
> >>>>>>>>>>>> Webmaster Trends Analyst
> >>>>>>>>>>>>
> >>>>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
> >>>>>>>>>>> For additional commands, e-mail: dev-h...@openoffice.apache.org
> >>>>>>>>>>>
> >>
> >> --
> >> Rory O'Farrell <ofarr...@iol.ie>

--
Rory O'Farrell <ofarr...@iol.ie>
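
For anyone wanting to re-check the robots.txt point raised in this thread without touching the live server, here is a minimal sketch, assuming Python 3.6+ and only the standard library's `urllib.robotparser`. The excerpt is a shortened copy of the rules quoted above (English forum only), and the names `ROBOTS_TXT_EXCERPT`, `build_parser` and `BASE` are made up for illustration. It only models what the rules themselves say; it tells us nothing about how Googlebot actually behaved or about any server-side blocking.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the robots.txt quoted in this thread (assumption:
# the other language sections follow the same pattern).
ROBOTS_TXT_EXCERPT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/posting.php
Disallow: /en/forum/search.php
Disallow: /en/forum/ucp.php
Disallow: /en/forum/adm
"""

BASE = "https://forum.openoffice.org"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text offline, without any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT_EXCERPT)

# A topic page is not matched by any Disallow prefix in the excerpt.
topic_allowed = parser.can_fetch(
    "Googlebot", BASE + "/en/forum/viewtopic.php?f=50&t=102021"
)
# The phpBB search page is explicitly disallowed.
search_allowed = parser.can_fetch("Googlebot", BASE + "/en/forum/search.php")
# Crawl-delay as declared for all user agents.
delay = parser.crawl_delay("Googlebot")

print("topic page allowed:", topic_allowed)
print("search page allowed:", search_allowed)
print("crawl delay:", delay)
```

If the topic page comes back as allowed, that would be consistent with the conclusion reached above that the robots.txt rules by themselves do not explain the crawl failures.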