Hi Erlend, can you please also add the manifoldcf log?
Thanks,
Karl

Sent from my Windows Phone
------------------------------
From: Karl Wright
Sent: 9/18/2014 6:24 AM
To: dev
Subject: Re: [VOTE] Release Apache ManifoldCF 1.7.1, RC1

Hi Erlend,

MCF does not care if there's garbage in the robots file; it just warns when it sees it. That doesn't appear to be the source of the difficulty.

Karl

On Thu, Sep 18, 2014 at 6:20 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote:

> MCF should handle invalid robots.txt files. We cannot rely on what people
> have entered into such files. So I guess MCF should just ignore invalid
> robots.txt files. I guess it already does.
>
> It seems invalid due to use of the = symbol instead of a #. I'm not an
> expert of such files, so I'm not completely sure.
>
> E
>
> On 18.09.14 12:04, Karl Wright wrote:
>
>> Hi Erlend,
>>
>> Your robots file has this at the top:
>>
>> ====
>> The contents of this file are subject to the license and copyright
>> detailed in the LICENSE and NOTICE files at the root of the source
>> tree and available online at
>>
>> http://www.dspace.org/license/
>> ====
>>
>> That's fine except to the best of my knowledge the robots spec does
>> not allow for comments at all.
>>
>> If you have reason to believe that has changed, then please point me
>> at a reference and I can change our robots parser.
>>
>> Thanks,
>> Karl
>>
>> On Thu, Sep 18, 2014 at 6:02 AM, Karl Wright <daddy...@gmail.com> wrote:
>>
>>> Hi Erlend,
>>>
>>> MCF caches the robots.txt file in the database, which it considers valid
>>> for 1 hour.
>>>
>>> I'll look at the logs and thread dump and let you know if this is a
>>> locking issue or something else. Please stand by.
>>>
>>> Karl
>>>
>>> On Thu, Sep 18, 2014 at 5:24 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote:
>>>
>>>> I tried to restart the job dealing with www.duo.no on our test server,
>>>> but it does not seem to touch the robots.txt file at all. That's the
>>>> reason why it's able to continue. Both servers are set up to obey the
>>>> rules of such files.
>>>>
>>>> Erlend
>>>>
>>>> On 18.09.14 11:12, Erlend Garåsen wrote:
>>>>
>>>>> I'm facing the same problems with robot.txt files using RC1, so maybe
>>>>> this is another issue we have to fix. Can you please try to fetch the
>>>>> host below? For some odd reason, it seems that MCF on our test server
>>>>> can handle it.
>>>>>
>>>>> This is exactly the same that happened when I started MCF (referring to
>>>>> my previous post) after I had deployed RC1:
>>>>> 09-18-2014 11:02:14.400 robots parse https:www.duo.uio.no:443 ERRORS 0 3 Unknown robots.txt line: '===='
>>>>>
>>>>> No activity after this error.
>>>>>
>>>>> Here's the robots.txt file:
>>>>> https://www.duo.uio.no/robots.txt
>>>>>
>>>>> This is the content of manifoldcf.log after the startup:
>>>>> WARN 2014-09-18 11:02:14,401 (Worker thread '19') - Web: Unknown robots.txt line from 'https:www.duo.uio.no:443': '===='
>>>>> WARN 2014-09-18 11:02:14,401 (Worker thread '19') - Web: Unknown robots.txt line from 'https:www.duo.uio.no:443': ' The contents of this file are subject to the license and copyright'
>>>>> WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown robots.txt line from 'https:www.duo.uio.no:443': ' detailed in the LICENSE and NOTICE files at the root of the source'
>>>>> WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown robots.txt line from 'https:www.duo.uio.no:443': ' tree and available online at'
>>>>> WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown robots.txt line from 'https:www.duo.uio.no:443': ' http://www.dspace.org/license/'
>>>>> WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown robots.txt line from 'https:www.duo.uio.no:443': '===='
>>>>>
>>>>> E
>>>>>
>>>>> On 18.09.14 03:12, Karl Wright wrote:
>>>>>
>>>>>> Please vote on whether to release Apache ManifoldCF 1.7.1, RC1.
>>>>>>
>>>>>> This release fixes a number of critical issues, as well as a number of
>>>>>> user priorities, most notably:
>>>>>>
>>>>>> - A bad Zookeeper support issue, which made locking support fail when
>>>>>>   Zookeeper connections got lost and then restored;
>>>>>> - The Alfresco connector, which was nonfunctional in both MCF 1.6 and
>>>>>>   1.7;
>>>>>> - Solr Cloud support, which had ceased working due to changes to
>>>>>>   SolrJ;
>>>>>> - Non-null connector components caused failure;
>>>>>> - PostgreSQL queries not performing well.
>>>>>>
>>>>>> The complete list of included fixes can be found at:
>>>>>>
>>>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-1.7.1-RC1/CHANGES.txt
>>>>>>
>>>>>> The release candidate can be downloaded from:
>>>>>>
>>>>>> http://people.apache.org/~kwright/apache-manifoldcf-1.7.1
>>>>>>
>>>>>> There is a tag at:
>>>>>>
>>>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-1.7.1-RC1
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
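
For reference, the behavior discussed in this thread (warn on an unrecognized robots.txt line, then carry on with the rest of the file) can be sketched roughly as below. This is only an illustration in Python, not ManifoldCF's actual parser, which is written in Java. The function name `parse_robots_lenient`, the `KNOWN_FIELDS` set, and the returned structure are all assumptions made for this example.

```python
import logging

logger = logging.getLogger("robots")

# Assumption for this sketch: only these field names count as "known".
KNOWN_FIELDS = {"user-agent", "allow", "disallow"}


def parse_robots_lenient(text, source="<unknown>"):
    """Parse robots.txt text, warning on (not failing on) unknown lines.

    Returns (records, unknown_lines), where records is a list of
    (field, value) tuples in file order.
    """
    records = []
    unknown_lines = []
    for raw in text.splitlines():
        # '#' starts a comment in robots.txt; drop it and anything after it.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        field, sep, value = line.partition(":")
        field = field.strip().lower()
        if not sep or field not in KNOWN_FIELDS:
            # Mirror the thread's behavior: log a warning and keep going.
            logger.warning("Unknown robots.txt line from '%s': '%s'", source, raw)
            unknown_lines.append(raw)
            continue
        records.append((field, value.strip()))
    return records, unknown_lines
```

Fed a file that opens with the `====` license banner quoted above, a parser along these lines would log one warning per banner line and still return whatever `User-agent` and `Disallow` rules follow it, which matches the "it just warns when it sees it" description.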