Hi Erlend,

Your robots file has this at the top:

====
    The contents of this file are subject to the license and copyright
    detailed in the LICENSE and NOTICE files at the root of the source
    tree and available online at
    http://www.dspace.org/license/
====

That's fine, except that to the best of my knowledge the robots spec does not allow for comments at all. If you have reason to believe that has changed, then please point me at a reference and I can change our robots parser.

Thanks,
Karl


On Thu, Sep 18, 2014 at 6:02 AM, Karl Wright <daddy...@gmail.com> wrote:

> Hi Erlend,
>
> MCF caches the robots.txt file in the database, which it considers valid
> for 1 hour.
>
> I'll look at the logs and thread dump and let you know if this is a
> locking issue or something else. Please stand by.
>
> Karl
>
>
> On Thu, Sep 18, 2014 at 5:24 AM, Erlend Garåsen <e.f.gara...@usit.uio.no>
> wrote:
>
>>
>> I tried to restart the job dealing with www.duo.no on our test server,
>> but it does not seem to touch the robots.txt file at all. That's the reason
>> why it's able to continue. Both servers are set up to obey the rules of
>> such files.
>>
>> Erlend
>>
>>
>> On 18.09.14 11:12, Erlend Garåsen wrote:
>>
>>>
>>> I'm facing the same problems with robots.txt files using RC1, so maybe
>>> this is another issue we have to fix. Can you please try to fetch the
>>> host below? For some odd reason, it seems that MCF on our test server
>>> can handle it.
>>>
>>> This is exactly the same that happened when I started MCF (referring to
>>> my previous post) after I had deployed RC1:
>>> 09-18-2014 11:02:14.400 robots parse https:www.duo.uio.no:443
>>> ERRORS 0 3 Unknown robots.txt line: '===='
>>>
>>> No activity after this error.
>>>
>>> Here's the robots.txt file:
>>> https://www.duo.uio.no/robots.txt
>>>
>>> This is the content of manifoldcf.log after the startup:
>>> WARN 2014-09-18 11:02:14,401 (Worker thread '19') - Web: Unknown
>>> robots.txt line from 'https:www.duo.uio.no:443': '===='
>>> WARN 2014-09-18 11:02:14,401 (Worker thread '19') - Web: Unknown
>>> robots.txt line from 'https:www.duo.uio.no:443': ' The contents of
>>> this file are subject to the license and copyright'
>>> WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown
>>> robots.txt line from 'https:www.duo.uio.no:443': ' detailed in the
>>> LICENSE and NOTICE files at the root of the source'
>>> WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown
>>> robots.txt line from 'https:www.duo.uio.no:443': ' tree and available
>>> online at'
>>> WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown
>>> robots.txt line from 'https:www.duo.uio.no:443': '
>>> http://www.dspace.org/license/'
>>> WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown
>>> robots.txt line from 'https:www.duo.uio.no:443': '===='
>>>
>>> E
>>>
>>>
>>> On 18.09.14 03:12, Karl Wright wrote:
>>>
>>>> Please vote on whether to release Apache ManifoldCF 1.7.1, RC1.
>>>>
>>>> This release fixes a number of critical issues, as well as a number of
>>>> user priorities, most notably:
>>>>
>>>> - A bad Zookeeper support issue, which made locking support fail when
>>>>   Zookeeper connections got lost and then restored;
>>>> - The Alfresco connector, which was nonfunctional in both MCF 1.6 and
>>>>   1.7;
>>>> - Solr Cloud support, which had ceased working due to changes to SolrJ;
>>>> - Non-null connector components caused failure;
>>>> - PostgreSQL queries not performing well.
>>>>
>>>> The complete list of included fixes can be found at:
>>>>
>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-1.7.1-RC1/CHANGES.txt
>>>>
>>>>
>>>> The release candidate can be downloaded from:
>>>>
>>>> http://people.apache.org/~kwright/apache-manifoldcf-1.7.1
>>>>
>>>> There is a tag at:
>>>>
>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-1.7.1-RC1
>>>>
>>>> Thanks,
>>>> Karl
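
For reference, the kind of line classification discussed in this thread can be sketched roughly as below. This is only an illustration, not ManifoldCF's actual robots parser: the function name `classify_robots_lines`, the `KNOWN_FIELDS` set, and the `Disallow: /search` rule in the sample are all assumptions made for the example. The sketch treats `#` as the comment marker, which is the convention described in the original robots exclusion standard, so a banner delimited by `====` would still be reported as unknown lines under that convention.

```python
# Hypothetical field list for illustration; a real crawler may accept more or fewer.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


def classify_robots_lines(text):
    """Split robots.txt text into (records, unknown_lines).

    records is a list of (field, value) pairs for recognised fields;
    unknown_lines holds the raw lines that are neither blank, comment,
    nor a recognised "field: value" pair.
    """
    records = []
    unknown = []
    for raw in text.splitlines():
        # Drop everything from '#' onwards, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or pure comment
        field, sep, value = line.partition(":")
        name = field.strip().lower()
        if sep and name in KNOWN_FIELDS:
            records.append((name, value.strip()))
        else:
            unknown.append(raw)
    return records, unknown


# Abbreviated version of the banner quoted above, followed by made-up rules.
sample = """====
    The contents of this file are subject to the license and copyright
    http://www.dspace.org/license/
====
# a conventional comment
User-agent: *
Disallow: /search
"""

records, unknown = classify_robots_lines(sample)
for line in unknown:
    print("Unknown robots.txt line: %r" % line)
```

In this sketch, unknown lines are collected and reported while the recognised rules are still returned, so a banner like the one above would not by itself stop the remaining rules from being read.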