Thanks for the update!

Karl

On Mon, May 7, 2012 at 7:15 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote:
>
> Document deletion works perfectly after I reinstalled the SSL certificate
> and reentered the username and password to our Solr server. So I think this
> issue has been solved.
>
> Erlend
>
> On 27.04.12 12.11, Erlend Garåsen wrote:
>>
>>
>> Many thanks for your suggestions and help, Karl. Using a filesystem
>> crawl was actually a good idea for debugging/testing. To install a new
>> version of Solr is not that easy on our test server for many reasons,
>> generally because it is under control of another division dealing with
>> servers at the uni, even though I can get root access. Anyway, according
>> to the logs on our Solr 3.2 server, it seems that MCF successfully
>> managed to delete one test document I removed:
>> [2012-04-27 11:18:33.092] {delete=[file:/tmp/mcf/docs/app_lasso.pdf]} 0 7
>> [2012-04-27 11:18:33.092] [] webapp=/solr path=/update params={} status=0 QTime=7
>>
>> The result code is 200 according to Simple History in MCF.
>>
>> I entered the passwords once again for the Solr servers into the Solr
>> output configuration, deleted and uploaded our SSL certificate once
>> again before I did the filesystem test. I should have performed the
>> tests prior to the password updates.
>>
>> The crawl will start again later today at 6 pm on our production server,
>> so I will try to figure out whether we still have problems later. I'm
>> going to Scotland later this evening for some days without my laptop, so
>> I cannot check the status of my crawl before I'm back, but I'll let my
>> colleague watch the logs.
>>
>> Erlend
>>
>> On 26.04.12 21.14, Karl Wright wrote:
>>>
>>> Hi Erlend,
>>>
>>> I had some time today and was able to verify that everything worked
>>> fine against what I have currently on my laptop, which is Solr 3.2.
>>> The second job run looks like this:
>>>
>>> 04-26-2012 15:11:44.154 job end 1335467343879(test) 0 1
>>> 04-26-2012 15:11:34.159 document deletion (solr) file:/C:/testcrawl/there.txt 200 0 117
>>> 04-26-2012 15:11:24.690 read document C:\testcrawl OK 0 1
>>> 04-26-2012 15:11:24.494 job start 1335467343879(test) 0 1
>>>
>>> So it appears that either something changed in Solr, or SSL support is
>>> broken, or your network is not permitting a valid HTTP response for
>>> some reason.
>>>
>>> Karl
>>>
>>>
>>> On Thu, Apr 26, 2012 at 11:10 AM, Karl Wright <daddy...@gmail.com> wrote:
>>>>
>>>> Hi Erlend,
>>>>
>>>> Can you try the following:
>>>>
>>>> (1) Make a fresh Solr checkout of 3.6 or whatever Solr version you are
>>>> using, and build it
>>>> (2) Start it
>>>> (3) Run a simple filesystem crawl using a Solr connection that is
>>>> created with the default values
>>>> (4) Delete a file in your filesystem that was crawled
>>>> (5) Crawl again
>>>>
>>>> Does the deletion happen OK?
>>>>
>>>> AFAIK, nothing has changed in the Solr connector that should affect
>>>> the ability to delete. This test will confirm that it is still
>>>> working.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>>
>>>> On Thu, Apr 26, 2012 at 10:19 AM, Erlend Garåsen
>>>> <e.f.gara...@usit.uio.no> wrote:
>>>>>
>>>>> It seems that MCF cannot delete documents from Solr. A timeout
>>>>> occurs, and the job stops after a while.
>>>>>
>>>>> This is what I can see from the log:
>>>>> WARN 2012-04-20 18:24:30,373 (Worker thread '16') - Service
>>>>> interruption reported for job 1327930125433 connection 'Web crawler':
>>>>> Ingestion API socket timeout exception waiting for response code:
>>>>> Read timed out; ingestion will be retried again later
>>>>>
>>>>> If I take a further look in Simple History, it seems that this error is
>>>>> related to document deletion.
>>>>>
>>>>> I have tried to delete the document manually by using curl from the
>>>>> same server MCF is installed on in case we have some access
>>>>> restrictions, but curl succeeded.
>>>>>
>>>>> We do not have any problems with adding; the timeout only occurs while
>>>>> deleting documents.
>>>>>
>>>>> I have checked our Solr configuration. MCF does use the correct path
>>>>> for document deletion, i.e. /update.
>>>>>
>>>>> The correct realm, username and password for our Solr server are
>>>>> entered correctly and the SSL certificate is valid as well.
>>>>>
>>>>> Erlend
>>>>>
>>>>> --
>>>>> Erlend Garåsen
>>>>> Center for Information Technology Services
>>>>> University of Oslo
>>>>> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>>>>> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968,
>>>>> VIP: 31050
>>
>>
>>
>
>
> --
> Erlend Garåsen
> Center for Information Technology Services
> University of Oslo
> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
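
For anyone repeating the manual check described in the thread (deleting a document by posting to /update outside of MCF), the request might look roughly like the sketch below. It only builds the request and does not send it. The base URL is a placeholder and the function name is made up for illustration; the document id is the one from the Solr log quoted above, on the assumption that the unique key is the document URI. The payload is Solr's standard XML delete-by-id message.

```python
from urllib.request import Request
from xml.sax.saxutils import escape

# Placeholder address (an assumption); substitute the real Solr base URL.
SOLR_BASE_URL = "https://solr.example.org/solr"


def build_delete_request(doc_id, base_url=SOLR_BASE_URL):
    """Build, but do not send, an XML delete-by-id request for Solr's /update path."""
    # Escape the id so characters such as '&' or '<' cannot break the XML.
    body = "<delete><id>{}</id></delete>".format(escape(doc_id))
    return Request(
        url=base_url.rstrip("/") + "/update",
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Document id taken from the Solr log line quoted in the thread.
req = build_delete_request("file:/tmp/mcf/docs/app_lasso.pdf")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Actually sending it (for example with urllib.request.urlopen and an explicit timeout) would also need the realm credentials and a trusted SSL certificate, which are left out here. The deletion only becomes visible in search results after a commit.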