[ https://issues.apache.org/jira/browse/NUTCH-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267066#comment-13267066 ]

Christian Johnsson commented on NUTCH-1348:
-------------------------------------------

Spent the entire day trying to reproduce the error, but without "luck".
Now it seems to work properly. Quite weird; I haven't rebooted or changed 
anything. I just kept trying different segments, from 1,000 documents up to 
100,000 documents, the whole day without any error whatsoever.

I will try to re-index all segments during the night, and if that works then 
there's nothing to fuss about. Probably just a bad weekend, haha.

// Here are some snippets from the weekend //

-------------

Generator: starting at 2012-05-01 03:48:41
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 100000
Generator: jobtracker is 'local', generating exactly one partition.
Generator: java.io.IOException: Job failed!

-------------

CrawlDb update: starting at 2012-05-01 02:43:23
CrawlDb update: db: crawl/crawldb
CrawlDb update: segments: [crawl/segments/20120501013724]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: true
CrawlDb update: Merging segment data into db.
CrawlDb update: java.io.IOException: Job failed!

-------------

bin/nutch solrindex http://127.0.0.1:8180/solr/webindex crawl/crawldb -linkdb crawl/linkdb -dir crawl/segments
SolrIndexer: starting at 2012-04-30 21:39:53
java.io.IOException: Job failed!

-------------

LinkDb: adding segment: crawl/segments/20120430200224
LinkDb: merging with existing linkdb: crawl/linkdb
LinkDb: finished at 2012-04-30 21:26:19, elapsed: 00:01:45
SolrIndexer: starting at 2012-04-30 21:26:20
java.io.IOException: Job failed!

-------------

Generator: starting at 2012-04-30 19:06:32
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 100000
Generator: jobtracker is 'local', generating exactly one partition.
Generator: java.io.IOException: Job failed!

-------------

And as I said, I haven't been able to reproduce any of them, and it seems to 
work like a charm now.
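
For the record, the "Unbuffered entity enclosing request can not be repeated" in 
the trace quoted below is commons-httpclient automatically retrying the Solr 
update POST after the connection reset; the streamed request body is not 
repeatable, so the retry fails with the ProtocolException instead of the original 
SocketException. A minimal sketch of that mechanism, assuming a plain 
commons-httpclient 3.x HttpClient under SolrJ's CommonsHttpSolrServer (the URL 
and the zero-retry setting are only placeholders for illustration, not what 
Nutch actually configures):

-------------

import java.net.URL;

import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.params.HttpMethodParams;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SolrClientSketch {

    // Builds a SolrJ client whose underlying HttpClient never retries a request.
    // With retries disabled, a connection reset surfaces as the original
    // SocketException instead of tripping over the non-repeatable POST entity
    // ("Unbuffered entity enclosing request can not be repeated").
    public static CommonsHttpSolrServer buildServer() throws Exception {
        HttpClient httpClient = new HttpClient();
        httpClient.getParams().setParameter(
                HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler(0, false)); // 0 = no retries

        // Placeholder URL; the real one comes from the solrindex command line.
        return new CommonsHttpSolrServer(
                new URL("http://127.0.0.1:8180/solr/webindex"), httpClient);
    }
}

-------------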

                
> Solrindexer fails with a java.io.IOException error.
> ---------------------------------------------------
>
>                 Key: NUTCH-1348
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1348
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.5
>         Environment: Debian Stable AMD64
>            Reporter: Christian Johnsson
>
> I'm unable to reproduce this error, but it happens from time to time when I 
> run solrindexer.
> I use the same commands as I did with 1.4 and about the same configuration, 
> and I haven't changed any Solr settings.
> I have the same plugins active, just to be able to compare.
> From time to time the solrindexer throws an error. It happens about 1-2 times 
> out of 5, and there is no information in the Solr log about it.
> Not sure if it's a bug, but I thought I might as well report it, since I've 
> been running 1.4 since it was released and never came across this error in 
> that version.
> 2012-05-01 20:44:14,861 INFO  httpclient.HttpMethodDirector - I/O exception (java.net.SocketException) caught when processing request: Connection reset
> 2012-05-01 20:44:14,861 INFO  httpclient.HttpMethodDirector - Retrying request
> 2012-05-01 20:44:15,808 INFO  solr.SolrWriter - Indexing 250 documents
> 2012-05-01 20:44:36,153 WARN  mapred.LocalJobRunner - job_local_0001
> java.io.IOException
>       at org.apache.nutch.indexer.solr.SolrWriter.makeIOException(SolrWriter.java:152)
>       at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:126)
>       at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:55)
>       at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
>       at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:440)
>       at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:195)
>       at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:51)
>       at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
>       at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
>       at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>       at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>       at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:124)
>       ... 8 more
> Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
>       at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)
>       at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
>       at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
>       at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
>       at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
>       at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>       at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>       at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:422)
>       ... 11 more
> 2012-05-01 20:44:37,074 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> It's running on a single machine, with no Hadoop.
> It's indexing around 50,000-80,000 smaller documents. Worked flawlessly in 1.4.
> That's about it :-)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira