Yes, I have used it; I made the damn patch myself years ago, and I used the same 
configuration. Command line or config file work the same.
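
For the config-file route, a minimal sketch of what this typically looks like in 
conf/nutch-site.xml (the property names are the ones the SolrIndexWriter lists 
further down in this thread; the credential values here are only placeholders):

  <!-- inside the <configuration> element of conf/nutch-site.xml -->
  <property>
    <name>solr.auth</name>
    <value>true</value>
  </property>
  <property>
    <name>solr.auth.username</name>
    <value>your-username</value>   <!-- placeholder -->
  </property>
  <property>
    <name>solr.auth.password</name>
    <value>your-password</value>   <!-- placeholder -->
  </property>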
Markus

-----Original message-----
From: Zara Parst <[email protected]>
Sent: Monday 18th January 2016 12:55
To: [email protected]
Subject: Re: Nutch/Solr communication problem

Dear Markus,

Are you just speaking blindly, or what? My concern is: did you ever try pushing 
an index to a Solr instance that is password protected? If yes, can you tell me 
what config you used? If you did it in a config file, let me know, or if you did 
it through the command line, please let me know that.

thanks

On Mon, Jan 18, 2016 at 4:50 PM, Markus Jelsma <[email protected]> wrote:
Hi - This doesn't look like an HTTP basic authentication problem. Are you running 
Solr 5.x?

Markus

-----Original message-----

From: Zara Parst <[email protected]>

Sent: Monday 18th January 2016 11:55

To: [email protected] <mailto:[email protected]>

Subject: Re: Nutch/Solr communication problem

SolrIndexWriter

        solr.server.type : Type of SolrServer to communicate with (default http 
however options include cloud, lb and concurrent)

        solr.server.url : URL of the Solr instance (mandatory)

        solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value 
for solr.server.type)

        solr.loadbalance.urls : Comma-separated string of Solr server strings 
to be used (mandatory if lb value for solr.server.type)

        solr.mapping.file : name of the mapping file for fields (default 
solrindex-mapping.xml)

        solr.commit.size : buffer size when sending to Solr (default 1000)

        solr.auth : use authentication (default false)

        solr.auth.username : username for authentication

        solr.auth.password : password for authentication

2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduce: 
crawldb: crawlDbyah/crawldb

2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduce: 
linkdb: crawlDbyah/linkdb

2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduces: 
adding segment: crawlDbyah/segments/20160117191906

2016-01-17 19:19:42,975 WARN  indexer.IndexerMapReduce - Ignoring linkDb for 
indexing, no linkDb found in path: crawlDbyah/linkdb

2016-01-17 19:19:43,807 WARN  conf.Configuration - 
file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an
 attempt to override final parameter: 
mapreduce.job.end-notification.max.retry.interval;  Ignoring.

2016-01-17 19:19:43,809 WARN  conf.Configuration - 
file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an
 attempt to override final parameter: 
mapreduce.job.end-notification.max.attempts;  Ignoring.

2016-01-17 19:19:43,963 WARN  conf.Configuration - 
file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an
 attempt to override final parameter: 
mapreduce.job.end-notification.max.retry.interval;  Ignoring.

2016-01-17 19:19:43,980 WARN  conf.Configuration - 
file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an
 attempt to override final parameter: 
mapreduce.job.end-notification.max.attempts;  Ignoring.

2016-01-17 19:19:44,260 INFO  anchor.AnchorIndexingFilter - Anchor 
deduplication is: off

2016-01-17 19:19:45,128 INFO  indexer.IndexWriters - Adding 
org.apache.nutch.indexwriter.solr.SolrIndexWriter

2016-01-17 19:19:45,148 INFO  solr.SolrUtils - Authenticating as: radmin

2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: content dest: 
content

2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: title dest: title

2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: host dest: host

2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: segment dest: 
segment

2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: boost dest: boost

2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: digest dest: 
digest

2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: tstamp dest: 
tstamp

2016-01-17 19:19:45,360 INFO  solr.SolrIndexWriter - Indexing 2 documents

2016-01-17 19:19:45,507 INFO  solr.SolrIndexWriter - Indexing 2 documents

2016-01-17 19:19:45,526 WARN  mapred.LocalJobRunner - job_local2114349538_0001

java.lang.Exception: java.io.IOException

        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)

        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)

Caused by: java.io.IOException

        at 
org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:171)

        at 
org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:157)

        at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)

        at 
org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)

        at 
org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:502)

        at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:456)

        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)

        at 
org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)

        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

        at java.util.concurrent.FutureTask.run(FutureTask.java:266)

        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.solr.client.solrj.SolrServerException: IOException 
occured when talking to server at: http://127.0.0.1:8983/solr/yah

        at 
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)

        at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)

        at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)

        at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)

        at 
org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:153)

        ... 11 more

Caused by: org.apache.http.client.ClientProtocolException

        at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)

        at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)

        at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)

        at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)

        at 
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)

        ... 15 more

Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry 
request with a non-repeatable request entity.

        at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:208)

        at 
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)

        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)

        at 
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)

        at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)

        ... 19 more

2016-01-17 19:19:46,055 ERROR indexer.IndexingJob - Indexer: 
java.io.IOException: Job failed!

        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)

        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)

        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)

        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)

On Mon, Jan 18, 2016 at 4:15 PM, Markus Jelsma <[email protected]> wrote:

Hi - can you post the log output?

Markus

-----Original message-----

From: Zara Parst <[email protected]>

Sent: Monday 18th January 2016 2:06

To: [email protected] <mailto:[email protected]> 
<mailto:[email protected] <mailto:[email protected]>>

Subject: Nutch/Solr communication problem

Hi everyone,

I have a situation here: I am using Nutch 1.11 and Solr 5.4.

Solr is protected by a username and password. I am passing the credentials to 
Solr using the following command:

bin/crawl -i -Dsolr.server.url=http://localhost:8983/solr/abc -D solr.auth=true 
-Dsolr.auth.username=xxxx -Dsolr.auth.password=xxx url crawlDbyah 1

I always get the same problem. Please help me figure out how to feed data to a 
password-protected Solr.

Below is the error message.

Indexer: starting at 2016-01-17 19:01:12

Indexer: deleting gone documents: false

Indexer: URL filtering: false

Indexer: URL normalizing: false

Active IndexWriters :

SolrIndexWriter

        solr.server.type : Type of SolrServer to communicate with (default http 
however options include cloud, lb and concurrent)

        solr.server.url : URL of the Solr instance (mandatory)

        solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value 
for solr.server.type)

        solr.loadbalance.urls : Comma-separated string of Solr server strings 
to be used (mandatory if lb value for solr.server.type)

        solr.mapping.file : name of the mapping file for fields (default 
solrindex-mapping.xml)

        solr.commit.size : buffer size when sending to Solr (default 1000)

        solr.auth : use authentication (default false)

        solr.auth.username : username for authentication

        solr.auth.password : password for authentication

Indexing 2 documents

Indexing 2 documents

Indexer: java.io.IOException: Job failed!

        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)

        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)

        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)

        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)

I also tried setting the username and password in nutch-default.xml, but I get 
the same error again. Please help me out.

