Hi - then that is the problem. If there were an authentication issue, a 401-like 
exception should be visible, not the stack trace you posted. Nutch does not yet 
support Solr 5.x, but a colleague recently uploaded a patch for Nutch 1.11 that 
adds Solr 5.x support. We use it in production, although without basic HTTP 
authentication; it should still work because it is based on the older indexer 
plugins.

See: https://issues.apache.org/jira/browse/NUTCH-2197
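
To check whether the server is actually rejecting the credentials, it can help to request the core directly and look at the HTTP status code. The following is a minimal Python sketch, not part of Nutch; the helper names `basic_auth_header` and `probe_status` are made up for illustration, and it assumes Solr is protected with HTTP Basic authentication.

```python
import base64
import urllib.error
import urllib.request
from typing import Optional


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def probe_status(url: str,
                 username: Optional[str] = None,
                 password: Optional[str] = None,
                 timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET on url, optionally with Basic auth.

    Connection failures (for example, Solr not running) are not handled here
    and surface as urllib.error.URLError.
    """
    request = urllib.request.Request(url)
    if username is not None and password is not None:
        request.add_header("Authorization",
                           basic_auth_header(username, password))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error statuses such as 401 or 403 are raised as HTTPError;
        # the status code itself is the information we are after.
        return err.code
```

As a usage example with placeholder values, `probe_status("http://localhost:8983/solr/abc/select?q=*:*&rows=0")` without credentials would be expected to return 401 on a protected server, while the same call with the username and password should return 200 if the credentials are accepted.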

Markus

-----Original message-----
From: Zara Parst<[email protected]>
Sent: Monday 18th January 2016 21:28
To: [email protected]
Subject: Re: Nutch/Solr communication problem

I am using Solr 5.4 and Nutch 1.11.

On Tue, Jan 19, 2016 at 1:46 AM, Markus Jelsma <[email protected]> wrote:
Hi - it was an answer to your question whether I have ever used it. Yes, I 
patched and committed it. And therefore I asked if you're using Solr 5 or not. 
So again, are you using Solr 5?

Markus

-----Original message-----

From: Zara Parst<[email protected]>
Sent: Monday 18th January 2016 16:16
To: [email protected]
Subject: Re: Nutch/Solr communication problem

Mind sharing that patch?

On Mon, Jan 18, 2016 at 8:28 PM, Markus Jelsma <[email protected]> wrote:

Yes, I have used it. I made the damn patch myself years ago, and I used the same 
configuration. The command line and the config file work the same way.

Markus

-----Original message-----

From: Zara Parst<[email protected]>
Sent: Monday 18th January 2016 12:55
To: [email protected]
Subject: Re: Nutch/Solr communication problem

Dear Markus,

Are you just speaking blindly or what? My concern is: did you ever try pushing 
an index to a Solr instance that is password protected? If yes, can you tell me 
what config you used? If you did it in a config file, let me know; if you did 
it through the command line, please let me know that too.

thanks

On Mon, Jan 18, 2016 at 4:50 PM, Markus Jelsma <[email protected]> wrote:

Hi - This doesn't look like an HTTP basic authentication problem. Are you running 
Solr 5.x?

Markus

-----Original message-----

From: Zara Parst<[email protected]>
Sent: Monday 18th January 2016 11:55
To: [email protected]
Subject: Re: Nutch/Solr communication problem

SolrIndexWriter
        solr.server.type : Type of SolrServer to communicate with (default http however options include cloud, lb and concurrent)
        solr.server.url : URL of the Solr instance (mandatory)
        solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value for solr.server.type)
        solr.loadbalance.urls : Comma-separated string of Solr server strings to be used (madatory if lb value for solr.server.type)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.commit.size : buffer size when sending to Solr (default 1000)
        solr.auth : use authentication (default false)
        solr.auth.username : username for authentication
        solr.auth.password : password for authentication

2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawlDbyah/crawldb
2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawlDbyah/linkdb
2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawlDbyah/segments/20160117191906
2016-01-17 19:19:42,975 WARN  indexer.IndexerMapReduce - Ignoring linkDb for indexing, no linkDb found in path: crawlDbyah/linkdb
2016-01-17 19:19:43,807 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-01-17 19:19:43,809 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-01-17 19:19:43,963 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-01-17 19:19:43,980 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-01-17 19:19:44,260 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2016-01-17 19:19:45,128 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2016-01-17 19:19:45,148 INFO  solr.SolrUtils - Authenticating as: radmin
2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: content dest: content
2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: title dest: title
2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: host dest: host
2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: segment dest: segment
2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: boost dest: boost
2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: digest dest: digest
2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2016-01-17 19:19:45,360 INFO  solr.SolrIndexWriter - Indexing 2 documents
2016-01-17 19:19:45,507 INFO  solr.SolrIndexWriter - Indexing 2 documents
2016-01-17 19:19:45,526 WARN  mapred.LocalJobRunner - job_local2114349538_0001

java.lang.Exception: java.io.IOException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.io.IOException
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:171)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:157)
        at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:502)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:456)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://127.0.0.1:8983/solr/yah
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:153)
        ... 11 more
Caused by: org.apache.http.client.ClientProtocolException
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
        ... 15 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:208)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
        ... 19 more
2016-01-17 19:19:46,055 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
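
The trace above nests several `Caused by:` sections, and the innermost one (here `NonRepeatableRequestException`) is usually the most specific clue. Below is a minimal Python sketch for pulling that chain out of a saved log, assuming each `Caused by:` entry starts on its own line. `cause_chain` and `root_cause` are hypothetical helper names, not part of Nutch or Hadoop.

```python
import re

# Matches lines such as
#   "Caused by: org.apache.http.client.ClientProtocolException"
# and captures the exception class plus an optional message after a colon.
CAUSED_BY = re.compile(r"^\s*Caused by:\s*([\w.$]+)(?::\s*(.*))?$")


def cause_chain(log_text):
    """Return (exception_class, message) pairs in the order they appear."""
    chain = []
    for line in log_text.splitlines():
        match = CAUSED_BY.match(line)
        if match:
            message = (match.group(2) or "").strip()
            chain.append((match.group(1), message))
    return chain


def root_cause(log_text):
    """Return the innermost (last) cause, or None if the log has none."""
    chain = cause_chain(log_text)
    return chain[-1] if chain else None
```

Calling `root_cause(open("hadoop.log").read())` on a log like this one would point at the last `Caused by:` entry; the file name is only an example.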

On Mon, Jan 18, 2016 at 4:15 PM, Markus Jelsma <[email protected]> wrote:

Hi - can you post the log output?

Markus

-----Original message-----

From: Zara Parst<[email protected]>
Sent: Monday 18th January 2016 2:06
To: [email protected]
Subject: Nutch/Solr communication problem

Hi everyone,

I have a situation here. I am using Nutch 1.11 and Solr 5.4.

Solr is protected by a user name and password. I am passing the credentials to 
Solr using the following command:

bin/crawl -i -Dsolr.server.url=http://localhost:8983/solr/abc -D solr.auth=true -Dsolr.auth.username=xxxx -Dsolr.auth.password=xxx url crawlDbyah 1

I always get the same problem. Please help me feed data to a password-protected 
Solr.

Below is the error message.

Indexer: starting at 2016-01-17 19:01:12
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SolrIndexWriter
        solr.server.type : Type of SolrServer to communicate with (default http however options include cloud, lb and concurrent)
        solr.server.url : URL of the Solr instance (mandatory)
        solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value for solr.server.type)
        solr.loadbalance.urls : Comma-separated string of Solr server strings to be used (madatory if lb value for solr.server.type)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.commit.size : buffer size when sending to Solr (default 1000)
        solr.auth : use authentication (default false)
        solr.auth.username : username for authentication
        solr.auth.password : password for authentication
Indexing 2 documents
Indexing 2 documents
Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)

I also tried setting the username and password in nutch-default.xml, but I got 
the same error again. Please help me out.

