Hi - then that is the problem. If this were an authentication issue, you would see a 401-like exception, not the stack trace you posted. Nutch does not yet support Solr 5.x, but a colleague recently uploaded a patch for Nutch 1.11 with Solr 5.x support. We use it in production, although without HTTP basic authentication, but it should work since it is based on the older indexer plugins.
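Following that reasoning, one way to confirm whether authentication is really the issue is to query the core directly, outside Nutch, and look at the HTTP status: 401 (or 403) means the credentials are rejected, anything else points elsewhere. This is a minimal sketch, assuming Java 11+ (java.net.http) and a core URL such as http://127.0.0.1:8983/solr/yah; the class SolrAuthProbe and its methods are hypothetical helpers, not part of Nutch or SolrJ.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: checks whether a Solr core rejects the given credentials. */
public class SolrAuthProbe {

    /** Builds an HTTP Basic "Authorization" header value: "Basic " + base64(user:password). */
    public static String basicAuthHeader(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    /** 401 = missing or wrong credentials, 403 = authenticated but not permitted. */
    public static boolean isAuthFailure(int status) {
        return status == 401 || status == 403;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: SolrAuthProbe <coreUrl> <user> <password>");
            System.exit(2);
        }
        // rows=0: only the status code matters here, not the documents.
        URI uri = URI.create(args[0] + "/select?q=*:*&rows=0");
        HttpRequest request = HttpRequest.newBuilder(uri)
                // Credentials are sent up front rather than after a 401 challenge.
                .header("Authorization", basicAuthHeader(args[1], args[2]))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        int status = response.statusCode();
        if (isAuthFailure(status)) {
            System.out.println("HTTP " + status + ": credentials rejected");
        } else {
            System.out.println("HTTP " + status + ": not an authentication failure");
        }
    }
}
```

For example: java SolrAuthProbe.java http://127.0.0.1:8983/solr/yah <user> <password> (single-file launch needs Java 11+).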
See: https://issues.apache.org/jira/browse/NUTCH-2197
Markus

-----Original message-----
From: Zara Parst <[email protected]>
Sent: Monday 18th January 2016 21:28
To: [email protected]
Subject: Re: Nutch/Solr communication problem

I am using Solr 5.4 and Nutch 1.11.

On Tue, Jan 19, 2016 at 1:46 AM, Markus Jelsma <[email protected]> wrote:

Hi - it was an answer to your question whether I have ever used it. Yes, I patched and committed it. That is why I asked whether you're using Solr 5 or not. So again, are you using Solr 5?
Markus

-----Original message-----
From: Zara Parst <[email protected]>
Sent: Monday 18th January 2016 16:16
To: [email protected]
Subject: Re: Nutch/Solr communication problem

Mind sharing that patch?

On Mon, Jan 18, 2016 at 8:28 PM, Markus Jelsma <[email protected]> wrote:

Yes, I have used it. I made the damn patch myself years ago, and I used the same configuration. Command line or config work the same.
Markus

-----Original message-----
From: Zara Parst <[email protected]>
Sent: Monday 18th January 2016 12:55
To: [email protected]
Subject: Re: Nutch/Solr communication problem

Dear Markus,

Are you just speaking blindly or what?? My concern is: did you ever try pushing an index to a Solr instance that is password protected? If yes, can you tell me what configuration you used? If you did it in the config file, let me know; if you did it through the command line, please let me know that too.
Thanks

On Mon, Jan 18, 2016 at 4:50 PM, Markus Jelsma <[email protected]> wrote:

Hi - this doesn't look like an HTTP basic authentication problem. Are you running Solr 5.x?
Markus

-----Original message-----
From: Zara Parst <[email protected]>
Sent: Monday 18th January 2016 11:55
To: [email protected]
Subject: Re: Nutch/Solr communication problem

SolrIndexWriter
	solr.server.type : Type of SolrServer to communicate with (default http however options include cloud, lb and concurrent)
	solr.server.url : URL of the Solr instance (mandatory)
	solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value for solr.server.type)
	solr.loadbalance.urls : Comma-separated string of Solr server strings to be used (madatory if lb value for solr.server.type)
	solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
	solr.commit.size : buffer size when sending to Solr (default 1000)
	solr.auth : use authentication (default false)
	solr.auth.username : username for authentication
	solr.auth.password : password for authentication

2016-01-17 19:19:42,973 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawlDbyah/crawldb
2016-01-17 19:19:42,973 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawlDbyah/linkdb
2016-01-17 19:19:42,973 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawlDbyah/segments/20160117191906
2016-01-17 19:19:42,975 WARN indexer.IndexerMapReduce - Ignoring linkDb for indexing, no linkDb found in path: crawlDbyah/linkdb
2016-01-17 19:19:43,807 WARN conf.Configuration - file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2016-01-17 19:19:43,809 WARN conf.Configuration - file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2016-01-17 19:19:43,963 WARN conf.Configuration - file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2016-01-17 19:19:43,980 WARN conf.Configuration - file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2016-01-17 19:19:44,260 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2016-01-17 19:19:45,128 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2016-01-17 19:19:45,148 INFO solr.SolrUtils - Authenticating as: radmin
2016-01-17 19:19:45,318 INFO solr.SolrMappingReader - source: content dest: content
2016-01-17 19:19:45,318 INFO solr.SolrMappingReader - source: title dest: title
2016-01-17 19:19:45,318 INFO solr.SolrMappingReader - source: host dest: host
2016-01-17 19:19:45,319 INFO solr.SolrMappingReader - source: segment dest: segment
2016-01-17 19:19:45,319 INFO solr.SolrMappingReader - source: boost dest: boost
2016-01-17 19:19:45,319 INFO solr.SolrMappingReader - source: digest dest: digest
2016-01-17 19:19:45,319 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2016-01-17 19:19:45,360 INFO solr.SolrIndexWriter - Indexing 2 documents
2016-01-17 19:19:45,507 INFO solr.SolrIndexWriter - Indexing 2 documents
2016-01-17 19:19:45,526 WARN mapred.LocalJobRunner - job_local2114349538_0001
java.lang.Exception: java.io.IOException
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.io.IOException
	at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:171)
	at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:157)
	at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
	at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
	at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:502)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:456)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://127.0.0.1:8983/solr/yah
	at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
	at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:153)
	... 11 more
Caused by: org.apache.http.client.ClientProtocolException
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
	... 15 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:208)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	... 19 more
2016-01-17 19:19:46,055 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
	at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
	at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)

On Mon, Jan 18, 2016 at 4:15 PM, Markus Jelsma <[email protected]> wrote:

Hi - can you post the log output?
Markus

-----Original message-----
From: Zara Parst <[email protected]>
Sent: Monday 18th January 2016 2:06
To: [email protected]
Subject: Nutch/Solr communication problem

Hi everyone,

I have a situation here. I am using Nutch 1.11 and Solr 5.4. Solr is protected by a user name and password, and I am passing the credentials to Solr using the following command:

bin/crawl -i -Dsolr.server.url=http://localhost:8983/solr/abc -D solr.auth=true -Dsolr.auth.username=xxxx -Dsolr.auth.password=xxx url crawlDbyah 1

I always get the same problem. Please help me figure out how to feed data to a protected Solr. Below is the error message:

Indexer: starting at 2016-01-17 19:01:12
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SolrIndexWriter
	solr.server.type : Type of SolrServer to communicate with (default http however options include cloud, lb and concurrent)
	solr.server.url : URL of the Solr instance (mandatory)
	solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value for solr.server.type)
	solr.loadbalance.urls : Comma-separated string of Solr server strings to be used (madatory if lb value for solr.server.type)
	solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
	solr.commit.size : buffer size when sending to Solr (default 1000)
	solr.auth : use authentication (default false)
	solr.auth.username : username for authentication
	solr.auth.password : password for authentication

Indexing 2 documents
Indexing 2 documents
Indexer: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
	at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
	at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)

I also tried putting the username and password in nutch-default.xml, but got the same error again. Please help me out.
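To check whether the protected core accepts updates with these credentials at all, independent of bin/crawl, one option is to post a single document straight to the core's update handler with the credentials sent up front. This is a minimal sketch, assuming Java 11+ (java.net.http), JSON updates on the core's /update handler, and that id is the core's uniqueKey field; SolrUpdateProbe and the document id nutch-auth-test are hypothetical, and running it writes one test document into the core.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: posts one test document to a password-protected Solr core. */
public class SolrUpdateProbe {

    /** Update URL for a core, committing immediately so the document becomes visible. */
    public static String updateUrl(String coreUrl) {
        String base = coreUrl.endsWith("/")
                ? coreUrl.substring(0, coreUrl.length() - 1)
                : coreUrl;
        return base + "/update?commit=true";
    }

    /** JSON array with one document; assumes "id" is the uniqueKey and the id needs no escaping. */
    public static String testDocument(String id) {
        return "[{\"id\":\"" + id + "\"}]";
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: SolrUpdateProbe <coreUrl> <user> <password>");
            System.exit(2);
        }
        String pair = args[1] + ":" + args[2];
        String auth = "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest.newBuilder(URI.create(updateUrl(args[0])))
                // Credentials are attached to the first request, not after a 401 challenge.
                .header("Authorization", auth)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(testDocument("nutch-auth-test")))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```

For example: java SolrUpdateProbe.java http://localhost:8983/solr/abc xxxx xxx. If this succeeds while the Nutch indexing job still fails, the credentials are probably fine and the problem more likely lies in the Nutch/SolrJ side, as discussed above. Remove the test document afterwards, for instance with a delete-by-id update.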

