[
https://issues.apache.org/jira/browse/NUTCH-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382440#comment-15382440
]
Steven W commented on NUTCH-2267:
---------------------------------
There's a mismatch between the HttpClient versions used by Hadoop and SolrJ:
Hadoop uses HttpClient 4.2.5, while SolrJ uses 4.3+. The easiest fix I found is
to switch the Solr indexer in Nutch to SystemDefaultHttpClient, which is
available in both HttpClient versions. However, Lewis noticed that this class
has been deprecated.
You can see my changes to the Nutch indexer here:
https://github.com/apache/nutch/pull/129/files, which should work with your
setup.
Also, the notes here might be useful: https://github.com/apache/nutch/pull/129
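To confirm the HttpClient version mismatch on a given cluster, you can check which jar each HttpClient class is actually loaded from. Below is a minimal, hypothetical diagnostic that uses only the JDK reflection API. The class name `HttpClientClasspathCheck` and its helpers are made up for illustration and are not part of Nutch, SolrJ or HttpClient.

The check rests on one assumption: `CloseableHttpClient` and `HttpClientBuilder` exist only in HttpClient 4.3+. Suppose `DefaultHttpClient` or `SystemDefaultHttpClient` is loaded from a 4.2.5 jar while `CloseableHttpClient` is found in a newer jar. That split would be consistent with the `VerifyError` "not assignable to CloseableHttpClient" messages in the logs of this issue.

```java
import java.security.CodeSource;

// Hypothetical diagnostic, not part of Nutch or SolrJ. Run it with the same
// classpath as the failing job to see where each HttpClient class comes from.
final class HttpClientClasspathCheck {

    // DefaultHttpClient and SystemDefaultHttpClient exist in 4.2.x and 4.3+;
    // CloseableHttpClient and HttpClientBuilder were introduced in 4.3.
    static final String[] CLASSES = {
        "org.apache.http.impl.client.DefaultHttpClient",
        "org.apache.http.impl.client.SystemDefaultHttpClient",
        "org.apache.http.impl.client.CloseableHttpClient",
        "org.apache.http.impl.client.HttpClientBuilder",
    };

    /** Loads a class without initializing it; returns null if it cannot be loaded. */
    static Class<?> load(String name) {
        try {
            return Class.forName(name, false, HttpClientClasspathCheck.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    /** Returns the jar/directory URL a class was loaded from, or a placeholder. */
    static String locate(String name) {
        Class<?> c = load(name);
        if (c == null) {
            return "not found";
        }
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // JDK core classes come from the bootstrap loader and have no CodeSource.
        return src == null ? "bootstrap/unknown" : String.valueOf(src.getLocation());
    }

    /** True if both classes load and `sub` can be used where `sup` is expected. */
    static boolean assignable(String sub, String sup) {
        Class<?> a = load(sub);
        Class<?> b = load(sup);
        return a != null && b != null && b.isAssignableFrom(a);
    }

    public static void main(String[] args) {
        for (String name : CLASSES) {
            System.out.println(name + " -> " + locate(name));
        }
        String closeable = "org.apache.http.impl.client.CloseableHttpClient";
        for (String impl : new String[] {CLASSES[0], CLASSES[1]}) {
            System.out.println(impl + " assignable to CloseableHttpClient: "
                + assignable(impl, closeable));
        }
    }
}
```

As an assumption about how to use it, run it against roughly the classpath the `IndexingJob` driver sees (the output of `hadoop classpath` is a reasonable starting point). In that setup, a `false` on the assignability lines while `CloseableHttpClient` is found would suggest that older HttpClient classes are shadowing the newer ones.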
> Solr indexer fails at the end of the job with a java error message
> ------------------------------------------------------------------
>
> Key: NUTCH-2267
> URL: https://issues.apache.org/jira/browse/NUTCH-2267
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 1.12
> Environment: Hadoop 2.7.2, Solr 6 in cloud configuration with
> ZooKeeper 3.4.6. I use the master branch from GitHub, currently at commit
> da252eb7b3d2d7b70 (NUTCH-2263: mingram and maxgram support for Unigram
> Cosine Similarity Model is provided.)
> Reporter: kaveh minooie
> Assignee: Lewis John McGibbney
> Fix For: 1.13
>
>
> This is what I was getting at first:
> 16/05/23 13:52:27 INFO mapreduce.Job: map 100% reduce 100%
> 16/05/23 13:52:27 INFO mapreduce.Job: Task Id :
> attempt_1462499602101_0119_r_000000_0, Status : FAILED
> Error: Bad return type
> Exception Details:
> Location:
> org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient;
> @58: areturn
> Reason:
> Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame,
> stack[0]) is not assignable to
> 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
> Current Frame:
> bci: @58
> flags: { }
> locals: { 'org/apache/solr/common/params/SolrParams',
> 'org/apache/http/conn/ClientConnectionManager',
> 'org/apache/solr/common/params/ModifiableSolrParams',
> 'org/apache/http/impl/client/DefaultHttpClient' }
> stack: { 'org/apache/http/impl/client/DefaultHttpClient' }
> Bytecode:
> 0x0000000: bb00 0359 2ab7 0004 4db2 0005 b900 0601
> 0x0000010: 0099 001e b200 05bb 0007 59b7 0008 1209
> 0x0000020: b600 0a2c b600 0bb6 000c b900 0d02 002b
> 0x0000030: b800 104e 2d2c b800 0f2d b0
> Stackmap Table:
> append_frame(@47,Object[#143])
> 16/05/23 13:52:28 INFO mapreduce.Job: map 100% reduce 0%
> As you can see, the failed reducer gets re-spawned. Then I found this issue:
> https://issues.apache.org/jira/browse/SOLR-7657 and updated my Hadoop
> config file. After that, the indexer seems to be able to finish (the
> documents do appear to be in Solr), but I still get the error message at
> the end of the job:
> 16/05/23 16:39:26 INFO mapreduce.Job: map 100% reduce 99%
> 16/05/23 16:39:44 INFO mapreduce.Job: map 100% reduce 100%
> 16/05/23 16:39:57 INFO mapreduce.Job: Job job_1464045047943_0001 completed
> successfully
> 16/05/23 16:39:58 INFO mapreduce.Job: Counters: 53
>   File System Counters
>     FILE: Number of bytes read=42700154855
>     FILE: Number of bytes written=70210771807
>     FILE: Number of read operations=0
>     FILE: Number of large read operations=0
>     FILE: Number of write operations=0
>     HDFS: Number of bytes read=8699202825
>     HDFS: Number of bytes written=0
>     HDFS: Number of read operations=537
>     HDFS: Number of large read operations=0
>     HDFS: Number of write operations=0
>   Job Counters
>     Launched map tasks=134
>     Launched reduce tasks=1
>     Data-local map tasks=107
>     Rack-local map tasks=27
>     Total time spent by all maps in occupied slots (ms)=49377664
>     Total time spent by all reduces in occupied slots (ms)=32765064
>     Total time spent by all map tasks (ms)=3086104
>     Total time spent by all reduce tasks (ms)=1365211
>     Total vcore-milliseconds taken by all map tasks=3086104
>     Total vcore-milliseconds taken by all reduce tasks=1365211
>     Total megabyte-milliseconds taken by all map tasks=12640681984
>     Total megabyte-milliseconds taken by all reduce tasks=8387856384
>   Map-Reduce Framework
>     Map input records=25305474
>     Map output records=25305474
>     Map output bytes=27422869763
>     Map output materialized bytes=27489888004
>     Input split bytes=15225
>     Combine input records=0
>     Combine output records=0
>     Reduce input groups=16061459
>     Reduce shuffle bytes=27489888004
>     Reduce input records=25305474
>     Reduce output records=230
>     Spilled Records=54688613
>     Shuffled Maps =134
>     Failed Shuffles=0
>     Merged Map outputs=134
>     GC time elapsed (ms)=88103
>     CPU time spent (ms)=3361270
>     Physical memory (bytes) snapshot=144395186176
>     Virtual memory (bytes) snapshot=751590166528
>     Total committed heap usage (bytes)=156232056832
>   IndexerStatus
>     indexed (add/update)=230
>   Shuffle Errors
>     BAD_ID=0
>     CONNECTION=0
>     IO_ERROR=0
>     WRONG_LENGTH=0
>     WRONG_MAP=0
>     WRONG_REDUCE=0
>   SkippingTaskCounters
>     MapProcessedRecords=25305474
>     ReduceProcessedGroups=16061459
>   File Input Format Counters
>     Bytes Read=8699187600
>   File Output Format Counters
>     Bytes Written=0
> Exception in thread "main" java.lang.VerifyError: Bad return type
> Exception Details:
> Location:
> org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;)Lorg/apache/http/impl/client/CloseableHttpClient;
> @57: areturn
> Reason:
> Type 'org/apache/http/impl/client/SystemDefaultHttpClient' (current
> frame, stack[0]) is not assignable to
> 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
> Current Frame:
> bci: @57
> flags: { }
> locals: { 'org/apache/solr/common/params/SolrParams',
> 'org/apache/solr/common/params/ModifiableSolrParams',
> 'org/apache/http/impl/client/SystemDefaultHttpClient' }
> stack: { 'org/apache/http/impl/client/SystemDefaultHttpClient' }
> Bytecode:
> 0x0000000: bb00 0359 2ab7 0004 4cb2 0005 b900 0601
> 0x0000010: 0099 001e b200 05bb 0007 59b7 0008 1209
> 0x0000020: b600 0a2b b600 0bb6 000c b900 0d02 00b8
> 0x0000030: 000e 4d2c 2bb8 000f 2cb0
> Stackmap Table:
> append_frame(@47,Object[#143])
> at org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:189)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:162)
> at org.apache.nutch.indexwriter.solr.SolrUtils.getSolrClients(SolrUtils.java:54)
> at org.apache.nutch.indexwriter.solr.SolrIndexWriter.open(SolrIndexWriter.java:78)
> at org.apache.nutch.indexer.IndexWriters.open(IndexWriters.java:75)
> at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:148)
> at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)