Ralf created NUTCH-1773:
---------------------------

             Summary: Solr Indexer fails
                 Key: NUTCH-1773
                 URL: https://issues.apache.org/jira/browse/NUTCH-1773
             Project: Nutch
          Issue Type: Bug
          Components: indexer
    Affects Versions: 2.3
         Environment: Ubuntu 12.04 LTS, Java 1.7.0_55, HBase 0.90.6 (pseudo-distributed), Hadoop 1.2.1, Solr 4.6
            Reporter: Ralf
            Priority: Critical
             Fix For: 2.3


When using the crawl script, or the Solr indexer by itself (bin/nutch solrindex), in local mode, it fails with:

hduser@bl4ck1c3:~/nutch-2.3/runtime/local$ bin/nutch solrindex TestCrawl18 -reindex
IndexingJob: starting
Active IndexWriters :
SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : use authentication (default false)
        solr.auth : username for authentication
        solr.auth.password : password for authentication

SolrIndexerJob: java.lang.IllegalStateException: Target host must not be null, or set in parameters.
        at org.apache.http.impl.client.DefaultRequestDirector.determineRoute(DefaultRequestDirector.java:787)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:414)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:393)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:146)
        at org.apache.nutch.indexer.IndexWriters.commit(IndexWriters.java:127)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:171)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:187)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:196)
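One possible cause worth ruling out (an assumption; this report does not establish it): Apache HttpClient raises "Target host must not be null, or set in parameters." when the request URL it receives carries no host, which can happen if solr.server.url is empty or holds something other than an absolute URL, for example a crawl ID such as TestCrawl18. The minimal sketch below checks a candidate value using only java.net.URI. The class SolrUrlCheck and the method hasTargetHost are hypothetical and are not part of Nutch or SolrJ; http://localhost:8983/solr is a placeholder address, not taken from this report.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical pre-flight check (not part of Nutch or SolrJ): tests whether a
 * value intended for solr.server.url is an absolute http(s) URL with a host.
 * A value without a host (a bare word, or a URL missing its scheme) would
 * leave HttpClient with no target host to route the request to.
 */
public class SolrUrlCheck {

    /** Returns true if the value parses as an absolute http/https URI with a host. */
    public static boolean hasTargetHost(String value) {
        if (value == null || value.isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(value);
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            // getHost() is null for relative URIs and for opaque URIs such as
            // "localhost:8983/solr", where "localhost" is parsed as the scheme.
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "TestCrawl18",                // a crawl ID, not a URL
            "localhost:8983/solr",        // placeholder URL missing its scheme
            "http://localhost:8983/solr"  // placeholder absolute URL with a host
        };
        for (String c : candidates) {
            System.out.println(c + " -> " + (hasTargetHost(c) ? "ok" : "no target host"));
        }
    }
}
```

Running main prints one line per candidate; only the absolute http://localhost:8983/solr value is reported as "ok".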

When using the new index command (bin/nutch index), it finishes, but nothing is added to Solr:

hduser@bl4ck1c3:~/nutch-2.3/runtime/local$ bin/nutch index TestCrawl18 -reindex
IndexingJob: starting
Active IndexWriters :
SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : use authentication (default false)
        solr.auth : username for authentication
        solr.auth.password : password for authentication
 
The log shows:

2014-05-13 03:01:13,781 INFO  indexer.IndexingJob - IndexingJob: starting
2014-05-13 03:01:14,108 INFO  indexer.IndexingFilters - Adding org.apache.nutch.analysis.lang.LanguageIndexingFilter
2014-05-13 03:01:14,109 INFO  basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
2014-05-13 03:01:14,109 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2014-05-13 03:01:14,335 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
2014-05-13 03:01:14,336 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-05-13 03:01:14,336 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2014-05-13 03:01:14,620 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
2014-05-13 03:01:14,768 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
2014-05-13 03:01:14,968 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
2014-05-13 03:01:15,243 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
2014-05-13 03:01:15,276 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
2014-05-13 03:01:15,326 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
2014-05-13 03:01:15,386 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-05-13 03:01:15,403 INFO  solr.SolrMappingReader - source: content dest: content
2014-05-13 03:01:15,403 INFO  solr.SolrMappingReader - source: title dest: title
2014-05-13 03:01:15,403 INFO  solr.SolrMappingReader - source: host dest: host
2014-05-13 03:01:15,404 INFO  solr.SolrMappingReader - source: batchId dest: batchId
2014-05-13 03:01:15,404 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-05-13 03:01:15,404 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-05-13 03:01:15,404 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-05-13 03:01:15,405 INFO  indexer.IndexingFilters - Adding org.apache.nutch.analysis.lang.LanguageIndexingFilter
2014-05-13 03:01:15,405 INFO  basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
2014-05-13 03:01:15,405 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2014-05-13 03:01:15,405 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.more.MoreIndexingFilter
2014-05-13 03:01:15,405 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-05-13 03:01:15,405 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2014-05-13 03:01:15,426 WARN  zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
2014-05-13 03:01:15,442 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2014-05-13 03:01:16,144 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-05-13 03:01:16,144 INFO  indexer.IndexingJob - Active IndexWriters :
SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : use authentication (default false)
        solr.auth : username for authentication
        solr.auth.password : password for authentication


2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: content dest: content
2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: title dest: title
2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: host dest: host
2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: batchId dest: batchId
2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-05-13 03:01:16,145 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-05-13 03:01:16,338 INFO  solr.SolrIndexWriter - Total 0 document is added.
2014-05-13 03:01:16,338 INFO  indexer.IndexingJob - IndexingJob: done.





--
This message was sent by Atlassian JIRA
(v6.2#6252)
