[ https://issues.apache.org/jira/browse/NUTCH-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1076:
----------------------------------------

    Fix Version/s: 1.7
    
> Solrindex has no documents following bin/nutch solrindex when using protocol-file
> ---------------------------------------------------------------------------------
>
>                 Key: NUTCH-1076
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1076
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.3
>         Environment: Ubuntu Linux 10.04 server
> JDK 1.6
> Nutch 1.3
> Solr 3.1.0
>            Reporter: Seth Griffin
>            Assignee: Markus Jelsma
>              Labels: nutch, protocol-file, solrindex
>             Fix For: 1.7
>
>
> Note: When using protocol-http, I am able to update Solr effortlessly.
> To test this, I have a single PDF file that I am trying to index, listed in my urls directory.
> I execute:
> bin/nutch crawl urls
> Output:
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl-20110805151045
> rootUrlDir = urls
> threads = 10
> depth = 5
> solrUrl=null
> Injector: starting at 2011-08-05 15:10:45
> Injector: crawlDb: crawl-20110805151045/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-08-05 15:10:48, elapsed: 00:00:02
> Generator: starting at 2011-08-05 15:10:48
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: Partitioning selected urls for politeness.
> Generator: segment: crawl-20110805151045/segments/20110805151050
> Generator: finished at 2011-08-05 15:10:51, elapsed: 00:00:03
> Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
> Fetcher: starting at 2011-08-05 15:10:51
> Fetcher: segment: crawl-20110805151045/segments/20110805151050
> Fetcher: threads: 10
> QueueFeeder finished: total 1 records + hit by time limit :0
> fetching file:///home/nutch/nutch-1.3/runtime/local/indexdir/Altec.pdf
> -finishing thread FetcherThread, activeThreads=9
> -finishing thread FetcherThread, activeThreads=8
> -finishing thread FetcherThread, activeThreads=7
> -finishing thread FetcherThread, activeThreads=6
> -finishing thread FetcherThread, activeThreads=5
> -finishing thread FetcherThread, activeThreads=4
> -finishing thread FetcherThread, activeThreads=3
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=0
> -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> -activeThreads=0
> Fetcher: finished at 2011-08-05 15:10:53, elapsed: 00:00:02
> ParseSegment: starting at 2011-08-05 15:10:53
> ParseSegment: segment: crawl-20110805151045/segments/20110805151050
> ParseSegment: finished at 2011-08-05 15:10:56, elapsed: 00:00:03
> CrawlDb update: starting at 2011-08-05 15:10:56
> CrawlDb update: db: crawl-20110805151045/crawldb
> CrawlDb update: segments: [crawl-20110805151045/segments/20110805151050]
> CrawlDb update: additions allowed: true
> CrawlDb update: URL normalizing: true
> CrawlDb update: URL filtering: true
> CrawlDb update: Merging segment data into db.
> CrawlDb update: finished at 2011-08-05 15:10:57, elapsed: 00:00:01
> Generator: starting at 2011-08-05 15:10:57
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=1 - no more URLs to fetch.
> LinkDb: starting at 2011-08-05 15:10:58
> LinkDb: linkdb: crawl-20110805151045/linkdb
> LinkDb: URL normalize: true
> LinkDb: URL filter: true
> LinkDb: adding segment: file:/home/nutch/nutch-1.3/runtime/local/crawl-20110805151045/segments/20110805151050
> LinkDb: finished at 2011-08-05 15:10:59, elapsed: 00:00:01
> crawl finished: crawl-20110805151045
> Then, with a clean Solr index (output from stats.jsp below):
> searcherName : Searcher@14dd758 main
> caching : true
> numDocs : 0
> maxDoc : 0
> reader : SolrIndexReader{this=1ee148b,r=ReadOnlyDirectoryReader@1ee148b,refCnt=1,segments=0}
> readerDir : org.apache.lucene.store.NIOFSDirectory@/home/solr/apache-solr-3.1.0/example/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@987197
> indexVersion : 1312575204101
> openedAt : Fri Aug 05 15:13:24 CDT 2011
> registeredAt : Fri Aug 05 15:13:24 CDT 2011
> warmupTime : 0 
> I then execute:
> bin/nutch solrindex http://localhost:8983/solr/ crawl-20110805151045/crawldb/ crawl-20110805151045/linkdb/ crawl-20110805151045/segments/*
> bin/nutch output:
> SolrIndexer: starting at 2011-08-05 15:15:48
> SolrIndexer: finished at 2011-08-05 15:15:50, elapsed: 00:00:01
> solr output:
> Aug 5, 2011 3:15:50 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher <init>
> INFO: Opening Searcher@15f1f9c main
> Aug 5, 2011 3:15:50 PM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@15f1f9c main from Searcher@14dd758 main
>       
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@15f1f9c main
>       
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@15f1f9c main from Searcher@14dd758 main
>       
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@15f1f9c main
>       
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@15f1f9c main from Searcher@14dd758 main
>       
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=1,evictions=0,size=1,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@15f1f9c main
>       
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming Searcher@15f1f9c main from Searcher@14dd758 main
>       
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for Searcher@15f1f9c main
>       
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.core.QuerySenderListener newSearcher
> INFO: QuerySenderListener sending requests to Searcher@15f1f9c main
> Aug 5, 2011 3:15:50 PM org.apache.solr.core.QuerySenderListener newSearcher
> INFO: QuerySenderListener done.
> Aug 5, 2011 3:15:50 PM org.apache.solr.core.SolrCore registerSearcher
> INFO: [] Registered new searcher Searcher@15f1f9c main
> Aug 5, 2011 3:15:50 PM org.apache.solr.search.SolrIndexSearcher close
> INFO: Closing Searcher@14dd758 main
>       
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>       
> filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>       
> queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=1,evictions=0,size=1,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>       
> documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Aug 5, 2011 3:15:50 PM org.apache.solr.update.processor.LogUpdateProcessor finish
> INFO: {commit=} 0 8
> Aug 5, 2011 3:15:50 PM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={waitSearcher=true&waitFlush=true&wt=javabin&commit=true&version=2} status=0 QTime=8
> output from stats.jsp:
> stats:        
> searcherName : Searcher@15f1f9c main
> caching : true
> numDocs : 0
> maxDoc : 0
> reader : SolrIndexReader{this=1ee148b,r=ReadOnlyDirectoryReader@1ee148b,refCnt=1,segments=0}
> readerDir : org.apache.lucene.store.NIOFSDirectory@/home/solr/apache-solr-3.1.0/example/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@987197
> indexVersion : 1312575204101
> openedAt : Fri Aug 05 15:15:50 CDT 2011
> registeredAt : Fri Aug 05 15:15:50 CDT 2011
> warmupTime : 2
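
The report compares two stats.jsp dumps by eye (numDocs is 0 both before and after solrindex). That check can be scripted; the following is only a minimal Python sketch, assuming the stats text has been copied as plain "key : value" lines. The helper names parse_stats and docs_added are hypothetical and are not part of Nutch or Solr; the sample values are taken from the two dumps quoted in this report.

```python
import re


def parse_stats(text):
    """Turn 'key : value' lines, as pasted from stats.jsp, into a dict."""
    stats = {}
    for line in text.splitlines():
        # Strip an email quote prefix ("> ") if the text was copied from a mail.
        line = re.sub(r"^>\s?", "", line).strip()
        key, sep, value = line.partition(" : ")
        if sep:  # lines without " : " (wrapped values, headings) are skipped
            stats[key.strip()] = value.strip()
    return stats


def docs_added(before_text, after_text):
    """Return how many documents the index gained between two snapshots."""
    before = parse_stats(before_text)
    after = parse_stats(after_text)
    return int(after["numDocs"]) - int(before["numDocs"])


# Values copied from the two stats.jsp dumps in this report.
BEFORE = """> searcherName : Searcher@14dd758 main
> numDocs : 0
> maxDoc : 0
> warmupTime : 0 """
AFTER = """> searcherName : Searcher@15f1f9c main
> numDocs : 0
> maxDoc : 0
> warmupTime : 2"""

if __name__ == "__main__":
    print("documents added by solrindex:", docs_added(BEFORE, AFTER))
```

A result of 0 together with a changed searcherName matches what the report describes: Solr received the commit and opened a new searcher, but no documents were added.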

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
