[
https://issues.apache.org/jira/browse/NUTCH-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel resolved NUTCH-2507.
------------------------------------
Assignee: Sebastian Nagel
Resolution: Fixed
Thanks, [~artodeto]! The section in
[https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial] related to
Solr indexing has been updated.
> NutchTutorial wiki page has a lot of outdated command line calls when it
> starts with the Solr interaction
> ---------------------------------------------------------------------------------------------------------
>
> Key: NUTCH-2507
> URL: https://issues.apache.org/jira/browse/NUTCH-2507
> Project: Nutch
> Issue Type: Bug
> Components: documentation
> Affects Versions: 1.14
> Reporter: artodeto
> Assignee: Sebastian Nagel
> Priority: Major
> Labels: documentation, easyfix
> Fix For: 1.17
>
>
> h2. Section "Step-by-Step: Indexing into Apache Solr"
> replace:
> {code:java}
> Example: bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb
> crawl/linkdb/ crawl/segments/20131108063838/ -filter -normalize
> -deleteGone{code}
> with:
> {code:java}
> Example: bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/nutch
> ${NUTCH_RUNTIME_HOME}/crawl/crawldb/ -linkdb ${NUTCH_RUNTIME_HOME}/crawl/linkdb/
> ${NUTCH_RUNTIME_HOME}/crawl/segments/20131108063838/ -filter -normalize
> -deleteGone{code}
>
> h2. Section "Step-by-Step: Deleting Duplicates"
> replace:
> {code:java}
> Usage: bin/nutch dedup <solr url>
> Example: /bin/nutch dedup http://localhost:8983/solr
> {code}
> with:
> {code:java}
> Usage: bin/nutch dedup <path to the crawldb> <solr url>
> Example: /bin/nutch dedup ${NUTCH_RUNTIME_HOME}/crawl/crawldb/
> http://localhost:8983/sol
> {code}
> h2. Section "Step-by-Step: Cleaning Solr"
> replace:
> {code:java}
> Usage: bin/nutch clean -Dsolr.server.url=<solr index url> <crawldb>
> Example: /bin/nutch clean
> -Dsolr.server.url=http://localhost:8983/solr/nutch crawl/crawldb/
> {code}
> with:
> {code}
> Usage: bin/nutch clean -Dsolr.server.url=<solr index url> <crawldb>
> Example: /bin/nutch clean
> -Dsolr.server.url=http://localhost:8983/solr/nutch
> ${NUTCH_RUNTIME_HOME}/crawl/crawldb/
> {code}
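
For readers who want to check the proposed index and clean command lines before running them, the following is a minimal dry-run sketch. It only assembles and prints the commands quoted in the issue and does not call Nutch. The default value for NUTCH_RUNTIME_HOME and the variable names SOLR_URL, CRAWL_DIR and SEGMENT are illustrative assumptions, not part of the tutorial; the segment name is the sample one used above.

```shell
#!/bin/sh
# Dry run: build the proposed command lines and print them for inspection.
# Assumption: this default install path is hypothetical; set your own.
NUTCH_RUNTIME_HOME="${NUTCH_RUNTIME_HOME:-/opt/nutch/runtime/local}"
# Solr core URL as given in the proposed tutorial text.
SOLR_URL="http://localhost:8983/solr/nutch"
CRAWL_DIR="${NUTCH_RUNTIME_HOME}/crawl"
# Sample segment name taken from the issue; replace with a real segment.
SEGMENT="${CRAWL_DIR}/segments/20131108063838"

# Indexing: the Solr URL is passed as a -D property, not as a positional argument.
index_cmd="bin/nutch index -Dsolr.server.url=${SOLR_URL} ${CRAWL_DIR}/crawldb/ -linkdb ${CRAWL_DIR}/linkdb/ ${SEGMENT}/ -filter -normalize -deleteGone"

# Cleaning: same -D property, followed by the crawldb path.
clean_cmd="bin/nutch clean -Dsolr.server.url=${SOLR_URL} ${CRAWL_DIR}/crawldb/"

echo "$index_cmd"
echo "$clean_cmd"
```

Once the printed lines look right, they can be pasted into a shell from the Nutch runtime directory.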
--
This message was sent by Atlassian Jira
(v8.3.4#803005)