[
https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496621#comment-14496621
]
ASF GitHub Bot commented on NUTCH-1987:
---------------------------------------
GitHub user MJJoyce opened a pull request:
https://github.com/apache/nutch/pull/18
NUTCH-1987 - Make bin/crawl indexer agnostic
- Add solr.server.url property to nutch-default and set to value
consistent with URL used in the Nutch Tutorial.
- Change SOLRURL references to INDEXFLAG for consistency.
- Update all occurrences of crawl "usage" strings to no longer reference
solrURL and instead mention an optional string "run_indexer".
- Update indexer section to no longer set Solr URL property and remove
Solr references from prints.
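The toggle described in the list above can be sketched roughly as follows. This is a minimal illustration, not the patched bin/crawl itself: the function name index_step and the printed messages are hypothetical, and only the INDEXFLAG variable and the "run_indexer" string come from the description above. The sketch assumes any non-empty value enables indexing.

```shell
# Hypothetical sketch of the indexing toggle, not the actual bin/crawl code.
# index_step and its messages are illustrative names; INDEXFLAG is the
# variable named in the pull request description.
index_step() {
  # Assumption: any non-empty string (e.g. "run_indexer") turns indexing on.
  INDEXFLAG="${1:-}"
  if [ -n "$INDEXFLAG" ]; then
    # The real script would start the Nutch index job here. Indexer settings
    # such as solr.server.url are read from the conf files, not from this
    # argument.
    echo "Indexing crawl data"
  else
    echo "Skipping indexing"
  fi
}
```

Calling index_step "run_indexer" takes the indexing branch, while calling it with no argument skips indexing. Because the flag no longer carries a URL, the same call works regardless of which indexer plugin is configured.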
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MJJoyce/nutch NUTCH-1987
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nutch/pull/18.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18
----
commit a39de23453a6f8ea2a9ab2a94872af3305f16021
Author: Michael Joyce
Date: 2015-04-15T17:41:36Z
NUTCH-1987 - Make bin/crawl indexer agnostic
----
> Make bin/crawl indexer agnostic
> -------------------------------
>
> Key: NUTCH-1987
> URL: https://issues.apache.org/jira/browse/NUTCH-1987
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.9
> Reporter: Michael Joyce
> Fix For: 1.10
>
>
> The crawl script makes it a bit challenging to use an indexer that isn't
> Solr. For instance, when I want to use the indexer-elastic plugin I still
> need to call the crawl script with a fake Solr URL; otherwise it skips
> the indexing step altogether.
> {code}
> bin/crawl urls/ crawl/ "http://fakeurl.com:9200" 1
> {code}
> It would be nice to keep the Solr indexer configuration in the conf files
> (mirroring the Elasticsearch indexer conf and others) and to make the
> indexing parameter simply toggle whether indexing occurs,
> instead of also trying to configure the indexer at the same time.