[jira] [Commented] (NUTCH-1987) Make bin/crawl indexer agnostic

ASF GitHub Bot (JIRA) Wed, 15 Apr 2015 11:15:21 -0700

    [ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496621#comment-14496621 ]


ASF GitHub Bot commented on NUTCH-1987:
---------------------------------------

GitHub user MJJoyce opened a pull request:

    https://github.com/apache/nutch/pull/18

    NUTCH-1987 - Make bin/crawl indexer agnostic

    - Add solr.server.url property to nutch-default and set to value
      consistent with URL used in the Nutch Tutorial.
    - Change SOLRURL references to INDEXFLAG for consistency.
    - Update all occurrences of crawl "usage" strings to no longer reference
      solrURL and instead mention an optional string "run_indexer".
    - Update indexer section to no longer set Solr URL property and remove
      Solr references from prints.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MJJoyce/nutch NUTCH-1987

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/18.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18
    
----
commit a39de23453a6f8ea2a9ab2a94872af3305f16021
Author: Michael Joyce <[email protected]>
Date:   2015-04-15T17:41:36Z

    NUTCH-1987 - Make bin/crawl indexer agnostic
    
    - Add solr.server.url property to nutch-default and set to value
      consistent with URL used in the Nutch Tutorial.
    - Change SOLRURL references to INDEXFLAG for consistency.
    - Update all occurrences of crawl "usage" strings to no longer reference
      solrURL and instead mention an optional string "run_indexer".
    - Update indexer section to no longer set Solr URL property and remove
      Solr references from prints.

----


> Make bin/crawl indexer agnostic
> -------------------------------
>
>                 Key: NUTCH-1987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1987
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.9
>            Reporter: Michael Joyce
>             Fix For: 1.10
>
>
> The crawl script makes it awkward to use an indexer other than Solr. For 
> instance, to use the indexer-elastic plugin I still need to call the crawl 
> script with a fake Solr URL; otherwise it skips the indexing step 
> altogether.
> {code}
> bin/crawl urls/ crawl/ "http://fakeurl.com:9200" 1
> {code}
> It would be nice to keep the Solr indexer's configuration in the conf files 
> (mirroring the Elasticsearch indexer conf and others) and to make the 
> indexing parameter simply toggle whether indexing occurs, rather than 
> also configuring the indexer at the same time.
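
The toggle behaviour requested above can be sketched roughly as follows. This
is an illustrative sketch only, not the code from the pull request: the
function names should_index and describe_run are hypothetical, and the
argument order (seed dir, crawl dir, optional indexing string, number of
rounds) is assumed from the example invocation in the description. The only
point it makes is that the optional argument decides whether indexing runs,
while the choice and configuration of the indexer stay in the conf files.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the actual bin/crawl implementation.

# should_index succeeds when the optional indexing argument is non-empty.
# Its value is never interpreted as a URL; it is only a toggle
# (standing in for INDEXFLAG from the pull request description).
should_index() {
  local index_flag="$1"
  [ -n "$index_flag" ]
}

# describe_run reports whether a crawl invoked with these arguments would
# run the indexing step. Assumed argument layout:
#   <seedDir> <crawlDir> [run_indexer] <numberOfRounds>
describe_run() {
  local index_flag=""
  if [ "$#" -eq 4 ]; then
    # Four arguments: the third one is the optional indexing toggle.
    index_flag="$3"
  fi
  if should_index "$index_flag"; then
    echo "index"
  else
    echo "skip"
  fi
}
```

With this sketch, `describe_run urls/ crawl/ run_indexer 1` prints `index`
and `describe_run urls/ crawl/ 1` prints `skip`; no fake Solr URL is needed
to trigger indexing with a non-Solr indexer.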



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
