[jira] [Comment Edited] (NUTCH-1832) Make Nutch work without an indexer

Chris A. Mattmann (JIRA) Thu, 04 Sep 2014 08:07:27 -0700

    [ 
https://issues.apache.org/jira/browse/NUTCH-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121441#comment-14121441
 ]


Chris A. Mattmann edited comment on NUTCH-1832 at 9/4/14 3:05 PM:
------------------------------------------------------------------

bq. You haven't been left out of any discussion since there has been no change 
in behavior: indexing (be it with the old-indexing mechanism or delegating to 
SOLR) has always been the default behaviour when using the all-in-one crawl 
command - which the crawl script replaces. See 
https://github.com/apache/nutch/blob/5943b9f1d6f17c0c95ca169ac67b7da379d2bef8/src/java/org/apache/nutch/crawl/Crawl.java,
 that's Doug's code from 2005. The logic hasn't changed since and the crawl 
script just replaces the crawl class but behaves in the same way. (This does 
not mean that indexing is strictly necessary as nothing prevents users from 
using the nutch commands directly in any way they wanted).

Pointing at Doug's Crawl.java class isn't what I was stating. Here is what I 
was stating.

Old use case:

1. Download Nutch out of the box. Don't need Solr. I can simply "start Nutch" 
and it's fetching, and if no Solr URL was specified, it would go through a 
full crawl (this was when ./bin/nutch crawl did something - which it doesn't 
anymore - it tells you to use the crawl script). That *is* a change.

New use case:

1. Download Nutch out of the box. Can't run Nutch crawling without Solr (with 
the ./bin/crawl command, the only option since ./bin/nutch crawl doesn't exist 
anymore). *That is a change.*

bq. Please ask them to reply to the survey then, it will certainly make it more 
representative. Or provide your own statistics and tell us what the 
majority of Nutch users do.

You stated that the majority of Nutch users crawl *and* index. I simply stated 
that I don't think Nutch only has 40 users. In fact, like I said, I know it 
doesn't :) I started by saying I applaud you and the work you did on it.

bq. Now the only change to the existing behaviour is the one you introduced 
with this commit by removing 'indexer-solr' from 'plugin.includes'. Can you 
please fix this? Thanks

No, not until someone addresses my concern (which you haven't) about the change 
in behavior. Thanks.

bq. PS: I find your tone has been quite aggressive recently (e.g. discussion 
on versioning). Any particular reason?

Not really - in particular you seem to be debating everything I suggest. So, 
please continue to do so. I'm happy to debate back.




> Make Nutch work without an indexer
> ----------------------------------
>
>                 Key: NUTCH-1832
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1832
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.9
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.10
>
>         Attachments: NUTCH-1832.Mattmann.090314.patch.2.txt, 
> NUTCH-1832.Mattmann.090314.patch.txt
>
>
> Nutch used to work out of the box, without requiring an indexing backend. As 
> of 1.9, that's not the case anymore (possibly even before that). Thanks 
> to [~markus17] for pointing out that this is due to the indexer-solr plugin 
> being enabled by default. We should disable it by default, so that the 
> regression is removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
