[ https://issues.apache.org/jira/browse/NUTCH-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121441#comment-14121441 ]
Chris A. Mattmann edited comment on NUTCH-1832 at 9/4/14 3:05 PM:
------------------------------------------------------------------
bq. You haven't been left out on any discussion since there has been no change
in behavior: indexing (be it with the old-indexing mechanism or delegating to
SOLR) has always been the default behaviour when using the all-in-one crawl
command - which the crawl script replaces. See
https://github.com/apache/nutch/blob/5943b9f1d6f17c0c95ca169ac67b7da379d2bef8/src/java/org/apache/nutch/crawl/Crawl.java,
that's Doug's code from 2005. The logic hasn't changed since and the crawl
script just replaces the crawl class but behaves in the same way. (This does
not mean that indexing is strictly necessary as nothing prevents users from
using the nutch commands directly in any way they wanted).
Pointing at Doug's Crawl.java class isn't what I was stating. Here is what I
was stating.
Old use case:
1. Download Nutch out of the Box. Don't need Solr. I can simply "start Nutch"
and it's fetching, and if there was no Solr URL specified, it would go through a
full crawl (this was when ./bin/nutch crawl did something - which it doesn't
anymore - it tells you to use the crawl script). That *is* a change.
New use case:
1. Download Nutch out of the Box. Can't run a Nutch crawl without Solr (with the
./bin/crawl command, the only option since ./bin/nutch crawl doesn't exist
anymore). *That is a change.*
bq. Please ask them to reply to the survey then, it will certainly make it more
representative. Or provide your own statistics and tell us what the
majority of Nutch users do.
You stated that the majority of Nutch users crawl *and* index. I simply stated
that I don't think Nutch only has 40 users. In fact, like I said, I know it
doesn't :) I started by saying I applaud you and the work you did on it.
bq. Now the only change to the existing behaviour is the one you introduced
with this commit by removing 'indexer-solr' from 'plugin.includes'. Can you
please fix this? Thanks
No, not until someone addresses my concern (which you haven't) about the change
in behavior. Thanks.
bq. PS: I find your tone has been quite aggressive recently (e.g. discussion
on versioning). Any particular reason?
Not really - in particular you seem to be debating everything I suggest. So,
please continue to do so. I'm happy to debate back.
> Make Nutch work without an indexer
> ----------------------------------
>
> Key: NUTCH-1832
> URL: https://issues.apache.org/jira/browse/NUTCH-1832
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.9
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 1.10
>
> Attachments: NUTCH-1832.Mattmann.090314.patch.2.txt,
> NUTCH-1832.Mattmann.090314.patch.txt
>
>
> Nutch used to work out of the box, without requiring an indexing backend. As
> of 1.9, that's not the case anymore (possibly even earlier). Thanks
> to [~markus17] for pointing out that this is due to the indexer-solr plugin
> being enabled by default. We should disable it by default, so that the
> regression is removed.