[https://issues.apache.org/jira/browse/NUTCH-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121432#comment-14121432]
Julien Nioche commented on NUTCH-1832:
--------------------------------------
bq. If I'm being honest, I feel a bit left out on the discussion to change the
behavior of Nutch from not requiring indexing, in the first place. To me this
was a significant capability for Nutch - to simply use Nutch to pull down
content (like a smarter wget) initially as a test. It took several people who
we were working with for over 2 days multiple hours to even get Solr installed
(some on Windows laptops; some on Macs, etc.) and imagine my face when I was
telling them several times "You don't need Solr to work with Nutch" only to
find out that I was totally wrong about that since the behavior had changed.
You haven't been left out of any discussion, since there has been *no change* in
behavior: indexing (be it with the old indexing mechanism or by delegating to
Solr) has *always* been the default behavior when using the all-in-one crawl
command, which the crawl script replaces. See
[https://github.com/apache/nutch/blob/5943b9f1d6f17c0c95ca169ac67b7da379d2bef8/src/java/org/apache/nutch/crawl/Crawl.java];
that's Doug's code from 2005. The logic hasn't changed since, and the crawl
script simply replaces the Crawl class while behaving in the same way. (This does
not mean that indexing is strictly necessary, as nothing prevents users from
running the nutch commands directly in any way they want.)
bq. I appreciate the survey that you did. With ~40 respondents however, I
disagree that you've surveyed the majority of users of Nutch. I honestly had 40
people using it the past few days and I can pretty much state that none of them
replied to your original survey.
Please ask them to reply to the survey, then; it will certainly make it more
representative. Or provide your own statistics and tell us what the
majority of Nutch users do.
bq. We can go back and debate the original change that required indexing (which
wasn't required before)
That's simply incorrect.
Now the only change to the existing behaviour is the one you introduced with
this commit by removing 'indexer-solr' from 'plugin.includes'. Can you please
fix this? Thanks
PS: I find your tone has been quite aggressive recently (e.g. discussion on
versioning). Any particular reason?
> Make Nutch work without an indexer
> ----------------------------------
>
> Key: NUTCH-1832
> URL: https://issues.apache.org/jira/browse/NUTCH-1832
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.9
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 1.10
>
> Attachments: NUTCH-1832.Mattmann.090314.patch.2.txt,
> NUTCH-1832.Mattmann.090314.patch.txt
>
>
> Nutch used to work out of the box, without requiring an indexing backend. As
> of 1.9, that's not the case anymore (it's possible even before that). Thanks
> to [~markus17] for pointing out that this is due to the indexing-solr plugin
> being enabled by default. We should disable it by default, so that the
> regression is removed.