[jira] [Commented] (NUTCH-1832) Make Nutch work without an indexer

Julien Nioche (JIRA) Thu, 04 Sep 2014 07:57:05 -0700

    [ 
https://issues.apache.org/jira/browse/NUTCH-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121432#comment-14121432
 ]


Julien Nioche commented on NUTCH-1832:
--------------------------------------

bq. If I'm being honest, I feel a bit left out of the discussion about changing the 
behavior of Nutch from not requiring indexing in the first place. To me this 
was a significant capability for Nutch - to simply use Nutch to pull down 
content (like a smarter wget) initially as a test. It took several of the people 
we were working with multiple hours over 2 days just to get Solr installed 
(some on Windows laptops, some on Macs, etc.), and imagine my face when, after 
telling them several times "You don't need Solr to work with Nutch", I 
found out that I was totally wrong about that since the behavior had changed.

You haven't been left out of any discussion, since there has been *no change* in 
behavior: indexing (be it with the old indexing mechanism or by delegating to 
SOLR) has *always* been the default behaviour when using the all-in-one crawl 
command, which the crawl script replaces. See 
[https://github.com/apache/nutch/blob/5943b9f1d6f17c0c95ca169ac67b7da379d2bef8/src/java/org/apache/nutch/crawl/Crawl.java]; 
that's Doug's code from 2005. The logic hasn't changed since, and the crawl 
script simply replaces the Crawl class while behaving in the same way. (This does 
not mean that indexing is strictly necessary, as nothing prevents users from 
using the nutch commands directly in any way they want.)

bq. I appreciate the survey that you did. With ~40 respondents, however, I 
disagree that you've surveyed the majority of Nutch users. I honestly had 40 
people using it over the past few days, and I can pretty much state that none of them 
replied to your original survey.

Please ask them to reply to the survey then; it will certainly make it more 
representative. Or provide your own statistics and tell us what the 
majority of Nutch users do.

bq. We can go back and debate the original change that required indexing (which 
wasn't required before)

That's simply incorrect. 

Now the only change to the existing behaviour is the one you introduced with 
this commit by removing 'indexer-solr' from 'plugin.includes'. Can you please 
fix this? Thanks.

PS: I find your tone has been quite aggressive recently (e.g. discussion on 
versioning). Any particular reason? 





> Make Nutch work without an indexer
> ----------------------------------
>
>                 Key: NUTCH-1832
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1832
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.9
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.10
>
>         Attachments: NUTCH-1832.Mattmann.090314.patch.2.txt, 
> NUTCH-1832.Mattmann.090314.patch.txt
>
>
> Nutch used to work out of the box, without requiring an indexing backend. As 
> of 1.9, that's not the case anymore (possibly even before that). Thanks 
> to [~markus17] for pointing out that this is due to the indexer-solr plugin 
> being enabled by default. We should disable it by default so that the 
> regression is removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
