I wonder if the name "crawl" implies that the command is sort of the standard
command, and all you would need? After all, if I were to sit down with a
"crawler", it seems very logical that "crawl" would be how you run it! I like
the simplicity of crawl from a "getting started" approach. I agree, though: I
know I used it as a shortcut... I didn't want to learn all the lower-level
concepts, I just wanted to crawl a couple of URLs and toss them into Solr.
"crawl" and the example code did great!

Maybe instead of having "crawl" be a core part of running Nutch, it could become
"run-example-crawl.sh", with a caveat in the Wiki that you should then look
inside it and learn all the various steps.

Eric


On Aug 23, 2011, at 6:50 AM, Markus Jelsma wrote:

> What kind of shell script did you have in mind? The wiki already provides
> some useful scripts. The tutorials on Nutch also show commands that can be
> used in custom scripts.
> 
> Is an immediate crawl-with-one-command a desired feature? Provided as Java 
> code or shell script?
> 
> On Tuesday 23 August 2011 10:12:57 Julien Nioche wrote:
>> +1 let's replace it with a shell script instead.
>> 
>> On 22 August 2011 21:56, Markus Jelsma <markus.jel...@openindex.io> wrote:
>>> Hi,
>>> 
>>> The crawl command seems to add a lot of confusion. It hides the entire
>>> crawl cycle logic from new users, leading to questions, lack of
>>> understanding of basic Nutch concepts, no support for the switches of the
>>> jobs it executes, more problems, etc. I am quite opposed to the crawl
>>> command and would not recommend it to anyone, including new users. A
>>> running Nutch almost always requires some scripting here and there: cron
>>> jobs, locks, etc.
>>> 
>>> I propose (most likely a challenging statement) to deprecate the crawl
>>> command in 1.4.
>>> 
>>> Users, developers, please comment.
>>> 
>>> Thanks
> 
> -- 
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350

-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server available from 
http://www.packtpub.com/solr-1-4-enterprise-search-server