[jira] [Commented] (NUTCH-1087) Deprecate crawl command and replace with example script

Andrzej Bialecki (JIRA) Tue, 23 Aug 2011 04:49:24 -0700

    [ 
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089405#comment-13089405
 ]


Andrzej Bialecki  commented on NUTCH-1087:
------------------------------------------


IIRC we had this discussion in the past... It's true that we already rely on 
Bash to do anything useful, no matter whether it's on Windows or on a *nix-like 
OS. And it's true that the crawl command has been a constant source of 
confusion over the years. The crawl application also suffered from some subtle 
bugs, especially when running in local mode (e.g. the PluginRepository leaks).

But the argument about maintenance costs is IMHO moot - you have to maintain a 
shell script, too, so it's no different from maintaining a Java class. Where it 
differs, I think, is that moving the crawl cycle logic to a shell script now 
raises the bar for Java developers who are not familiar with Bash scripting - a 
robust crawl script is not easy to follow, as it needs to handle error 
conditions and manage input/output resources on HDFS. On the other hand it's 
easier for system admins to tweak a script than to tweak Java code... 
so I guess it's also a question of who's the audience for this functionality.
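
To make the point about error handling concrete, here is a minimal sketch of what the per-segment part of such a script might look like. It is not the script proposed in this issue: the names NUTCH, CRAWL_DIR, run_step and crawl_cycle are made up for illustration, and it assumes the usual Nutch 1.x `fetch`, `parse` and `updatedb` commands are run on a segment that has already been generated.

```shell
#!/usr/bin/env bash
# Sketch only: NUTCH, CRAWL_DIR, run_step and crawl_cycle are illustrative
# names chosen for this example, not part of Nutch.

NUTCH="${NUTCH:-bin/nutch}"       # assumed path to the Nutch launcher
CRAWL_DIR="${CRAWL_DIR:-crawl}"   # hypothetical crawl directory (local or HDFS)

# Run a single step, log it, and report failure to the caller.
run_step() {
  echo "running: $*"
  if ! "$@"; then
    echo "step failed: $*" >&2
    return 1
  fi
}

# Process one already-generated segment; stop at the first failing step so
# the crawldb is never updated from a half-fetched or unparsed segment.
crawl_cycle() {
  local segment="$1"
  run_step "$NUTCH" fetch "$segment" &&
    run_step "$NUTCH" parse "$segment" &&
    run_step "$NUTCH" updatedb "$CRAWL_DIR/crawldb" "$segment"
}
```

Even this fragment leaves out the generate step, finding the newest segment, and cleaning up partial output on HDFS after a failure, which is where most of the complexity in a robust script would come from.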

I'm +0 for removing Crawl and replacing it with a script; IMHO it doesn't 
change the picture in any significant way.


> Deprecate crawl command and replace with example script
> -------------------------------------------------------
>
>                 Key: NUTCH-1087
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1087
>             Project: Nutch
>          Issue Type: Task
>    Affects Versions: 1.4
>            Reporter: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.4
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/[email protected]/msg03848.html

