[
https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089405#comment-13089405
]
Andrzej Bialecki commented on NUTCH-1087:
------------------------------------------
IIRC we had this discussion in the past... It's true that we already rely on
Bash to do anything useful, whether on Windows or on a *nix-like OS. And it's
true that the crawl command has been a constant source of confusion over the
years. The crawl application has also suffered from some subtle bugs,
especially when running in local mode (e.g. the PluginRepository leaks).
But the argument about maintenance costs is IMHO moot: you have to maintain a
shell script, too, so it's no different from maintaining a Java class. Where it
differs, I think, is that moving the crawl cycle logic to a shell script
raises the bar for Java developers who are not familiar with Bash scripting. A
robust crawl script is not easy to follow, as it needs to handle error
conditions and manage input/output resources on HDFS. On the other hand, it's
easier for system admins to tweak a script than to tweak Java code, so I guess
it's also a question of who the audience for this functionality is.
I'm +0 for removing Crawl and replacing it with a script; IMHO it doesn't
change the picture in any significant way.
> Deprecate crawl command and replace with example script
> -------------------------------------------------------
>
> Key: NUTCH-1087
> URL: https://issues.apache.org/jira/browse/NUTCH-1087
> Project: Nutch
> Issue Type: Task
> Affects Versions: 1.4
> Reporter: Markus Jelsma
> Priority: Minor
> Fix For: 1.4
>
>
> * remove the crawl command
> * add basic crawl shell script
> See thread:
> http://www.mail-archive.com/[email protected]/msg03848.html