Hi all, I've been busy lately with a Nutch 1.x setup and I've managed to replicate the crawl script as an Oozie workflow (with HUE for a pretty web UI). To keep things simple I've used the JavaMain action to execute the classes that the Nutch scripts invoke, parameterized as necessary.
One thing I noticed is that configuring the command line arguments is a tad cumbersome, so: would it be unthinkable to adopt the Hadoop "-D configuration.setting=value" convention for these options? The bash scripts could still hide the extra verbosity and preserve the current arguments, while adding the option to define them in nutch-site.xml or, in Oozie, under a more practical element.

The patch wouldn't be too disruptive, but I don't want to do work that wouldn't be folded into upstream, so let me know if such an approach flies in the face of community-wide decisions and so on...

Best,
Edoardo

-- 
A Motto
Smile a while, and while you smile another smiles
And soon there's miles and miles of smiles
And life's worth while because you smile
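
P.S. To make the idea concrete, here is a rough, standalone sketch of the "-D key=value" convention. It is not the proposed patch and it does not use Hadoop at all: in a real Tool, Hadoop's GenericOptionsParser (via ToolRunner) does this parsing and puts the values into the job Configuration. The class name DashDArgs is made up for illustration, and fetcher.threads.fetch is only used as an example of a property that could be set this way.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Nutch or Hadoop): splits an argument
// list into "-D key=value" properties and the remaining positional args.
public class DashDArgs {
    public final Map<String, String> properties = new LinkedHashMap<>();
    public final List<String> remaining = new ArrayList<>();

    public static DashDArgs parse(String[] args) {
        DashDArgs out = new DashDArgs();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String pair = null;
            if (arg.equals("-D") && i + 1 < args.length) {
                // Separated form: -D key=value
                pair = args[++i];
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                // Joined form: -Dkey=value
                pair = arg.substring(2);
            }
            if (pair == null) {
                // Anything else is passed through untouched, so the
                // existing positional arguments keep working.
                out.remaining.add(arg);
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + pair);
            }
            // Split on the first '=' only, so values may contain '='.
            out.properties.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        DashDArgs parsed = parse(new String[] {
            "-D", "fetcher.threads.fetch=20", "crawl/segments/example"
        });
        System.out.println(parsed.properties);
        System.out.println(parsed.remaining);
    }
}
```

The point of the sketch is that the positional arguments survive unchanged, so the bash scripts could keep their current interface while Oozie users set the same values as plain configuration properties.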

