Did you run the Grep.main using bin/nutch? That should do the trick:

Yes, I used bin/nutch; see the attached stack trace.
In and out are directories of files. Re is the regex. Group is the optional group within the regex to select when mapping.

Note that you should also define "mapred.job.tracker" in nutch-site.xml with the host:port of your job tracker. (If you're using ndfs then you should also probably define "fs.default.name".)

The job tracker was already set in nutch-default.xml; I only set fs.default.name to 127.0.0.1:8088.

However, I noticed this in JobConf, line 73:
  String defaultValue = "nutch.jar";
  ...
  get("mapred.jar", defaultValue);

Maybe mapred.jar needs to be set somewhere? Grep.java doesn't set it, and it is not in nutch-default.xml.
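
To show the fallback I mean, here is a minimal sketch. It uses java.util.Properties as a stand-in for the configuration, so it is not the actual JobConf code; the class name MapredJarLookup and the path /path/to/job.jar are made up for illustration.

```java
import java.util.Properties;

public class MapredJarLookup {
    // Same pattern as the quoted lines: if "mapred.jar" is not defined
    // anywhere, the lookup silently falls back to "nutch.jar".
    static String resolveJar(Properties conf) {
        String defaultValue = "nutch.jar";
        return conf.getProperty("mapred.jar", defaultValue);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();

        // Nothing sets mapred.jar, so the default is used.
        System.out.println(resolveJar(conf));

        // Hypothetical explicit setting; the path is only an example.
        conf.setProperty("mapred.jar", "/path/to/job.jar");
        System.out.println(resolveJar(conf));
    }
}
```

If JobConf behaves like this, the job would look for nutch.jar unless mapred.jar is set explicitly, which may be related to the failure I am seeing.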

Thanks!

Stefan


