Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by Gal Nitzan:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  ==== How to start working with MapReduce? ====
- edit conf/nutch-site.xml
+ edit conf/nutch-site.xml
  <property>
  <name>fs.default.name</name>
@@ -211, +211 @@
  </property>
- edit conf/mapred-default.xml
+ edit conf/mapred-default.xml
  <property>
  <name>mapred.map.tasks</name>
  <value>4</value>
@@ -225, +225 @@
  <description>define mapred.reduce.tasks to be the number of slave hosts</description>
  </property>
- create a file with slave host names
+ create a file with slave host names
- {{{
+ {{{
- % echo localhost >> ~/.slaves
+ % echo localhost >> ~/.slaves
- % echo somemachine >> ~/.slaves}}}
+ % echo somemachine >> ~/.slaves}}}
- start all ndfs & mapred daemons
+ start all ndfs & mapred daemons
- {{{
+ {{{
- % bin/start-all.sh
+ % bin/start-all.sh
- }}}
+ }}}
- create a directory with seed list file
+ create a directory with seed list file
- {{{
+ {{{
- % mkdir seeds
+ % mkdir seeds
- % echo http://www.cnn.com/ > seeds/urls
+ % echo http://www.cnn.com/ > seeds/urls
- }}}
+ }}}
- put seed directory in ndfs
+ put seed directory in ndfs
- {{{
+ {{{
- % bin/nutch ndfs -put seeds seeds
+ % bin/nutch ndfs -put seeds seeds
- }}}
+ }}}
- crawl a bit
+ crawl a bit
- {{{
+ {{{
- % bin/nutch crawl seeds -depth 3
+ % bin/nutch crawl seeds -depth 3
- }}}
+ }}}
- monitor things from administrative interface
+ monitor things from administrative interface
- open browser and enter your masterHost:7845
+ open browser and enter your masterHost:7845
+ ==== How to send commands to NDFS? ====
+
+ list files in the root of NDFS
+ {{{
+ [EMAIL PROTECTED] mapred]# bin/nutch ndfs -ls /
+ 050927 160948 parsing file:/mapred/conf/nutch-default.xml
+ 050927 160948 parsing file:/mapred/conf/nutch-site.xml
+ 050927 160948 No FS indicated, using default:localhost:8009
+ 050927 160948 Client connection to 127.0.0.1:8009: starting
+ Found 3 items
+ /user/root/crawl-20050927142856 <dir>
+ /user/root/crawl-20050927144626 <dir>
+ /user/root/seeds <dir>
+ }}}
+
+ remove a directory from NDFS
+ {{{
+ [EMAIL PROTECTED] mapred]# bin/nutch ndfs -rm /user/root/crawl-20050927144626
+ 050927 161025 parsing file:/mapred/conf/nutch-default.xml
+ 050927 161025 parsing file:/mapred/conf/nutch-site.xml
+ 050927 161025 No FS indicated, using default:localhost:8009
+ 050927 161025 Client connection to 127.0.0.1:8009: starting
+ Deleted /user/root/crawl-20050927144626
+ }}}
  === Searching ===
