Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by Gal Nitzan:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  
  ==== How to start working with MapReduce? ====
  
- edit conf/nutch-site.xml
+   edit conf/nutch-site.xml
  
    <property>
      <name>fs.default.name</name>
@@ -211, +211 @@

    </property>
  
  
- edit conf/mapred-default.xml
+   edit conf/mapred-default.xml
    <property>
      <name>mapred.map.tasks</name>
      <value>4</value>
@@ -225, +225 @@

      <description>define mapred.reduce tasks to be number of slave hosts</description>
    </property>
  
- create a file with slave host names
+   create a file with slave host names
  
- {{{
+   {{{
- % echo localhost >> ~/.slaves
+   % echo localhost >> ~/.slaves
- % echo somemachine >> ~/.slaves}}}
+   % echo somemachine >> ~/.slaves}}}
  
- start all ndfs & mapred daemons
+   start all ndfs & mapred daemons
- {{{
+   {{{
- % bin/start-all.sh
+   % bin/start-all.sh
- }}}
+   }}}
  
- create a directory with seed list file
+   create a directory with seed list file
- {{{
+   {{{
- % mkdir seeds
+   % mkdir seeds
- % echo http://www.cnn.com/ > seeds/urls
+   % echo http://www.cnn.com/ > seeds/urls
- }}}
+   }}}
  
- put seed directory in ndfs
+   put seed directory in ndfs
- {{{
+   {{{
- % bin/nutch ndfs -put seeds seeds
+   % bin/nutch ndfs -put seeds seeds
- }}}
+   }}}
  
- crawl a bit
+   crawl a bit
- {{{
+   {{{
- % bin/nutch crawl seeds -depth 3
+   % bin/nutch crawl seeds -depth 3
- }}}
+   }}}
  
- monitor things from the administrative interface
+   monitor things from the administrative interface
- open browser and enter your masterHost:7845
+   open browser and enter your masterHost:7845
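
The file-preparation steps above append with `>>`, so re-running them adds duplicate host lines to the slaves file. A minimal sketch of repeatable versions of those two steps follows. The helper names `add_slave` and `make_seeds` are hypothetical and not part of Nutch; only the file locations (`~/.slaves`, `seeds/urls`) come from the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): append a host name to a slaves
# file only if that exact line is not already present.
add_slave() {
    slaves_file=$1
    host=$2
    touch "$slaves_file"
    # -x matches the whole line, -F treats the host as a fixed string
    if ! grep -qxF -- "$host" "$slaves_file"; then
        printf '%s\n' "$host" >> "$slaves_file"
    fi
}

# Hypothetical helper (not part of Nutch): create the seed directory and
# write a single-URL seed list to <dir>/urls, overwriting any old list.
make_seeds() {
    seed_dir=$1
    url=$2
    mkdir -p "$seed_dir"
    printf '%s\n' "$url" > "$seed_dir/urls"
}
```

With these defined, `add_slave ~/.slaves localhost` and `make_seeds seeds http://www.cnn.com/` would stand in for the `echo` and `mkdir` lines; the `bin/nutch` commands themselves are unchanged.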
  
+ ==== How to send commands to NDFS? ====
+ 
+   list files in the root of NDFS
+   {{{
+   [EMAIL PROTECTED] mapred]# bin/nutch ndfs -ls /
+   050927 160948 parsing file:/mapred/conf/nutch-default.xml
+   050927 160948 parsing file:/mapred/conf/nutch-site.xml
+   050927 160948 No FS indicated, using default:localhost:8009
+   050927 160948 Client connection to 127.0.0.1:8009: starting
+   Found 3 items
+   /user/root/crawl-20050927142856 <dir>
+   /user/root/crawl-20050927144626 <dir>
+   /user/root/seeds        <dir>
+   }}}
+ 
+   remove a directory from NDFS
+   {{{
+   [EMAIL PROTECTED] mapred]# bin/nutch ndfs -rm /user/root/crawl-20050927144626
+   050927 161025 parsing file:/mapred/conf/nutch-default.xml
+   050927 161025 parsing file:/mapred/conf/nutch-site.xml
+   050927 161025 No FS indicated, using default:localhost:8009
+   050927 161025 Client connection to 127.0.0.1:8009: starting
+   Deleted /user/root/crawl-20050927144626
+   }}}
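
The listing output above mixes log lines with the actual entries, which makes it awkward to use in scripts. The sketch below filters it down to directory paths only. It assumes every entry line has the form `<path> <dir>` exactly as in the sample output; the function name `list_ndfs_dirs` is hypothetical, and whether the log lines arrive on stdout or stderr is not stated in the source, so the filter simply ignores anything that does not look like an entry.

```shell
#!/bin/sh
# Hypothetical filter (not part of Nutch): read "ndfs -ls" output on stdin
# and print only the paths marked as directories.
# Assumption: entry lines look like "/some/path <dir>", as in the sample.
list_ndfs_dirs() {
    # Keep lines whose last field is "<dir>" and whose first field is an
    # absolute path; log lines and the "Found N items" line fail this test.
    awk '$NF == "<dir>" && $1 ~ /^\// { print $1 }'
}
```

A possible use would be `bin/nutch ndfs -ls / | list_ndfs_dirs`, for example to pick out old crawl directories before removing them with `ndfs -rm`.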
  
  === Searching ===
  
