Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by PaulBaclace:
http://wiki.apache.org/nutch/NutchDistributedFileSystem

------------------------------------------------------------------------------
   1. The list of Blocks that make up the file "foo.txt"
   2. The set of Datanodes where each Block can be found
  
- The client examines the first Block in the list, and sees that it is 
available on a single Datanode. Fine. The client contacts that Datanode, and 
provides the BlockID?. The datanode transmits the entire block.
+ The client examines the first Block in the list, and sees that it is 
available on a single Datanode. Fine. The client contacts that Datanode, and 
provides the BlockID. The datanode transmits the entire block.
  
  The client has now successfully read the first BLOCK_SIZE bytes of the file 
"foo.txt". (We imagine BLOCK_SIZE will be around 32MB.) So it is now ready to 
read the second Block. It finds that the Namenode claims two Datanodes hold 
this Block. The client picks one at random and contacts it.
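
To make that read flow concrete, here is a minimal pseudo-Java sketch of the block-by-block loop. The interfaces and method names below are hypothetical, purely for illustration; they are not the actual NDFS classes.
{{{
// Hypothetical interfaces, for illustration only -- not the real NDFS code.
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.Random;

interface Namenode {
    // For each Block of the file, the Datanodes currently holding it.
    List<LocatedBlock> getBlockLocations(String path);
}

interface Datanode {
    byte[] readBlock(long blockId) throws IOException;  // sends the whole block
}

class LocatedBlock {
    long blockId;
    List<Datanode> holders;
}

class ReadSketch {
    private static final Random RANDOM = new Random();

    // Read "path" block by block; each Block is up to BLOCK_SIZE (~32MB).
    static void readFile(Namenode namenode, String path, OutputStream out)
            throws IOException {
        for (LocatedBlock blk : namenode.getBlockLocations(path)) {
            // Any Datanode holding the Block will do; pick one at random.
            Datanode dn = blk.holders.get(RANDOM.nextInt(blk.holders.size()));
            out.write(dn.readBlock(blk.blockId));
        }
    }
}
}}}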
  
@@ -88, +88 @@

  
  The DataNode daemon class is DataNode. 
  
- FSNamesystem.java handles all the bookkeeping for the NameNode?. It keeps 
track of where all the blocks are, which DataNodes? are available, etc.
+ FSNamesystem.java handles all the bookkeeping for the NameNode. It keeps 
track of where all the blocks are, which DataNodes are available, etc.
  
- FSDirectory.java is used by FSNamesystem and maintains the filesystem state. 
It logs all changes to the critical NDFS state so the NameNode? can go down at 
any time and the most recent change is always preserved. (Eventually, this is 
where we will insert the code to mirror changes to a second backup NameNode?.)
+ FSDirectory.java is used by FSNamesystem and maintains the filesystem state. 
It logs all changes to the critical NDFS state so the NameNode can go down at 
any time and the most recent change is always preserved. (Eventually, this is 
where we will insert the code to mirror changes to a second backup NameNode.)
  
- FSDataset.java is used by the DataNode? to hold a set of Blocks and the 
accompanying byte sets on disk.
+ FSDataset.java is used by the DataNode to hold a set of Blocks and the 
accompanying byte sets on disk.
  
- Block.java and DatanodeInfo?.java are used to track those two objects.
+ Block.java and DatanodeInfo.java are used to track those two objects.
  
- FSResults.java and FSParam.java are used for sending arguments over the 
network. Same with HeartbeatData?.java.
+ FSResults.java and FSParam.java are used for sending arguments over the 
network. Same with HeartbeatData.java.
  
  FSConstants.java holds various important system-wide constant values.
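
As a rough mental model of the bookkeeping FSNamesystem does, here is a toy Java sketch of a block map plus heartbeat tracking. It illustrates the idea only; it is not the real FSNamesystem code.
{{{
// Toy illustration of NameNode-style bookkeeping -- not the real FSNamesystem.
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class BlockMapSketch {
    // blockId -> names of DataNodes currently reporting that Block
    private final Map<Long, Set<String>> blockToNodes = new HashMap<Long, Set<String>>();
    // DataNode name -> time of its last heartbeat
    private final Map<String, Long> lastHeartbeat = new HashMap<String, Long>();

    synchronized void heartbeat(String datanode) {
        lastHeartbeat.put(datanode, System.currentTimeMillis());
    }

    synchronized void blockReport(String datanode, long[] blockIds) {
        for (long id : blockIds) {
            Set<String> nodes = blockToNodes.get(id);
            if (nodes == null) {
                nodes = new HashSet<String>();
                blockToNodes.put(id, nodes);
            }
            nodes.add(datanode);
        }
    }

    // Which DataNodes can serve this Block right now?
    synchronized Set<String> locate(long blockId) {
        Set<String> nodes = blockToNodes.get(blockId);
        return (nodes == null) ? Collections.<String>emptySet() : nodes;
    }

    // A DataNode counts as available if it has heartbeated recently.
    synchronized boolean isAvailable(String datanode, long timeoutMillis) {
        Long last = lastHeartbeat.get(datanode);
        return last != null && System.currentTimeMillis() - last < timeoutMillis;
    }
}
}}}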
  
@@ -118, +118 @@

  
  = Quick Demo =
  
- On machines A,B,C in nutch config file set:
+ On machines A, B, and C, use settings like these in the nutch config file 
(the real configuration is done with XML files in the conf directory):
  
  fs.default.name = A:9000
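
For example, the property above would be set with an XML snippet along these lines. The file name (nutch-site.xml) and root element shown here are assumptions; check conf/nutch-default.xml in your distribution for the exact format.
{{{
<?xml version="1.0"?>
<!-- conf/nutch-site.xml (illustrative; see conf/nutch-default.xml for the
     actual format and the full list of properties) -->
<nutch-conf>
  <property>
    <name>fs.default.name</name>
    <value>A:9000</value>
  </property>
</nutch-conf>
}}}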
  
@@ -128, +128 @@

  
  
  
- On machine A, run: $ nutch namenode 
+ On machine A, run: {{{ $ nutch namenode }}} 
  
- On machine B, run: $ nutch datanode
+ On machine B, run: {{{ $ nutch datanode }}}
  
- On machine C, run: $ nutch datanode 
+ On machine C, run: {{{ $ nutch datanode }}}
  
- You now have an NDFS installation with one NameNode? and two DataNodes?. 
(Note, of course, you don't have to run these on different machines. It's 
enough to use different directories and avoid port conflicts.) DataNodes use 
port 7000 or greater (they probe to find free port to listen on starting from 
7000).
+ You now have an NDFS installation with one NameNode and two DataNodes. (Note, 
of course, you don't have to run these on different machines. It's enough to 
use different directories and avoid port conflicts.) DataNodes use port 7000 or 
greater (each one probes for a free port to listen on, starting at 7000).
  
  On any machine, run the client (with fs.default.name = A:9000 in the nutch 
config file):
  
  (If you want to find the source: the class Test``Client is under src/java, 
not src/test; this same class is run by the shell script command 
["bin/nutch ndfs"].)
- 
+ {{{ 
- $ nutch org.apache.nutch.fs.Test``Client 
+ $ nutch org.apache.nutch.fs.TestClient  
- 
+ }}}
  It will display the possible NDFS operations this test tool can perform. 
Use absolute file paths for NDFS. 
  
  So to test basic NDFS operation we can execute:
- 
+ {{{
  $ nutch org.apache.nutch.fs.Test``Client -mkdir /test
  
  $ nutch org.apache.nutch.fs.Test``Client -ls /
@@ -161, +161 @@

  $ nutch org.apache.nutch.fs.Test``Client -mv /test/backup /test/testfile
  
  $ nutch org.apache.nutch.fs.Test``Client -get /test/testfile local_copy
- 
+ }}}
  
  You have just created a directory, listed its contents, copied a file from 
the local filesystem into it, listed it again, copied the file within NDFS, 
removed the original, renamed the backup to the original name, and retrieved a 
copy from NDFS to the local file system.
  
  There are additional commands that let you inspect the state of NDFS:
- 
+ {{{
  $ nutch org.apache.nutch.fs.Test``Client -report
  
  $ nutch org.apache.nutch.fs.Test``Client -du /
- 
+ }}}
  You might try interesting things like the following:
-  1. Start a NameNode? and one DataNode?
+  1. Start a NameNode and one DataNode
   2. Use the client to create a file
-  3. Bring up a second DataNode?
+  3. Bring up a second DataNode
   4. Wait a few seconds
-  5. Bring down the first DataNode?
+  5. Bring down the first DataNode
   6. Use the client to retrieve the file
  
  The system should have replicated the relevant blocks, making the data still 
available in step 6.
  
- If you want to read/write programmatically, use the API exposed in 
org.apache.nutch.ndfs.NDFSClient
+ If you want to read/write programmatically, use the API exposed in 
{{{org.apache.nutch.ndfs.NDFSClient}}}
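
Here is a minimal sketch of what such a program might look like, assuming NDFSClient is constructed with the Namenode address and offers create/open calls that return ordinary Java streams. The constructor and method signatures below are assumptions; check NDFSClient.java in src/java for the real API.
{{{
// Illustrative sketch only: signatures are assumed, not copied from NDFSClient.
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;

import org.apache.nutch.io.UTF8;
import org.apache.nutch.ndfs.NDFSClient;

public class NdfsClientSketch {
    public static void main(String[] args) throws Exception {
        // Matches fs.default.name = A:9000 in the nutch config.
        NDFSClient client = new NDFSClient(new InetSocketAddress("A", 9000));

        // Write a small file (NDFS paths are absolute).
        OutputStream out = client.create(new UTF8("/test/hello.txt"), true);
        out.write("hello ndfs".getBytes());
        out.close();

        // Read it back to stdout.
        InputStream in = client.open(new UTF8("/test/hello.txt"));
        int b;
        while ((b = in.read()) != -1) {
            System.out.write(b);
        }
        System.out.flush();
        in.close();
    }
}
}}}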
  
  = Conclusion =
  
