[jira] Updated: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster

Sanjay Radia (JIRA) Mon, 05 Nov 2007 15:57:13 -0800

     [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sanjay Radia updated HADOOP-1989:
---------------------------------

    Attachment: SimulatedStoragePatchSubmit5.txt

The attached patch addresses Konstantine's feedback on the previous patch.
It also adds a new class, DataNodeCluster, that allows one to run a DataNode cluster in a single address space (the name node can be in a separate address space). This class allows one to run multiple instances of the simulated data node in a single VM, which is useful for benchmarking a real Name Node against a large number of simulated data nodes. The hadoop command has been modified so that this can be run as:
      bin/hadoop datanodecluster
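As a rough illustration of the single-VM idea, the following Java sketch starts many lightweight node objects inside one JVM and has each register with a shared registry that stands in for the Name Node. Everything here is hypothetical: `RegistrySketch`, `SimulatedNodeSketch` and `InProcessClusterSketch` are illustrative names, not the classes in the patch. A real simulated data node would talk to a (possibly remote) Name Node over Hadoop's RPC and keep sending heartbeats rather than registering once. What the sketch shows is that every instance needs its own identity, because all of them share one address space.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the Name Node's set of registered data nodes.
class RegistrySketch {
    private final Set<String> registered = ConcurrentHashMap.newKeySet(); // thread-safe set

    void register(String nodeId) {
        registered.add(nodeId);
    }

    int size() {
        return registered.size();
    }
}

// Hypothetical simulated data node: in this sketch it only registers its unique ID.
class SimulatedNodeSketch implements Runnable {
    private final String nodeId;
    private final RegistrySketch registry;

    SimulatedNodeSketch(String nodeId, RegistrySketch registry) {
        this.nodeId = nodeId;
        this.registry = registry;
    }

    @Override
    public void run() {
        registry.register(nodeId);
    }
}

class InProcessClusterSketch {
    // Starts n simulated nodes on a small thread pool inside this one JVM and
    // returns how many have registered once the pool finishes (or 30 s elapse).
    static int startCluster(int n, RegistrySketch registry) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, Math.min(n, 8)));
        for (int i = 0; i < n; i++) {
            pool.submit(new SimulatedNodeSketch("sim-dn-" + i, registry));
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return registry.size();
    }
}
```

Because the node objects share one heap and a handful of threads, adding more simulated nodes costs memory in a single process rather than extra machines.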

> Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Sanjay Radia
>            Priority: Minor
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt
>
>
> The proposal is to add an implementation of a Simulated Data Node.
> This will:
>   - allow one to test certain parts of the system (especially the Name Node and its protocols) much more easily and efficiently.
>   - allow one to run performance benchmarks on the Name Node without having a large cluster.
>   - allow one to inject faults for testing (e.g., random faults based on probability parameters).
> The idea is that the Simulated Data Node will:
>   - discard any data written to blocks (but remember the blocks and their sizes).
>   - generate fixed data on the fly when blocks are read (e.g., the block is a fixed set of bytes or a repeated sequence of strings).
> The Simulated Data Node can also be used for fault injection.
> The data node can be parameterized with probabilities that allow one to control:
>   - Delays on reads, writes, creates, etc.
>   - IO Exceptions
>   - Loss of blocks
>   - Failures
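The storage and fault-injection behaviour described in the proposal can be sketched in a few lines of Java. This is a minimal, hypothetical sketch, not the patch's implementation: `SimulatedBlockStore`, the filler byte value and the single `ioExceptionProbability` parameter are assumptions made for illustration. Writes are discarded except for the block length, reads return generated filler bytes of that length, and each call can fail with a simulated `IOException` at the configured probability.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical simulated block store (illustrative names, not Hadoop's API).
 * Block contents are never stored; only each block's length is remembered.
 */
class SimulatedBlockStore {
    // Assumed filler value returned for every byte of a simulated block.
    static final byte FILLER = 9;

    private final Map<Long, Long> blockLengths = new HashMap<>();
    private final double ioExceptionProbability; // 0.0 = never fail, 1.0 = always fail
    private final Random random;

    SimulatedBlockStore(double ioExceptionProbability, long seed) {
        this.ioExceptionProbability = ioExceptionProbability;
        this.random = new Random(seed); // fixed seed makes injected faults reproducible
    }

    // Throws a simulated IOException with the configured probability.
    private void maybeFail(String op) throws IOException {
        if (random.nextDouble() < ioExceptionProbability) {
            throw new IOException("Simulated fault during " + op);
        }
    }

    // Discards the bytes but adds their count to the block's remembered length.
    void write(long blockId, byte[] data) throws IOException {
        maybeFail("write");
        blockLengths.merge(blockId, (long) data.length, Long::sum);
    }

    // Returns generated filler data of the remembered length.
    byte[] read(long blockId) throws IOException {
        maybeFail("read");
        Long len = blockLengths.get(blockId);
        if (len == null) {
            throw new IOException("Unknown block " + blockId);
        }
        byte[] out = new byte[Math.toIntExact(len)];
        Arrays.fill(out, FILLER);
        return out;
    }

    // Remembered length of a block, or -1 if the block was never written.
    long getLength(long blockId) {
        return blockLengths.getOrDefault(blockId, -1L);
    }
}
```

The other fault types listed above (delays, lost blocks, node failures) could be added as further probability parameters checked in the same way as `maybeFail`.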

-- 
This message is automatically generated by JIRA.
