[
https://issues.apache.org/jira/browse/HADOOP-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luca Telloli updated HADOOP-5189:
---------------------------------
Attachment: HADOOP-5189.patch
Integrating Bookkeeper into HDFS
--------------------------------
INTRO
------
BookKeeper is a system to reliably log streams of records
(https://issues.apache.org/jira/browse/ZOOKEEPER-276). The NameNode is a
natural target for such a system for being the metadata repository of the
entire file system for HDFS.
The NameNode works by logging any modification to the file system on a separate
stream and periodically merge the stream with the latest image of the file
system. The standard version of HDFS only supports file-based logging on any
number of devices. This patch provides BookKeeper logging.
The advantages of BookKeeper-logging over file-logging are:
* higher throughput performance in large deployments
* higher availability though externally recoverable log files (ledgers)
This patch provides:
* a new abstraction for logging (also filed as
https://issues.apache.org/jira/browse/HADOOP-5188)
* an integration between Hadoop's HDFS and BookKeeper
HOW TO APPLY THE PATCH
-------------------------
The patch is available against Hadoop release 0.19
(http://svn.apache.org/repos/asf/hadoop/core/tags/release-0.19.0/). This patch
includes https://issues.apache.org/jira/browse/HADOOP-5188 (Modifications to
enable multiple types of logging)
The patch does not support file-based logging, nor it offers a way to switch
from one system to the other, so please apply with caution. The patch does not
support HDFS deployment older than 0.19 and has been tested *only* over
pre-formatted file-systems; although it should work without many troubles, we
suggest not to apply this patch to production systems.
To compile, just run ant. To compile and run properly you will need to add the
zookeeper and bookkeeper jar to you hadoop/lib directory. Please refer to
http://hadoop.apache.org/zookeeper/ for information on how to get, compile and
run Zookeeper and Bookkeeper.
HOW TO USE THE PATCHED HDFS
----------------------------
One you patch Hadoop there will be some new properties to configure in
hadoop-site.xml; a description follows:
<property>
<!--
Specifies the type of logging to use. At the moment, only BKEditLogThreadBuf is
provided.
-->
<name>hdfs.editlog</name>
<value>BKEditLogThreadBuf</value>
</property>
<property>
<!--
Specifies the number of bookies to use. Default is 3.
-->
<name>bklog.bookies.total</name>
<value>3</value>
</property>
<property>
<!--
Specifies the size of the quorum. Default is 2.
-->
<name>bklog.bookies.quorumsize</name>
<value>2</value>
</property>
<property>
<!--
Specifies the logging mode. A string in {"verifiable", "generic"}. For more
information on this see TODO
-->
<name>bklog.bookies.qmode</name>
<value>verifiable</value>
</property>
<property>
<!--
Specifies the ZooKeeper server containing info about bookies
-->
<name>bklog.zookeeper.servers</name>
<value>127.0.0.1</value>
</property>
Once HDFS is configured, the subsequent steps are:
* start ZooKeeper
* start the bookies
* initialise ZooKeeper with the bookies' host (ZOOKEEPER-301
[https://issues.apache.org/jira/browse/ZOOKEEPER-301] provides a util class
for this)
* start HDFS
* ready to GO!
PERFORMANCE
-----------
We tested our prototype against vanilla HDFS using a benchmark provided by the
Hadoop folks and
available in the Hadoop distribution as
org.apache.hadoop.hdfs.NNThroughputBenchmark. BookKeeper
logging was configured to use 3 bookies with a quorum of 2 in the verifiable
mode. File-logging was
configured to write on a single device (FS) or two devices (FS-NFS, one of them
being mounted through NFS).
The machines were dual core 4G with 2 300G SATA drives.
The variables involved are the number of concurrent threads and the number of
operations to log. We tested logging 400k ops of type create, and the
throughput (measured in Ops/s) is depicted in the attached graph.
> Integration with BookKeeper logging system
> ------------------------------------------
>
> Key: HADOOP-5189
> URL: https://issues.apache.org/jira/browse/HADOOP-5189
> Project: Hadoop Core
> Issue Type: New Feature
> Affects Versions: 0.19.0
> Reporter: Luca Telloli
> Attachments: HADOOP-5189.patch
>
>
> BookKeeper is a system to reliably log streams of records
> (https://issues.apache.org/jira/browse/ZOOKEEPER-276). The NameNode is a
> natural target for such a system for being the metadata repository of the
> entire file system for HDFS.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.