[ https://issues.apache.org/jira/browse/HADOOP-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Telloli updated HADOOP-5189:
---------------------------------

    Attachment: HADOOP-5189-trunk-preview.patch

I'm posting a new preview version that addresses two features: 

- Logging on multiple devices 
- Writing IDs to ZooKeeper (that is, files are no longer used to store this 
information)

I also moved EditLogFileOutputStream and EditLogFileInputStream out of the 
FSEditLog class. 

A sample configuration is the following: 
<property>
  <name>dfs.name.dir</name>
  <value>/tmp/localhdfs</value>
</property>
<property>
  <name>dfs.name.edits.dir</name>
  <value>/tmp/hdfsedits</value>
</property>
<property>
  <name>hdfs.editlog</name>
  <value>FILE,BOOKKEEPER</value>
</property>

NOTE: hdfs.editlog is a new property that must be specified for this patch to 
work. Its value is a comma-separated list of logging systems, as in the 
example above. 
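As a rough illustration of how the comma-separated hdfs.editlog value might be interpreted, the sketch below turns it into an ordered list of logging systems. The enum EditLogType and the method parseEditLogTypes are hypothetical names, not taken from the patch; the sketch assumes FILE and BOOKKEEPER are the only accepted values and that the configured order should be preserved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical enum of the logging systems accepted by hdfs.editlog.
enum EditLogType { FILE, BOOKKEEPER }

class EditLogConfig {
    // Parses a value such as "FILE,BOOKKEEPER" into an ordered list.
    // Whitespace around entries is ignored; empty or unknown entries are rejected;
    // duplicates are dropped, keeping the first occurrence.
    static List<EditLogType> parseEditLogTypes(String value) {
        if (value == null || value.trim().isEmpty()) {
            // The property is mandatory for the patch to work.
            throw new IllegalArgumentException("hdfs.editlog must be specified");
        }
        List<EditLogType> types = new ArrayList<>();
        // The -1 limit keeps trailing empty entries so they are rejected too.
        for (String token : value.split(",", -1)) {
            String name = token.trim().toUpperCase(Locale.ROOT);
            if (name.isEmpty()) {
                throw new IllegalArgumentException("empty entry in hdfs.editlog: " + value);
            }
            // valueOf throws IllegalArgumentException for an unknown name.
            EditLogType type = EditLogType.valueOf(name);
            if (!types.contains(type)) {
                types.add(type);
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(parseEditLogTypes("FILE,BOOKKEEPER")); // prints [FILE, BOOKKEEPER]
    }
}
```

Failing fast on an unknown or empty entry seems preferable to silently skipping it, since a typo would otherwise disable a logging system without notice.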

RUNNING ZOOKEEPER AND BOOKKEEPER EASILY
To run ZooKeeper and BookKeeper in one shot, there's a class in the BookKeeper 
jar named org.apache.bookkeeper.util.LocalBookKeeper that starts a ZooKeeper 
server along with a user-specified number of BookKeeper servers.  

An example command is the following: 
java -cp lib/log4j-1.2.15.jar:lib/junit-3.8.1.jar:lib/zookeeper-dev.jar:lib/zookeeper-dev-bookkeeper.jar \
    org.apache.bookkeeper.util.LocalBookKeeper

LOGGING ON MULTIPLE DEVICES
The initial semantics are simple: 
- when writing an operation, write it sequentially to every configured logging 
system 
- when reading operations (during startup or a checkpoint), read only from the 
first logging system; at the moment this is the first storage directory, so 
reading is still file-based 
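The write-to-all, read-from-first semantics can be sketched with a small multiplexing wrapper. The EditLogSink interface, the in-memory MemorySink, and MultiEditLog are hypothetical simplifications for illustration only; they are not the EditLogOutputStream/EditLogInputStream classes from the patch, and operations are modelled as plain strings.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for one logging system (file, BookKeeper, ...).
interface EditLogSink {
    void write(String op) throws IOException;
    List<String> readAll() throws IOException;
}

// In-memory sink used only to demonstrate the semantics.
class MemorySink implements EditLogSink {
    private final List<String> ops = new ArrayList<>();
    public void write(String op) { ops.add(op); }
    public List<String> readAll() { return new ArrayList<>(ops); }
}

class MultiEditLog {
    private final List<EditLogSink> sinks;

    MultiEditLog(List<EditLogSink> sinks) {
        if (sinks.isEmpty()) throw new IllegalArgumentException("no logging systems configured");
        this.sinks = new ArrayList<>(sinks);
    }

    // Writing: append the operation to every logging system, one after another.
    void logOp(String op) throws IOException {
        for (EditLogSink sink : sinks) {
            sink.write(op);
        }
    }

    // Reading (startup or checkpoint): use only the first logging system.
    // No fallback: an IOException from the first system reaches the caller.
    List<String> loadOps() throws IOException {
        return sinks.get(0).readAll();
    }
}
```

Because the writes are sequential, a failure partway through logOp leaves the operation in the earlier systems but not in the later ones; any real implementation would need a policy for that case.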

No fallback mechanism is implemented yet for the case where the first logging 
system fails (the idea would be to move on to the next one and exclude the 
failed one from the array of streams). 
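The proposed fallback (move on to the next logging system and drop the failed one from the array of streams) is not part of the patch. A minimal sketch of the idea, assuming a hypothetical ReadableEditLog interface and a hypothetical FallbackLoader class, could look like this:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical read side of one logging system.
interface ReadableEditLog {
    String name();
    List<String> readAll() throws IOException;
}

class FallbackLoader {
    // Logging systems still considered healthy, in configured order.
    private final List<ReadableEditLog> streams;

    FallbackLoader(List<ReadableEditLog> streams) {
        this.streams = new ArrayList<>(streams);
    }

    // Tries each logging system in order. A system whose read fails is
    // removed from the list, so later calls to loadOps skip it.
    List<String> loadOps() throws IOException {
        while (!streams.isEmpty()) {
            ReadableEditLog first = streams.get(0);
            try {
                return first.readAll();
            } catch (IOException e) {
                streams.remove(0); // exclude the failed logging system
            }
        }
        throw new IOException("all logging systems failed");
    }

    List<ReadableEditLog> remaining() {
        return streams;
    }
}
```

This sketch only covers reads; excluding the failed system from the write path as well would require sharing the list of healthy streams with the writer.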

The current loadFSEdits(StorageDirectory) should eventually become a 
loadFSEdits() that needs no storage directory. A 
loadFSEdits(EditLogInputStream) might be even better.
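To illustrate why a stream-based signature is attractive, the sketch below has the loader depend only on an abstract source of operations, so a file-backed or BookKeeper-backed source could be replayed without any StorageDirectory. OpSource, EditsLoader, and the text encoding of operations ("MKDIR /a", "DELETE /a") are hypothetical simplifications; Hadoop's actual EditLogInputStream and edit-log format are different.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified input stream of edit operations.
// readOp returns null once the stream is exhausted.
interface OpSource {
    String readOp() throws IOException;
}

class EditsLoader {
    // Stand-in for the namespace: only tracks which paths exist.
    final Set<String> paths = new HashSet<>();

    // Takes the stream directly, with no StorageDirectory, so any backend
    // implementing OpSource can be replayed. Returns the number of ops applied.
    int loadFSEdits(OpSource in) throws IOException {
        int applied = 0;
        String op;
        while ((op = in.readOp()) != null) {
            String[] parts = op.split(" ", 2); // e.g. "MKDIR /a"
            if (parts.length != 2) throw new IOException("malformed op: " + op);
            switch (parts[0]) {
                case "MKDIR":  paths.add(parts[1]);    break;
                case "DELETE": paths.remove(parts[1]); break;
                default: throw new IOException("unknown op: " + parts[0]);
            }
            applied++;
        }
        return applied;
    }
}
```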

DRAWBACKS
Currently, storage directories can be of three types: IMAGE, EDITS and 
IMAGE_AND_EDITS, the last being the default. With this patch I exclude the 
IMAGE_AND_EDITS type, so when using file logging, users are forced to set both 
dfs.name.dir and dfs.name.edits.dir, specifying one directory for IMAGE and a 
separate directory for EDITS. 
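The directory-type rule can be sketched as follows, under the assumption that a directory listed in both dfs.name.dir and dfs.name.edits.dir is what normally becomes IMAGE_AND_EDITS, and that under this patch such a configuration is rejected for file logging. The DirType enum, the classifyDirs helper, and its allowCombined flag are hypothetical; Hadoop's own classification code may differ in detail.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mirror of the three storage directory types.
enum DirType { IMAGE, EDITS, IMAGE_AND_EDITS }

class StorageDirClassifier {
    // Classifies each directory from dfs.name.dir (nameDirs) and
    // dfs.name.edits.dir (editsDirs). A directory in both lists is
    // IMAGE_AND_EDITS; with allowCombined == false (the behaviour this
    // patch imposes for file logging) that case is rejected.
    static Map<String, DirType> classifyDirs(List<String> nameDirs, List<String> editsDirs,
                                             boolean allowCombined) {
        Map<String, DirType> types = new LinkedHashMap<>();
        for (String dir : nameDirs) {
            types.put(dir, editsDirs.contains(dir) ? DirType.IMAGE_AND_EDITS : DirType.IMAGE);
        }
        for (String dir : editsDirs) {
            types.putIfAbsent(dir, DirType.EDITS); // edits-only directories
        }
        if (!allowCombined && types.containsValue(DirType.IMAGE_AND_EDITS)) {
            throw new IllegalArgumentException(
                "dfs.name.dir and dfs.name.edits.dir must name different directories");
        }
        return types;
    }
}
```

With the sample configuration above, /tmp/localhdfs would be classified as IMAGE and /tmp/hdfsedits as EDITS, which this patch accepts.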

> Integration with BookKeeper logging system
> ------------------------------------------
>
>                 Key: HADOOP-5189
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5189
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.19.0
>            Reporter: Luca Telloli
>         Attachments: create.png, HADOOP-5189-trunk-preview.patch, 
> HADOOP-5189-trunk-preview.patch, HADOOP-5189.patch, HADOOP-5189.patch
>
>
> BookKeeper is a system to reliably log streams of records 
> (https://issues.apache.org/jira/browse/ZOOKEEPER-276). The NameNode is a 
> natural target for such a system, since it is the metadata repository for 
> the entire HDFS file system. 
