[jira] Commented: (HADOOP-5189) Integration with BookKeeper logging system

Konstantin Shvachko (JIRA) Fri, 10 Apr 2009 14:07:40 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697979#action_12697979
 ]


Konstantin Shvachko commented on HADOOP-5189:
---------------------------------------------

# I've got a compile error:
{code}
[javac] hadoop/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BackupStorage.java:29: cannot find symbol
[javac] symbol  : class EditLogFileInputStream
[javac] location: class org.apache.hadoop.hdfs.server.namenode.FSEditLog
[javac] import org.apache.hadoop.hdfs.server.namenode.FSEditLog.EditLogFileInputStream;
{code}
# It is better to create a separate jira for factoring out 
{{EditLogFileOutputStream}} and {{EditLogFileInputStream}} from {{FSEditLog}}. 
This makes sense with or without BookKeeper, and it will keep the refactoring 
from obscuring the changes you actually make to the code.
# Why do you need a new method {{setStorageDirectories()}} with a Boolean 
parameter that is not used anywhere inside the method?
# We will need some automation to add the zookeeper and bookkeeper jars to 
the project and keep them synchronized with new releases. 
Can it be done with Ivy?
# I agree that the edits input part of the code is not generalized for input 
streams other than {{EditLogFileInputStream}}. This is because there have been 
no alternatives so far. We should work on it.
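
To illustrate what generalizing the edits input could look like, here is a 
minimal sketch. It is not the existing Hadoop code: {{JournalInput}}, 
{{InMemoryJournalInput}} and {{EditsLoader}} are hypothetical names made up 
for this example. The loader depends only on the abstract type, so a 
file-backed or BookKeeper-backed source could be substituted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical abstraction over any source of edits; not a Hadoop class.
abstract class JournalInput {
    // Human-readable name of the journal, used in error messages.
    abstract String name();

    // Opens a fresh stream positioned at the start of the journal.
    abstract InputStream open() throws IOException;
}

// Stand-in implementation backed by a byte array. A file-backed or
// BookKeeper-backed implementation would extend JournalInput the same way.
final class InMemoryJournalInput extends JournalInput {
    private final String name;
    private final byte[] data;

    InMemoryJournalInput(String name, byte[] data) {
        this.name = name;
        this.data = data.clone();
    }

    @Override
    String name() {
        return name;
    }

    @Override
    InputStream open() {
        return new ByteArrayInputStream(data);
    }
}

// The loading code sees only JournalInput, never a concrete stream class.
final class EditsLoader {
    // Counts the bytes in the journal as a placeholder for real edit parsing.
    static int countBytes(JournalInput journal) {
        try (InputStream in = journal.open()) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException("Failed reading " + journal.name(), e);
        }
    }
}
```

The point of the sketch is only the dependency direction: the loading code 
names the abstract type, and each logging system supplies its own subclass.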

The drawback of the approach you implement, besides requiring separate image 
and edits directories, which you mention, is that you do not have a way to 
retrieve the latest checkpoint time from BookKeeper. This is critical for 
choosing the latest version of the journal, and you can only get the latest 
checkpoint time from the local file (StorageDirectory) that corresponds to the 
stream. The StorageDirectory may be out of sync with the real state of the 
BookKeeper data.

Suppose that you use one file output stream and one BKOutputStream.
Suppose the BookKeeper output stream dies, the name-node keeps writing to the 
file output stream for another hour or so, and then gets restarted.
If the name-node is configured to read from the BookKeeper input stream, then 
it will get an outdated state of the namespace, because the current state is 
in the local file, not in BK.
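
The scenario can be reproduced with a toy model. This is only a sketch under 
simplifying assumptions: {{ToyJournal}} and {{StaleJournalDemo}} are 
hypothetical classes that use neither Hadoop nor BookKeeper, and the edit 
counts (100 before the failure, 50 after) are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one journal; hypothetical, not a Hadoop or BookKeeper class.
final class ToyJournal {
    final String name;
    private final List<String> edits = new ArrayList<>();
    private boolean failed = false;

    ToyJournal(String name) {
        this.name = name;
    }

    // Marks the journal as dead; later writes are dropped.
    void fail() {
        failed = true;
    }

    // Appends an edit unless the journal has failed.
    void write(String edit) {
        if (!failed) {
            edits.add(edit);
        }
    }

    int size() {
        return edits.size();
    }
}

final class StaleJournalDemo {
    // Returns {edits in the file journal, edits in the BookKeeper journal}.
    static int[] run() {
        ToyJournal file = new ToyJournal("file");
        ToyJournal bk = new ToyJournal("bookkeeper");

        // Both journals receive the first 100 edits.
        for (int i = 0; i < 100; i++) {
            file.write("op" + i);
            bk.write("op" + i);
        }

        // The BookKeeper stream dies; the name-node keeps writing to the file.
        bk.fail();
        for (int i = 100; i < 150; i++) {
            file.write("op" + i);
            bk.write("op" + i);
        }

        return new int[] { file.size(), bk.size() };
    }
}
```

After the simulated restart, a reader that picks the BookKeeper journal 
replays 100 edits while the file journal holds 150, and nothing stored in the 
BookKeeper journal itself says that it is the stale one.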

In general I am very glad that this is moving in the right direction and that 
we will eventually have a framework that allows us to plug in different 
logging systems and intermix them if necessary.

> Integration with BookKeeper logging system
> ------------------------------------------
>
>                 Key: HADOOP-5189
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5189
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.19.0
>            Reporter: Luca Telloli
>         Attachments: create.png, HADOOP-5189-trunk-preview.patch, 
> HADOOP-5189-trunk-preview.patch, HADOOP-5189.patch, HADOOP-5189.patch
>
>
> BookKeeper is a system to reliably log streams of records 
> (https://issues.apache.org/jira/browse/ZOOKEEPER-276). The NameNode is a 
> natural target for such a system for being the metadata repository of the 
> entire file system for HDFS. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
