[ 
https://issues.apache.org/jira/browse/HADOOP-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Telloli updated HADOOP-5189:
---------------------------------

    Attachment: HADOOP-5189.patch

I'm posting a new patch for the integration of BookKeeper with HDFS. 

In this patch the logSync() method is exactly the same as in the original 
file-based logging. Additionally, the patch does not implement any abstract 
class for logging apart from the two EditLogInputStream/EditLogOutputStream 
classes, as requested by Konstantin (in the following I'll mention only the 
InputStream, but I mean both). 

Here's some detail about the patch:

1. The current implementation does not allow other types of logging. 
Specifically, version 0.19 does not allow any EditLogInputStream other than 
EditLogFileInputStream: each time an EditLogInputStream is needed, an 
EditLogFileInputStream is instantiated. In the patch I add configuration values 
to enable BookKeeper logging, and I allow the user to switch between logging 
types through a configuration property in hadoop-site.xml. 
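
To illustrate the idea in item 1, here is a minimal sketch of selecting the 
stream implementation from a configuration value instead of hard-coding the 
file-based one. Everything in it is hypothetical: the property name 
example.editlog.type, the EditStreamSource interface and the two stand-in 
classes are not taken from the patch or from Hadoop, and java.util.Properties 
stands in for the real configuration object.

```java
import java.util.Properties;

// Hypothetical sketch: none of these names come from the patch or from Hadoop.
public class EditLogStreamSelection {

    /** Stand-in for the abstract edit log input stream discussed above. */
    interface EditStreamSource {
        String describe();
    }

    /** Stand-in for the file-based implementation. */
    static final class FileSource implements EditStreamSource {
        public String describe() { return "file"; }
    }

    /** Stand-in for a BookKeeper-based implementation. */
    static final class BookKeeperSource implements EditStreamSource {
        public String describe() { return "bookkeeper"; }
    }

    // Assumed property name, chosen only for this example.
    static final String LOGGING_TYPE_KEY = "example.editlog.type";

    /** Picks an implementation from the configuration; defaults to file-based. */
    static EditStreamSource select(Properties conf) {
        String type = conf.getProperty(LOGGING_TYPE_KEY, "file");
        switch (type) {
            case "file":
                return new FileSource();
            case "bookkeeper":
                return new BookKeeperSource();
            default:
                throw new IllegalArgumentException("Unknown edit log type: " + type);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(select(conf).describe());

        conf.setProperty(LOGGING_TYPE_KEY, "bookkeeper");
        System.out.println(select(conf).describe());
    }
}
```

The point of routing every instantiation through one method like select() is 
that callers no longer name EditLogFileInputStream directly, so adding a new 
logging type touches a single place.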


2. As Konstantin suggested some time ago, I started the current patch by 
implementing only the two abstract classes above, but in the end I had to 
modify more classes: FSEditLog.java, SecondaryNamenode.java and FSImage.java. 
Although the modifications are mainly related to issue 1, there is additional 
confusion between the semantics of "open" and "create": in the case of files, 
the two operations have strong similarities. This doesn't hold for BookKeeper, 
so in some cases (mostly related to the CreateEditLogFile() method) my code 
needs to branch to avoid unwanted creation of new ledgers. 
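
The open/create difference in item 2 can be sketched roughly as follows. This 
is a toy model, not the patch's code: ToyLedgerStore and openOrCreate() are 
hypothetical names, and the store only mimics the one property that matters 
here, namely that every create allocates a new ledger, whereas re-creating a 
file at the same path leaves a single file behind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a toy model of the open/create distinction, not BookKeeper's API.
public class OpenVersusCreate {

    /** Toy ledger store: every create() allocates a fresh ledger ID. */
    static final class ToyLedgerStore {
        private final List<Long> ledgers = new ArrayList<>();

        long create() {
            long id = ledgers.size();
            ledgers.add(id);
            return id;
        }

        boolean exists(long id) {
            return ledgers.contains(id);
        }

        int count() {
            return ledgers.size();
        }
    }

    /**
     * Reuses the current ledger when one is already known, and creates a new
     * one only otherwise. This is the kind of branch needed to avoid
     * allocating an unwanted ledger where file-based code would simply
     * create the file again.
     */
    static long openOrCreate(ToyLedgerStore store, Long currentLedgerId) {
        if (currentLedgerId != null && store.exists(currentLedgerId)) {
            return currentLedgerId; // "open": attach to the existing ledger
        }
        return store.create();      // "create": allocate a new ledger
    }

    public static void main(String[] args) {
        ToyLedgerStore store = new ToyLedgerStore();
        long first = openOrCreate(store, null);
        long second = openOrCreate(store, first);
        System.out.println("same ledger reused: " + (first == second));
        System.out.println("ledgers allocated: " + store.count());
    }
}
```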

Even with this, the patch is not yet complete, due to the following issues: 

3. The current 0.19 implementation does not yet support multiple concurrent 
logging systems. This is another implementation problem that should be fixed, 
but I'm not sure how easily. As Ben said, HDFS is heavily based on files: it 
uses storage directories to store both the image of the file system and the 
edits. I think this then turns into a design problem, because it's not easy to 
decouple the file system image from the edits file, since they both live in 
the same directory. 

4. Another drawback, related to 3, is that the NameNode needs some files, such 
as edits and edits.new, to work properly even if the logging system is not 
file-based. In the patch, even though I'm using BookKeeper to store edits, I 
still need to have the edits and edits.new files. I currently use them to 
store a small amount of information about ledger IDs, but this will change 
soon in favor of ZooKeeper. 

I'm not sure how to fix the above issues. In particular, I'm worried that a 
good solution would require rethinking the storage directories and the FSImage 
as they're currently implemented. 


> Integration with BookKeeper logging system
> ------------------------------------------
>
>                 Key: HADOOP-5189
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5189
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.19.0
>            Reporter: Luca Telloli
>         Attachments: create.png, HADOOP-5189.patch, HADOOP-5189.patch
>
>
> BookKeeper is a system to reliably log streams of records 
> (https://issues.apache.org/jira/browse/ZOOKEEPER-276). The NameNode is a 
> natural target for such a system because it is the metadata repository of 
> the entire HDFS file system. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
