[
https://issues.apache.org/jira/browse/HADOOP-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luca Telloli updated HADOOP-5189:
---------------------------------
Attachment: HADOOP-5189.patch
I'm posting a new patch for the integration of BookKeeper with HDFS.
In this patch the logSync() method is exactly the same as in the original
file-based logging. Additionally, the patch does not implement any abstract
class for logging apart from the two EditLogInput/OutputStream classes, as
requested by Konstantin (below I'll just write InputStream, but I mean both).
Here's some detail about the patch:
1. The current implementation does not allow other types of logging.
Specifically, version 0.19 does not allow any EditLogInputStream other than
EditLogFileInputStream: each time an EditLogInputStream is needed, an
EditLogFileInputStream is instantiated. In the patch I add configuration values
to enable BookKeeper logging, and I let the user switch between logging types
through a configuration property in hadoop-site.xml.
2. As Konstantin suggested some time ago, in the current patch I started by
implementing only the two abstract classes above, but in the end I had to
modify more classes. In particular I modified FSEditLog.java,
SecondaryNameNode.java and FSImage.java. Although the modifications are mainly
related to issue 1, there is additional confusion between the semantics of
"open" and "create", since for files the two operations are very similar.
This doesn't always hold for BookKeeper (mostly in relation to the
createEditLogFile() method), so my code needs to branch to avoid creating
unwanted new ledgers.
Even with this, the patch is not yet complete, due to the following issues:
3. The current 0.19 implementation does not yet support multiple concurrent
logging systems. This is another implementation problem which should be fixed,
but I'm not sure how easily. As Ben said, HDFS is heavily file-based: it uses
storage directories to hold both the image of the file system and the edits.
I think this then turns into a design problem, because it's not easy to
decouple the file system image from the edits file when they both live in the
same directory.
4. Another drawback related to 3 is that the NameNode, to work properly, needs
files such as edits and edits.new even if the logging system is not file-based.
In the patch, even though I'm using BookKeeper to store edits, I still need to
have the edits and edits.new files. I currently use them to store a small
amount of information about ledger IDs, but this will soon change in favor of
ZooKeeper.
I'm not sure how to fix the above issues. In particular, I'm worried that a
good solution would require rethinking the Storage Directories and the FSImage
as they're currently implemented.
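
To make point 1 more concrete, here is a minimal, self-contained sketch of how
a configuration property could select the logging type. It is only an
illustration and not the code in the patch: the property name
dfs.namenode.edits.type, the LogType enum, and the EditLogInput stand-in
classes are all assumptions made for this example, and java.util.Properties
stands in for the Hadoop configuration loaded from hadoop-site.xml.

```java
import java.util.Properties;

public class EditLogSelection {
    /** Hypothetical property name; the key actually used by the patch is not stated here. */
    static final String LOG_TYPE_KEY = "dfs.namenode.edits.type";

    /** Hypothetical set of supported logging types. */
    enum LogType { FILE, BOOKKEEPER }

    /** Simplified stand-in for the abstract EditLogInputStream class. */
    interface EditLogInput {
        String describe();
    }

    /** Stand-in for EditLogFileInputStream. */
    static class FileInput implements EditLogInput {
        public String describe() { return "file-based edit log"; }
    }

    /** Stand-in for a BookKeeper-backed input stream. */
    static class BookKeeperInput implements EditLogInput {
        public String describe() { return "BookKeeper ledger edit log"; }
    }

    /** Reads the logging type from the configuration, defaulting to file-based logging. */
    static LogType logTypeFrom(Properties conf) {
        String value = conf.getProperty(LOG_TYPE_KEY, "file").trim();
        if (value.equalsIgnoreCase("file")) {
            return LogType.FILE;
        }
        if (value.equalsIgnoreCase("bookkeeper")) {
            return LogType.BOOKKEEPER;
        }
        throw new IllegalArgumentException("Unknown edit log type: " + value);
    }

    /** Single place where an input stream is chosen, instead of always using the file class. */
    static EditLogInput openInput(Properties conf) {
        switch (logTypeFrom(conf)) {
            case BOOKKEEPER:
                return new BookKeeperInput();
            case FILE:
            default:
                return new FileInput();
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(openInput(conf).describe());

        conf.setProperty(LOG_TYPE_KEY, "bookkeeper");
        System.out.println(openInput(conf).describe());
    }
}
```

The point of routing every instantiation through one method like openInput()
is that callers such as FSEditLog or FSImage would no longer need to name
EditLogFileInputStream directly.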
> Integration with BookKeeper logging system
> ------------------------------------------
>
> Key: HADOOP-5189
> URL: https://issues.apache.org/jira/browse/HADOOP-5189
> Project: Hadoop Core
> Issue Type: New Feature
> Affects Versions: 0.19.0
> Reporter: Luca Telloli
> Attachments: create.png, HADOOP-5189.patch, HADOOP-5189.patch
>
>
> BookKeeper is a system to reliably log streams of records
> (https://issues.apache.org/jira/browse/ZOOKEEPER-276). The NameNode is a
> natural target for such a system, since it is the metadata repository of the
> entire file system for HDFS.