[
https://issues.apache.org/jira/browse/HADOOP-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678564#action_12678564
]
Konstantin Shvachko commented on HADOOP-5188:
---------------------------------------------
I think you are trying to replace the EditLogOutputStream abstraction with an
EditLog abstraction. I will try to explain:
- The FSImage object deals (or is supposed to deal) with everything related to
the persistent file system image.
- FSEditLog should be dealing with everything related to journaling.
- There are different ways of journaling and this should be reflected by an
abstract EditLogOutputStream class.
Your approach means that FSImage will hold an array of EditLog(s). You will
then have to introduce an {{FSImage.logSync()}} method that loops over all
EditLogs and calls their respective {{EditLog.logSync()}} methods.
But this is exactly what the current {{EditLog.logSync()}} method does: it
loops through the EditLogOutputStreams and calls flushAndSync() on each of
them. The same goes for the other operations: logEdit(), processIOError().
So the idea is that EditLog should combine common logic for all journaling
streams (logging types). The specifics of journaling should be contained within
implementations of EditLogOutputStream.
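To illustrate the split described above, here is a minimal sketch. It is not
the actual Hadoop code: all class names (SketchOutputStream,
SketchBufferedStream, SketchEditLog) and the String-based edit format are
hypothetical stand-ins. Only the method names logEdit(), logSync() and
flushAndSync() are taken from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for EditLogOutputStream: one subclass per journaling type.
abstract class SketchOutputStream {
    abstract void write(String op);
    abstract void flushAndSync();
}

// Hypothetical file-like journal: buffers edits and "persists" them on flushAndSync().
class SketchBufferedStream extends SketchOutputStream {
    final List<String> buffered = new ArrayList<>();
    final List<String> synced = new ArrayList<>();

    @Override
    void write(String op) {
        buffered.add(op);
    }

    @Override
    void flushAndSync() {
        synced.addAll(buffered);
        buffered.clear();
    }
}

// Hypothetical stand-in for FSEditLog: holds only the logic common to all streams.
class SketchEditLog {
    private final List<SketchOutputStream> streams = new ArrayList<>();

    void addStream(SketchOutputStream stream) {
        streams.add(stream);
    }

    // Fan each edit out to every configured journal.
    void logEdit(String op) {
        for (SketchOutputStream s : streams) {
            s.write(op);
        }
    }

    // Loop through the streams and sync each one; the specifics live in the stream.
    void logSync() {
        for (SketchOutputStream s : streams) {
            s.flushAndSync();
        }
    }
}
```

With this shape, adding a new journaling type means adding a new stream
subclass; neither the edit log nor FSImage needs to know about it.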
I agree with Ben: FSEditLog was originally written for file-based journals and
still contains code specific to that type, and it could be optimized.
I can see that waiting for the whole batch of edits to complete makes the
BookKeeper stream less efficient.
But that does not mean that FSEditLog should be overloaded; it just means that
one method logSync() should be generalized to allow efficient implementation of
BK streams *as well as* other (file and backup) streams.
So here is the proposal. Let's put an if statement in logSync() for now that
checks whether all streams are BookKeeper streams. If they are, logSync()
either skips its synchronized sections (which avoids the waiting) or
alternatively calls a BK-specific method.
I say this because if the name-node uses BK logging together with other logging
types, then logging will proceed at the speed of the slowest journal. So the
only case where BK can benefit from an optimized logSync() is when there are no
stream types other than BK.
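A rough sketch of that check might look like the following. Everything here is
an assumption made for illustration: the classes JournalStream,
FileJournalStream, BkJournalStream and BatchingEditLog, the isBookKeeper()
marker method, and the lastPath field do not exist in Hadoop, and the real
logSync() does considerably more than this.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stream base class; syncCalls only exists so the demo can observe calls.
abstract class JournalStream {
    final AtomicInteger syncCalls = new AtomicInteger();

    // Hypothetical marker telling the edit log which kind of journal this is.
    abstract boolean isBookKeeper();

    void flushAndSync() {
        syncCalls.incrementAndGet();
    }
}

class FileJournalStream extends JournalStream {
    @Override
    boolean isBookKeeper() {
        return false;
    }
}

class BkJournalStream extends JournalStream {
    @Override
    boolean isBookKeeper() {
        return true;
    }
}

class BatchingEditLog {
    private final List<JournalStream> streams = new CopyOnWriteArrayList<>();
    volatile String lastPath = "none"; // records which branch ran, for the demo only

    void addStream(JournalStream stream) {
        streams.add(stream);
    }

    private boolean allBookKeeper() {
        if (streams.isEmpty()) {
            return false;
        }
        for (JournalStream s : streams) {
            if (!s.isBookKeeper()) {
                return false;
            }
        }
        return true;
    }

    void logSync() {
        if (allBookKeeper()) {
            // Only BK streams: skip the synchronized section, so callers do not
            // wait for a whole batch of edits to complete.
            lastPath = "bk-fast";
            for (JournalStream s : streams) {
                s.flushAndSync();
            }
            return;
        }
        // Mixed or non-BK streams: the slowest journal sets the pace anyway,
        // so keep the batched, synchronized path.
        synchronized (this) {
            lastPath = "batched";
            for (JournalStream s : streams) {
                s.flushAndSync();
            }
        }
    }
}
```

As soon as a single file or backup stream is configured, the sketch falls back
to the synchronized path, which matches the reasoning above that a mixed setup
cannot go faster than its slowest journal.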
Hope that makes sense.
> Modifications to enable multiple types of logging
> --------------------------------------------------
>
> Key: HADOOP-5188
> URL: https://issues.apache.org/jira/browse/HADOOP-5188
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.19.0
> Reporter: Luca Telloli
> Attachments: HADOOP-5188.patch
>
>