[jira] Commented: (HADOOP-5188) Modifications to enable multiple types of logging

Konstantin Shvachko (JIRA) Tue, 03 Mar 2009 19:19:23 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678564#action_12678564
 ]


Konstantin Shvachko commented on HADOOP-5188:
---------------------------------------------

I think you are trying to replace the EditLogOutputStream abstraction with an 
EditLog abstraction. I will try to explain:
- The FSImage object deals (or is supposed to deal) with everything related to 
the file system's persistent image.
- FSEditLog should be dealing with everything related to journaling.
- There are different ways of journaling and this should be reflected by an 
abstract EditLogOutputStream class.

Your approach means that FSImage will hold an array of EditLog(s), and you 
will have to introduce an {{FSImage.logSync()}} method that loops over all 
EditLogs and calls their respective {{EditLog.logSync()}} methods. 
But this is exactly what the current {{EditLog.logSync()}} method does: it 
loops through the EditLogOutputStreams and calls flushAndSync() on each. The 
same applies to other operations: logEdit(), processIOError().

So the idea is that EditLog should combine common logic for all journaling 
streams (logging types). The specifics of journaling should be contained within 
implementations of EditLogOutputStream.
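
To illustrate that division of labor, here is a minimal Java sketch. All names 
in it (SketchEditLogOutputStream, InMemoryStream, SketchEditLog) are made up 
for this example; they are not the real Hadoop classes or signatures, and the 
in-memory stream only stands in for a file, backup, or BK journal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the abstract stream class (not the Hadoop API).
abstract class SketchEditLogOutputStream {
    abstract void write(String op);   // buffer one edit
    abstract void flushAndSync();     // make buffered edits durable
}

// Journal-specific behavior lives in the stream implementation.
class InMemoryStream extends SketchEditLogOutputStream {
    final List<String> buffered = new ArrayList<>();
    final List<String> durable = new ArrayList<>();

    @Override
    void write(String op) {
        buffered.add(op);
    }

    @Override
    void flushAndSync() {
        durable.addAll(buffered);
        buffered.clear();
    }
}

// Logic common to all journals lives in the edit log, which fans out to streams.
class SketchEditLog {
    private final List<SketchEditLogOutputStream> streams = new ArrayList<>();

    void addStream(SketchEditLogOutputStream s) {
        streams.add(s);
    }

    void logEdit(String op) {
        for (SketchEditLogOutputStream s : streams) {
            s.write(op);
        }
    }

    void logSync() {
        for (SketchEditLogOutputStream s : streams) {
            s.flushAndSync();
        }
    }
}
```

With this shape, a new logging type is added by writing another stream 
subclass; neither the edit log nor the image has to learn about it.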

I agree with Ben: FSEditLog was originally written for file-based journals and 
still contains code specific to that type, and it could be optimized.
I can see that waiting for the whole batch of edits to complete makes the 
BookKeeper stream less efficient.
But that does not mean that FSEditLog should be overloaded; it just means that 
one method, logSync(), should be generalized to allow an efficient 
implementation of BK streams *as well as* other (file and backup) streams.

So here is the proposal. Let's put an if statement in logSync() for now that 
checks whether all streams are BookKeeper streams; if they are, it either 
skips the synchronized sections in logSync() (avoiding the wait) or calls a 
BK-specific method.
I say this because if the name-node uses BK logging together with other logging 
types, then logging will proceed at the speed of the slowest journal. So the 
only case where BK can benefit from an optimized logSync() is when there are no 
stream types other than BK.
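
A rough Java sketch of that check, under stated assumptions: JournalStream, 
isBookKeeper(), the two stream classes, and DispatchingEditLog are 
hypothetical names invented here, and the real logSync() does considerably 
more (transaction ids, batching, error handling) than this shows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stream base class; isBookKeeper() is an assumed marker method.
abstract class JournalStream {
    abstract void flushAndSync();

    boolean isBookKeeper() {
        return false;
    }
}

class FileJournalStream extends JournalStream {
    int syncs;

    @Override
    void flushAndSync() {
        syncs++;
    }
}

class BkJournalStream extends JournalStream {
    int syncs;

    @Override
    void flushAndSync() {
        syncs++;
    }

    @Override
    boolean isBookKeeper() {
        return true;
    }
}

class DispatchingEditLog {
    private final List<JournalStream> streams = new ArrayList<>();
    String lastPath = "none";   // records which branch ran, for illustration only

    void addStream(JournalStream s) {
        streams.add(s);
    }

    private boolean allBookKeeper() {
        if (streams.isEmpty()) {
            return false;
        }
        for (JournalStream s : streams) {
            if (!s.isBookKeeper()) {
                return false;
            }
        }
        return true;
    }

    void logSync() {
        if (allBookKeeper()) {
            // BK-only case: skip the synchronized section so callers do not wait
            // on the whole batch. Safe concurrent changes to the stream list are
            // deliberately not handled in this sketch.
            lastPath = "bk";
            for (JournalStream s : streams) {
                s.flushAndSync();
            }
        } else {
            // Mixed or non-BK case: keep the batched, synchronized path, since
            // logging runs at the speed of the slowest journal anyway.
            synchronized (this) {
                lastPath = "batched";
                for (JournalStream s : streams) {
                    s.flushAndSync();
                }
            }
        }
    }
}
```

As soon as one file or backup stream is registered, the sketch falls back to 
the batched path, which matches the point above about the slowest journal.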
Hope that makes sense.

> Modifications to enable multiple types of logging 
> --------------------------------------------------
>
>                 Key: HADOOP-5188
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5188
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.19.0
>            Reporter: Luca Telloli
>         Attachments: HADOOP-5188.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
