[jira] [Commented] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic

Todd Lipcon (JIRA) Fri, 03 Jun 2011 11:13:02 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043928#comment-13043928
 ]


Todd Lipcon commented on HDFS-2003:
-----------------------------------

bq. This is intentional. We read from the journal which has the most edits 
available to read. If this happens to be a journal with a truncated file, that 
journal is still the journal with the most up to date logs. Do you disagree?

In my opinion, that's the responsibility of the "edit log recovery" process to 
determine, and then truncate the file at the correct length. But, I see your 
point as well, and don't feel strongly about it. Either way, though, it's a 
distinct change from just the refactor - can we keep the current behavior in 
the refactor, and then make that behavioral change separately?

Some other thoughts:
- Do we need Reader to be an inner class of FSEditLogOp? I find it a little 
strange to have all of the Reader code, and then the "final Codes opCode" and 
"final long txid;" right after it.

I think the patch would produce fewer conflicts on merge if we made the 
following change:
- Keep FSEditLogOpCodes as is, so we don't have changes throughout 
EditLogFileInputStream/OutputStream/FSEditLog/Loader/OEV/etc. This will help 
prevent merge conflicts against HDFS-1936 in particular.

----

One idea, which you can take or leave: what if we added a {{Class<? extends 
FSEditLogOp>}} field to the Codes enum, and then did the following:
In the Reader constructor:
{code}
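// One reusable instance per opcode, created once up front.
EnumMap<Codes, FSEditLogOp> opInstances =
    new EnumMap<Codes, FSEditLogOp>(Codes.class);
for (Codes c : Codes.values()) {
  opInstances.put(c, c.getOpClass().newInstance());
}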
{code}
In readOp, instead of the switch statement:
{code}
FSEditLogOp op = opInstances.get(opCode);
op.readFields(in, logVersion);
{code}
This idea would avoid the overhead of allocating a new object for each op 
that is read, make opcodes more like Writables, and get rid of the big switch 
statement. It might also be a good first step towards sharing more code between 
the OEV and the normal edits loader. This is just a thought, though - if you 
don't like it, ignore me :)
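
To make the idea above concrete, here is a minimal, self-contained sketch of the enum-plus-EnumMap dispatch. It is not code from the patch: Op, AddOp, DeleteOp, the two opcode byte values, the on-disk field layout, and the demo() helper are all hypothetical stand-ins for FSEditLogOp and its subclasses, chosen only so the example runs on its own. It also uses getDeclaredConstructor().newInstance() rather than Class.newInstance(), which is deprecated in newer JDKs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.EnumMap;

class OpReaderSketch {
  /** Hypothetical stand-in for FSEditLogOp. */
  abstract static class Op {
    abstract void readFields(DataInputStream in, int logVersion) throws IOException;
  }

  /** Hypothetical op with an invented field layout: one UTF string. */
  static class AddOp extends Op {
    String path;
    @Override void readFields(DataInputStream in, int logVersion) throws IOException {
      path = in.readUTF();
    }
    @Override public String toString() { return "add:" + path; }
  }

  /** Hypothetical op with an invented field layout: UTF string, then a long. */
  static class DeleteOp extends Op {
    String path;
    long timestamp;
    @Override void readFields(DataInputStream in, int logVersion) throws IOException {
      path = in.readUTF();
      timestamp = in.readLong();
    }
    @Override public String toString() { return "delete:" + path + "@" + timestamp; }
  }

  /** Each opcode carries the class that knows how to read its fields. */
  enum Codes {
    OP_ADD((byte) 0, AddOp.class),       // byte values are illustrative only
    OP_DELETE((byte) 2, DeleteOp.class);

    private final byte value;
    private final Class<? extends Op> opClass;

    Codes(byte value, Class<? extends Op> opClass) {
      this.value = value;
      this.opClass = opClass;
    }

    Class<? extends Op> getOpClass() { return opClass; }

    static Codes fromByte(byte b) throws IOException {
      for (Codes c : values()) {
        if (c.value == b) return c;
      }
      throw new IOException("Unknown opcode " + b);
    }
  }

  static class Reader {
    private final EnumMap<Codes, Op> opInstances = new EnumMap<>(Codes.class);
    private final DataInputStream in;
    private final int logVersion;

    Reader(DataInputStream in, int logVersion) throws ReflectiveOperationException {
      this.in = in;
      this.logVersion = logVersion;
      // Create one reusable instance per opcode, once, up front.
      for (Codes c : Codes.values()) {
        opInstances.put(c, c.getOpClass().getDeclaredConstructor().newInstance());
      }
    }

    /** Map lookup replaces the switch; the returned object is reused across calls. */
    Op readOp() throws IOException {
      Codes opCode = Codes.fromByte(in.readByte());
      Op op = opInstances.get(opCode);
      op.readFields(in, logVersion);
      return op;
    }
  }

  /** Writes three ops to a buffer, reads them back, and summarises the result. */
  static String demo() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeByte(0); out.writeUTF("/a");
      out.writeByte(2); out.writeUTF("/b"); out.writeLong(42L);
      out.writeByte(0); out.writeUTF("/c");

      Reader r = new Reader(
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), -1);
      Op first = r.readOp();
      String s1 = first.toString();      // capture before the instance is overwritten
      String s2 = r.readOp().toString();
      Op third = r.readOp();             // same AddOp instance as 'first'
      return s1 + " " + s2 + " " + third + " reused=" + (first == third);
    } catch (IOException | ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
  }
}
```

One consequence of reusing instances is worth noting: the object returned by readOp() is overwritten the next time the same opcode is read, so a caller has to consume or copy it before calling readOp() again. In the sketch, demo() captures the first op's string form before reading the third op for exactly that reason.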



> Separate FSEditLog reading logic from editLog memory state building logic
> -------------------------------------------------------------------------
>
>                 Key: HDFS-2003
>                 URL: https://issues.apache.org/jira/browse/HDFS-2003
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: Edit log branch (HDFS-1073)
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff
>
>
> Currently FSEditLogLoader has code for reading from an InputStream 
> interleaved with code which updates the FSNameSystem and FSDirectory. This 
> makes it difficult to read an edit log without having a whole load of other 
> object initialised, which is problematic if you want to do things like count 
> how many transactions are in a file etc. 
> This patch separates the reading of the stream and the building of the memory 
> state. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
