[jira] [Commented] (HDFS-2982) Startup performance suffers when there are many edit log segments

Colin Patrick McCabe (JIRA) Thu, 17 May 2012 23:34:34 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278610#comment-13278610
 ]


Colin Patrick McCabe commented on HDFS-2982:
--------------------------------------------

Lots and lots of unit tests would have to change if 
EditLogInputStream started requiring an init() call, not to mention the subtle 
bugs that might crop up.  That alone would almost be worth its own patch.  
Let's deal with this later if we decide it's something worth doing.  Frankly, I 
would argue against it because I think there are better APIs we could design.  
In particular, an API that separates the concept of a stream from the concept 
of a stream location is much more efficient and results in cleaner code, 
because the invariant that you can't use something without initializing it is 
then enforced by the type system.  So basically, can we revisit this idea 
later, as in after this week?
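
To make the stream / stream-location idea concrete, here is a rough sketch of 
the shape such an API could take.  All of the names here (StreamLocation, 
OpenStream, InMemoryLocation, countBytes) are made up for illustration and are 
not existing HDFS classes; the only point is that a readable stream can be 
obtained solely by opening a location, so there is no un-initialized stream 
for a caller to misuse.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class StreamLocationSketch {
    /** Hypothetical: says where a segment lives; holds no open resources. */
    interface StreamLocation {
        String name();
        OpenStream open();
    }

    /** Hypothetical: only created by a location's open(), so always ready to read. */
    static final class OpenStream implements AutoCloseable {
        private final InputStream in;

        private OpenStream(InputStream in) {
            this.in = in;
        }

        /** Returns the next byte, or -1 at end of stream. */
        int readByte() {
            try {
                return in.read();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public void close() {
            try {
                in.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Hypothetical location backed by a byte array, standing in for a file. */
    static final class InMemoryLocation implements StreamLocation {
        private final String name;
        private final byte[] data;

        InMemoryLocation(String name, byte[] data) {
            this.name = name;
            this.data = data.clone();
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public OpenStream open() {
            return new OpenStream(new ByteArrayInputStream(data));
        }
    }

    /** Callers hold cheap locations and open a stream only when they read. */
    static int countBytes(StreamLocation location) {
        try (OpenStream stream = location.open()) {
            int count = 0;
            while (stream.readByte() != -1) {
                count++;
            }
            return count;
        }
    }
}
```

With this shape there is nothing like init() to forget: code that has only a 
StreamLocation simply has no read method to call.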

bq. The new test case is missing the @Test annotation so it won't run.

Will fix.

bq. Are the changes to validateEditLog necessary here? And the change to how 
corrupt files are handled?

It's often really time-consuming to change these things because then I have to 
redo all the unit tests.  Still, I will take a look at it.
                
> Startup performance suffers when there are many edit log segments
> -----------------------------------------------------------------
>
>                 Key: HDFS-2982
>                 URL: https://issues.apache.org/jira/browse/HDFS-2982
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0
>            Reporter: Todd Lipcon
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-2982.001.patch
>
>
> For every one of the edit log segments, it seems like we are calling 
> listFiles on the edit log directory inside of {{findMaxTransaction}}. This is 
> killing performance, especially when there are many log segments and the 
> directory is stored on NFS. It is taking several minutes to start up the NN 
> when there are several thousand log segments present.
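
The general remedy for the problem described above is to list the directory 
once and do the per-segment work on the in-memory result.  The sketch below 
illustrates only that idea; it is not the attached patch.  The class and method 
names, the Supplier standing in for the directory listing, and the 
edits_<firstTxId>-<lastTxId> file name pattern are all assumptions made for 
the example.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SegmentScanSketch {
    // Assumed name format for a finalized segment: edits_<firstTxId>-<lastTxId>
    private static final Pattern SEGMENT = Pattern.compile("edits_(\\d+)-(\\d+)");

    /**
     * Calls the (possibly slow, e.g. NFS-backed) listing exactly once and
     * scans the returned names in memory. Returns -1 if no segment is found.
     */
    static long findMaxTxId(Supplier<List<String>> listDirectory) {
        List<String> names = listDirectory.get(); // the only directory listing
        long max = -1;
        for (String name : names) {
            Matcher m = SEGMENT.matcher(name);
            if (m.matches()) {
                max = Math.max(max, Long.parseLong(m.group(2)));
            }
        }
        return max;
    }
}
```

The cost becomes one listing plus a linear pass over the names, instead of 
one listing per segment.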

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
