[ 
https://issues.apache.org/jira/browse/HDFS-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-4479:
----------------------------------

    Description: 
In FSNamesystem#commitBlockSynchronization of branch-1, logSync() may be called 
while the FSNamesystem lock is held. As in HDFS-4186, this can cause 
performance problems.

The problem was observed in a cluster running a Hive job that was writing to 
100,000 temporary files (each task writing to thousands of files). When the 
job was killed, a large number of files were left open for write. When the 
leases for these open files eventually expired, lease recovery started for all 
of them within a very short time. This caused a large number of 
commitBlockSynchronization calls, each performing logSync with the 
FSNamesystem lock held, which overloaded the namenode and slowed it down.

Since logSync is already called right after the synchronized section, we can 
simply remove the logSync call made while the lock is held.
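
The intended shape of the fix can be sketched as follows. This is only an 
illustration, not the attached patch: LogSyncSketch, logEdit, durableCount and 
syncsUnderLock are hypothetical names, a ReentrantLock stands in for the 
FSNamesystem lock, and two in-memory lists stand in for the edit log. The edit 
is buffered while the lock is held, and the (slow) sync runs only after the 
lock has been released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these names come from the branch-1 sources.
class LogSyncSketch {
    // Assumption: stands in for the FSNamesystem lock.
    private final ReentrantLock namesystemLock = new ReentrantLock();
    private final List<String> buffered = new ArrayList<>();
    private final List<String> durable = new ArrayList<>();
    private int syncsUnderLock = 0;

    // Buffers an edit in memory; cheap, so acceptable with the lock held.
    private synchronized void logEdit(String op) {
        buffered.add(op);
    }

    // Stands in for the slow flush to disk. Counts the calls made while the
    // calling thread still holds the namesystem lock (the pattern to avoid).
    private synchronized void logSync() {
        if (namesystemLock.isHeldByCurrentThread()) {
            syncsUnderLock++;
        }
        durable.addAll(buffered);
        buffered.clear();
    }

    void commitBlockSynchronization(String block) {
        namesystemLock.lock();
        try {
            logEdit("commit " + block);
            // No logSync() here: syncing inside the critical section would
            // block every other thread waiting on the namesystem lock.
        } finally {
            namesystemLock.unlock();
        }
        // The sync happens once, after the lock has been released.
        logSync();
    }

    synchronized int durableCount() {
        return durable.size();
    }

    synchronized int syncsUnderLock() {
        return syncsUnderLock;
    }

    public static void main(String[] args) {
        LogSyncSketch ns = new LogSyncSketch();
        for (int i = 0; i < 3; i++) {
            ns.commitBlockSynchronization("blk_" + i);
        }
        System.out.println("durable edits: " + ns.durableCount());
        System.out.println("syncs under lock: " + ns.syncsUnderLock());
    }
}
```

In this sketch every edit still becomes durable before 
commitBlockSynchronization returns, but other threads can acquire the lock 
while the sync is in progress.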

  was:
In FSNamesystem#commitBlockSynchronization of branch-1, logSync() may be called 
when the FSNamesystem lock is held. Similar with HDFS-4186, this may cause some 
performance issue.

Since logSync is called right after the synchronization section, we can simply 
remove the logSync call.

    
> logSync() with the FSNamesystem lock held in commitBlockSynchronization
> -----------------------------------------------------------------------
>
>                 Key: HDFS-4479
>                 URL: https://issues.apache.org/jira/browse/HDFS-4479
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>             Fix For: 1.2.0
>
>         Attachments: HDFS-4479.b1.001.patch, HDFS-4479.b1.002.patch
