Daryn Sharp commented on HDFS-12278:

For context regarding the impact of the change to a priority queue:  Hours 
after a 2.8 upgrade, avg RPC processing time increased from sub-ms to 21ms.  
RPC queue time was multiple seconds.  Killing large jobs only made it worse.  
The fair call queue was completely overflowing for ~5h.  I haven't seen 
anything this horrific in many years.

While the NN log was spewing messages about skipping calls from timed-out 
clients, we noticed lease monitor recovery log messages ~5-12ms apart, and the 
lease monitor holds the write lock during that time.  Killing jobs made it 
worse because it created more orphaned leases.

> LeaseManager#removeLease operation is inefficient in 2.8.
> ---------------------------------------------------------
>                 Key: HDFS-12278
>                 URL: https://issues.apache.org/jira/browse/HDFS-12278
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>            Priority: Blocker
> After HDFS-6757, LeaseManager#removeLease became expensive. 
> HDFS-6757 changed the {{sortedLeases}} object from TreeSet to PriorityQueue. 
> Previously the {{remove(Object)}} operation on {{sortedLeases}} was 
> {{O(log n)}}, but after the change it became {{O(n)}} since it has to find 
> the object first. 
> Recently we had an incident in one of our production clusters just hours after 
> we upgraded from 2.7 to 2.8. 
> The {{sortedLeases}} object had approximately 100,000 items within it. 
> Removing a lease acquires the LeaseManager lock, so the slow removal also 
> slows down lease lookups. 
> HDFS-6757 is a good improvement which replaced the path by inode id.
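
To make the complexity difference in the description concrete, here is a minimal, self-contained Java sketch. It is not the LeaseManager code: the Lease class, the BY_AGE comparator, and the timeRemovals helper are hypothetical stand-ins, and the 1% removal sample is an arbitrary choice. It only compares remove(Object) on a TreeSet and a PriorityQueue holding about 100,000 entries, the size mentioned above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeSet;

class LeaseRemovalSketch {
    /** Hypothetical stand-in for a lease; not the real HDFS class. */
    static final class Lease {
        final String holder;
        final long lastUpdate;

        Lease(String holder, long lastUpdate) {
            this.holder = holder;
            this.lastUpdate = lastUpdate;
        }
    }

    // Oldest lease first; the holder name breaks ties so that distinct
    // leases never compare as equal inside the TreeSet.
    static final Comparator<Lease> BY_AGE =
        Comparator.comparingLong((Lease l) -> l.lastUpdate)
                  .thenComparing(l -> l.holder);

    /** Removes every victim from the collection; returns elapsed nanoseconds. */
    static long timeRemovals(Collection<Lease> leases, List<Lease> victims) {
        long start = System.nanoTime();
        for (Lease victim : victims) {
            leases.remove(victim);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 100_000; // roughly the size reported in the issue
        TreeSet<Lease> tree = new TreeSet<>(BY_AGE);
        PriorityQueue<Lease> heap = new PriorityQueue<>(n, BY_AGE);
        List<Lease> victims = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Lease lease = new Lease("client-" + i, i);
            tree.add(lease);
            heap.add(lease);
            if (i % 100 == 0) {
                victims.add(lease); // remove a 1% sample
            }
        }
        // TreeSet.remove descends the tree: O(log n) per call.
        long treeNanos = timeRemovals(tree, victims);
        // PriorityQueue.remove(Object) scans the backing array: O(n) per call.
        long heapNanos = timeRemovals(heap, victims);
        System.out.printf("TreeSet: %d us, PriorityQueue: %d us%n",
            treeNanos / 1_000, heapNanos / 1_000);
    }
}
```

Absolute timings depend on the JVM and hardware, so none are quoted here. The point is only that each PriorityQueue removal has to scan for the element before it can remove it, and in the NameNode that scan happens while a lock is held.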

This message was sent by Atlassian JIRA
