[ 
https://issues.apache.org/jira/browse/HDFS-16955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Franklinsam Paul updated HDFS-16955:
------------------------------------
    Description: 
Similar to "hadoop.security.group.mapping.ldap.directory.search.timeout", we 
need a timeout for the group lookup call in the other group mapping service 
providers, such as 
*org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback* and 
*org.apache.hadoop.security.ShellBasedUnixGroupsMapping*.

 

Currently, a delayed group lookup holds locks for a long time and crashes the 
NameNode. The proposal is to time out the call and report to the user that the 
operation failed because the group lookup was delayed.
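
As a rough illustration of the proposal, the sketch below bounds a group lookup with a timeout by running it on a worker thread and waiting on the result with Future.get. The class TimedGroupLookup and the GroupLookup interface are hypothetical stand-ins invented for this example; they are not Hadoop classes, and the actual fix may be implemented differently.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper that bounds a group lookup with a timeout. Not part of Hadoop. */
public class TimedGroupLookup {

  /** Stand-in for a group mapping provider's getGroups(user) call (assumption). */
  public interface GroupLookup {
    List<String> getGroups(String user) throws IOException;
  }

  private final GroupLookup delegate;
  private final long timeoutMs;

  // Daemon threads, so a lookup that never returns cannot block JVM shutdown.
  private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "group-lookup");
    t.setDaemon(true);
    return t;
  });

  public TimedGroupLookup(GroupLookup delegate, long timeoutMs) {
    this.delegate = delegate;
    this.timeoutMs = timeoutMs;
  }

  /** Returns the user's groups, or throws IOException if the lookup exceeds the timeout. */
  public List<String> getGroups(String user) throws IOException {
    Future<List<String>> future = executor.submit(() -> delegate.getGroups(user));
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Interrupt the worker; a lookup stuck in native code may ignore the interrupt.
      future.cancel(true);
      throw new IOException("Group lookup for user " + user
          + " timed out after " + timeoutMs + " ms", e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause;
      }
      throw new IOException("Group lookup for user " + user + " failed", cause);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while looking up groups for user " + user, e);
    }
  }
}
```

With a wrapper of this kind the caller gets an IOException after the configured timeout instead of waiting on the lookup indefinitely. Note that cancelling the Future only interrupts the worker thread, so a JNI-based lookup could keep running in the background even after the caller has been released.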

 
{code:java}
2023-03-01 18:49:25,367 WARN org.apache.hadoop.security.Groups: Potential performance problem: getGroups(user=XXXXXXXXXX) took 232236 milliseconds.
2023-03-01 18:49:25,368 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:       Number of suppressed read-lock reports: 21
        Longest read-lock held at 1970-01-11 13:29:34,218+0100 for 232236ms via java.lang.Thread.getStackTrace(Thread.java:1564)
{code}
 

 

Along with the longest-lock report, we could also consider printing a message 
only when all handlers are waiting on the current lock, since that situation 
might cause a failover/crash due to the HA timeout.
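
One way to picture the "warn only when every handler is blocked" idea is a simple counter of waiting handlers compared against the handler pool size. The class HandlerSaturationMonitor and its methods are hypothetical names invented for this sketch; nothing here reflects existing NameNode code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that detects when all handlers are waiting on a lock. Not part of Hadoop. */
public class HandlerSaturationMonitor {

  private final int handlerCount;
  private final AtomicInteger waiting = new AtomicInteger();

  public HandlerSaturationMonitor(int handlerCount) {
    this.handlerCount = handlerCount;
  }

  /** Call just before a handler blocks on the lock; true means every handler is now waiting. */
  public boolean beginWait() {
    return waiting.incrementAndGet() >= handlerCount;
  }

  /** Call once the handler has acquired the lock or given up waiting. */
  public void endWait() {
    waiting.decrementAndGet();
  }
}
```

A handler would call beginWait() before blocking and log the warning only when it returns true, then call endWait() in a finally block so the count stays accurate.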


> Need to timeout grouplookup calls if delayed long
> -------------------------------------------------
>
>                 Key: HDFS-16955
>                 URL: https://issues.apache.org/jira/browse/HDFS-16955
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Franklinsam Paul
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
