[
https://issues.apache.org/jira/browse/HDFS-16955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Franklinsam Paul updated HDFS-16955:
------------------------------------
Description:
Similar to "hadoop.security.group.mapping.ldap.directory.search.timeout", we
need a timeout for the group lookup call in the other group mapping service
providers, such as
*org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback* and
*org.apache.hadoop.security.ShellBasedUnixGroupsMapping*.
Currently, a delayed group lookup holds locks for a long time and crashes the
NameNode. The proposal is to time out the call and return a failure to the
user when the group lookup takes too long.
{code:java}
2023-03-01 18:49:25,367 WARN org.apache.hadoop.security.Groups: Potential performance problem: getGroups(user=XXXXXXXXXX) took 232236 milliseconds.
2023-03-01 18:49:25,368 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of suppressed read-lock reports: 21
Longest read-lock held at 1970-01-11 13:29:34,218+0100 for 232236ms via java.lang.Thread.getStackTrace(Thread.java:1564)
{code}
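One possible shape for the requested timeout, as a minimal sketch only: run the lookup on a separate thread and stop waiting after a deadline. The class TimedGroupLookup, its getGroups signature, and the timeoutMs parameter are hypothetical names for illustration; they are not existing Hadoop APIs, and the lookup is passed in as a plain Callable rather than a real group mapping provider.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Hadoop): bounds a group lookup with a timeout. */
final class TimedGroupLookup {
  // Daemon threads, so a stuck lookup cannot keep the JVM alive.
  private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "group-lookup");
    t.setDaemon(true);
    return t;
  });

  private TimedGroupLookup() {}

  /**
   * Runs the given lookup and waits at most timeoutMs milliseconds for it.
   * A timeout is reported to the caller as an IOException.
   */
  static List<String> getGroups(Callable<List<String>> lookup, long timeoutMs)
      throws IOException {
    Future<List<String>> future = POOL.submit(lookup);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Best effort: interrupt the worker; the caller stops waiting either way.
      future.cancel(true);
      throw new IOException("Group lookup timed out after " + timeoutMs + " ms", e);
    } catch (InterruptedException e) {
      future.cancel(true);
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while waiting for group lookup", e);
    } catch (ExecutionException e) {
      throw new IOException("Group lookup failed", e.getCause());
    }
  }
}
```

Note that this only bounds how long the caller waits. A lookup that ignores interruption (a native call, for example) may keep its worker thread busy after the timeout, so a real implementation would also need to limit the number of such threads.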
Along with the longest-lock report, we could also consider printing a message
only when all handlers are waiting on the current lock, which might cause a
failover or crash due to the HA timeout.
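The "all handlers are waiting" condition could be detected with a simple counter, sketched below under stated assumptions. HandlerStallMonitor, beforeWait, and afterWait are hypothetical names, not existing NameNode code; the sketch assumes the total handler count is known up front and that each handler calls the monitor around its lock wait. It returns the message instead of logging it so that it stays self-contained.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch (not Hadoop code): reports when every handler is blocked on the lock. */
final class HandlerStallMonitor {
  private final int totalHandlers;
  private final AtomicInteger waiting = new AtomicInteger();

  HandlerStallMonitor(int totalHandlers) {
    this.totalHandlers = totalHandlers;
  }

  /**
   * Call just before a handler blocks on the lock.
   * Returns a warning only when this handler is the last free one to block,
   * otherwise returns null.
   */
  String beforeWait(String lockHolder) {
    int now = waiting.incrementAndGet();
    if (now == totalHandlers) {
      return "All " + totalHandlers + " handlers are waiting on the lock held by "
          + lockHolder;
    }
    return null;
  }

  /** Call once the handler has acquired the lock or given up waiting. */
  void afterWait() {
    waiting.decrementAndGet();
  }
}
```

Emitting the message only at the moment the last handler blocks keeps the log quiet during ordinary contention, which matches the "only if all handlers are waiting" condition in the description.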
> Need to timeout grouplookup calls if delayed long
> -------------------------------------------------
>
> Key: HDFS-16955
> URL: https://issues.apache.org/jira/browse/HDFS-16955
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Franklinsam Paul
> Priority: Major