[ https://issues.apache.org/jira/browse/HIVE-20819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roohi Syeda updated HIVE-20819:
-------------------------------
Description:
Metastore connections leak when the HADOOP_USER_NAME environment variable is set.
The connections that are created stay in the ESTABLISHED state and are never closed.
*More Details:*
When a new query is executed for a new session, the handler thread calls line 66 of HiveSessionImplwithUGI to create the session UGI:
    UserGroupInformation.createProxyUser(owner, UserGroupInformation.getLoginUser());
At *query compile time*, the *handler* thread uses this sessionUgi to open a Metastore (MS) connection.
Later, at *query run time*, line 277 of SQLOperation creates the background work:
    Runnable work = new BackgroundWork(getCurrentUGI(), parentSession.getSessionHive(),
        SessionState.get(), asyncPrepare);
getCurrentUGI() calls Utils.getUGI (see below) to obtain a UGI, which is then passed to the *background* thread. When HADOOP_USER_NAME is set, getUGI creates a new proxy user:
    public static UserGroupInformation getUGI() throws LoginException, IOException {
      String doAs = System.getenv("HADOOP_USER_NAME");
      if (doAs != null && doAs.length() > 0) {
        /*
         * this allows doAs (proxy user) to be passed along across process boundary where
         * delegation tokens are not supported. For example, a DDL stmt via WebHCat with
         * a doAs parameter, forks to 'hcat' which needs to start a Session that
         * proxies the end user
         */
        return UserGroupInformation.createProxyUser(doAs, UserGroupInformation.getLoginUser());
      }
      return UserGroupInformation.getCurrentUser();
    }
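To make the identity problem concrete, here is a minimal, self-contained Java model of the two branches of getUGI. It does not use Hadoop: `Ugi` is a hypothetical stand-in for UserGroupInformation whose equals() compares the wrapped javax.security.auth.Subject by reference (mirroring the UserGroupInformation.equals quoted in this issue), and a ThreadLocal is an assumed stand-in for the Subject on the current call stack. With a non-empty doAs value, every call builds a brand-new Subject, so two results for the same user never compare equal.

```java
import javax.security.auth.Subject;
import javax.security.auth.x500.X500Principal;

public class GetUgiModel {
    // Hypothetical stand-in for UserGroupInformation: equality is Subject identity,
    // mirroring the equals() quoted in this issue.
    static final class Ugi {
        final Subject subject;

        Ugi(Subject subject) { this.subject = subject; }

        @Override
        public boolean equals(Object o) {
            if (o == this) return true;
            if (o == null || getClass() != o.getClass()) return false;
            return subject == ((Ugi) o).subject;
        }

        @Override
        public int hashCode() { return System.identityHashCode(subject); }
    }

    // Assumed stand-in for the Subject on the current call stack (e.g. inside doAs).
    static final ThreadLocal<Subject> CURRENT_SUBJECT = new ThreadLocal<>();

    static Subject newSubject(String user) {
        Subject s = new Subject();
        s.getPrincipals().add(new X500Principal("CN=" + user));
        return s;
    }

    // Model of getCurrentUser(): a new wrapper around the Subject already in scope.
    static Ugi getCurrentUser() { return new Ugi(CURRENT_SUBJECT.get()); }

    // Model of Utils.getUGI(); doAs plays the role of System.getenv("HADOOP_USER_NAME").
    static Ugi getUgi(String doAs) {
        if (doAs != null && doAs.length() > 0) {
            // Like createProxyUser(doAs, loginUser): a brand-new Subject on every call.
            return new Ugi(newSubject(doAs));
        }
        return getCurrentUser();
    }

    public static void main(String[] args) {
        CURRENT_SUBJECT.set(newSubject("alice"));
        System.out.println(getUgi("alice").equals(getUgi("alice"))); // false: two Subjects
        System.out.println(getUgi(null).equals(getUgi(null)));       // true: same Subject
    }
}
```

Note that both branches allocate a new wrapper object; only the Subject reference decides equality.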
As a result, getCurrentUGI() returns a *new* proxy-user instance, and this UGI is set on the background thread.
When the background thread then tries to get the Hive db in subsequent calls, the UGIs are not equal (see the equals code below), so a new connection is opened, and the background thread never closes it.
Line 318 in Hive.java:
    private static Hive getInternal(HiveConf c, boolean needsRefresh, boolean isFastCheck,
        boolean doRegisterAllFns) throws HiveException {
      Hive db = hiveDB.get();
      if (db == null || !db.isCurrentUserOwner() || needsRefresh
          || (c != null && !isCompatible(db, c, isFastCheck))) {
        db = create(c, false, db, doRegisterAllFns);
      }
      if (c != null) {
        db.conf = c;
      }
      return db;
    }
    private boolean isCurrentUserOwner() throws HiveException {
      try {
        return owner == null || owner.equals(UserGroupInformation.getCurrentUser());
      } catch (IOException e) {
        throw new HiveException("Error getting current user: " + e.getMessage(), e);
      }
    }
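The per-thread caching in getInternal and isCurrentUserOwner can be modelled the same way. The sketch below is a simplification, not Hive code: `Handle` is a hypothetical placeholder for a Hive object holding one Metastore connection, the current Subject is passed in explicitly instead of being read through getCurrentUser(), the owner check uses Subject identity as UserGroupInformation.equals does, and a counter tracks connections that are opened but never closed. Repeated lookups under the same Subject reuse the cached handle; lookups under a fresh Subject replace it without closing the old connection.

```java
import java.util.concurrent.atomic.AtomicInteger;
import javax.security.auth.Subject;

public class HiveCacheModel {
    // Connections opened and never closed (the leak described in this issue).
    static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

    // Hypothetical placeholder for a Hive object: remembers its owner's Subject
    // and "opens" one Metastore connection when created.
    static final class Handle {
        final Subject owner;

        Handle(Subject owner) {
            this.owner = owner;
            OPEN_CONNECTIONS.incrementAndGet();
        }

        // Mirrors isCurrentUserOwner(): owner == null || owner.equals(current),
        // where UGI equality boils down to Subject identity.
        boolean isCurrentUserOwner(Subject current) {
            return owner == null || owner == current;
        }
    }

    // Mirrors the thread-local hiveDB cache consulted by getInternal().
    static final ThreadLocal<Handle> HIVE_DB = new ThreadLocal<>();

    // Simplified getInternal(): replaces the cached handle when the owner check fails,
    // but never closes the connection held by the handle it discards.
    static Handle getInternal(Subject currentSubject) {
        Handle db = HIVE_DB.get();
        if (db == null || !db.isCurrentUserOwner(currentSubject)) {
            db = new Handle(currentSubject);
            HIVE_DB.set(db);
        }
        return db;
    }

    public static void main(String[] args) {
        Subject session = new Subject();
        for (int i = 0; i < 3; i++) getInternal(session);       // reused: 1 connection
        for (int i = 0; i < 3; i++) getInternal(new Subject()); // fresh Subject each time
        System.out.println(OPEN_CONNECTIONS.get());             // 4
    }
}
```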
UserGroupInformation.equals (Hadoop):
    /**
     * Compare the subjects to see if they are equal to each other.
     */
    @Override
    public boolean equals(Object o) {
      if (o == this) {
        return true;
      } else if (o == null || getClass() != o.getClass()) {
        return false;
      } else {
        return subject == ((UserGroupInformation) o).subject;
      }
    }
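Solution:
When assigning *currentUGI* for the background thread, call UserGroupInformation.getCurrentUser() (see below) instead of the *getUGI* method listed above. getUGI creates a new proxy-user instance with a new Subject, so isCurrentUserOwner always returns false (both the Subject and the UGI instances differ) and a new MS connection is created. getCurrentUser() also returns a new UserGroupInformation object, but it wraps the Subject already present in the calling thread's access-control context, so it compares equal to the session UGI the handler thread is running as.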
    /**
     * Return the current user, including any doAs in the current stack.
     */
    public synchronized static UserGroupInformation getCurrentUser() throws IOException {
      AccessControlContext context = AccessController.getContext();
      Subject subject = Subject.getSubject(context);
      if (subject == null || subject.getPrincipals(User.class).isEmpty()) {
        return getLoginUser();
      } else {
        return new UserGroupInformation(subject);
      }
    }
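The effect of the proposed fix can be checked with a similar model. In the sketch below every name is hypothetical: an ExecutorService stands in for the HiveServer2 background pool, and a ThreadLocal stands in for the Subject that doAs places on the stack. The handler thread resolves the UGI with one of two strategies while running as the session user, hands it to a background thread, and the background thread evaluates the owner check against the session-owned Hive object. With a getUGI-style fresh proxy the check fails (in Hive, that would open a new Metastore connection); with a getCurrentUser-style reuse of the session Subject it passes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import javax.security.auth.Subject;

public class BackgroundUgiModel {
    // Assumed stand-in for the Subject that doAs places on the current thread.
    static final ThreadLocal<Subject> CURRENT_SUBJECT = new ThreadLocal<>();

    // getUGI-style strategy with HADOOP_USER_NAME set: a brand-new Subject every call.
    static Subject freshProxy() { return new Subject(); }

    // getCurrentUser-style strategy: reuse the Subject already in scope.
    static Subject currentUser() { return CURRENT_SUBJECT.get(); }

    // Models one query: the handler thread (running as the session user) resolves the UGI,
    // then a background thread runs as that UGI and performs the owner check.
    // Returns true if the check passes, i.e. the existing connection would be reused.
    static boolean ownerCheckPasses(Supplier<Subject> strategy) throws Exception {
        Subject sessionSubject = new Subject();   // owner of the session's Hive object
        CURRENT_SUBJECT.set(sessionSubject);      // handler thread runs as the session user
        Subject forBackground = strategy.get();   // like getCurrentUGI() at query run time
        CURRENT_SUBJECT.remove();

        ExecutorService background = Executors.newSingleThreadExecutor();
        try {
            return background.submit(() -> {
                CURRENT_SUBJECT.set(forBackground);   // like currentUGI.doAs(...)
                try {
                    // isCurrentUserOwner(): owner compared by Subject identity.
                    return sessionSubject == CURRENT_SUBJECT.get();
                } finally {
                    CURRENT_SUBJECT.remove();
                }
            }).get();
        } finally {
            background.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(ownerCheckPasses(BackgroundUgiModel::freshProxy));  // false
        System.out.println(ownerCheckPasses(BackgroundUgiModel::currentUser)); // true
    }
}
```

The model only captures Subject identity; it does not reproduce Hadoop's real doAs machinery or Hive's connection handling.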
> Leaking Metastore connections when HADOOP_USER_NAME environmental variable is
> set
> ---------------------------------------------------------------------------------
>
> Key: HIVE-20819
> URL: https://issues.apache.org/jira/browse/HIVE-20819
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Roohi Syeda
> Assignee: Roohi Syeda
> Priority: Minor
> Attachments: HIVE-20819.1.patch
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)