[ https://issues.apache.org/jira/browse/HIVE-20819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roohi Syeda updated HIVE-20819:
-------------------------------
    Description: 
HiveServer2 leaks Metastore connections when the HADOOP_USER_NAME environment variable is set.

The leaked connections stay in the ESTABLISHED state and are never closed.

 

*More Details:*

When a new query is executed in a new session, the handler thread calls line 66 of HiveSessionImplwithUGI:

UserGroupInformation.createProxyUser(owner, UserGroupInformation.getLoginUser());

At *query compile time*, this sessionUgi is used by the *handler* thread to open an MS connection.

Later, at *query run time*, line 277 of SQLOperation runs:

Runnable work = new BackgroundWork(getCurrentUGI(), parentSession.getSessionHive(), SessionState.get(), asyncPrepare);

getCurrentUGI() calls Utils.getUGI (see below), which creates a new proxy user. That UGI is then passed to the *background* thread.

 
{code:java}
public static UserGroupInformation getUGI() throws LoginException, IOException {
  String doAs = System.getenv("HADOOP_USER_NAME");
  if (doAs != null && doAs.length() > 0) {
    /*
     * this allows doAs (proxy user) to be passed along across process boundary where
     * delegation tokens are not supported.  For example, a DDL stmt via WebHCat with
     * a doAs parameter, forks to 'hcat' which needs to start a Session that
     * proxies the end user
     */
    return UserGroupInformation.createProxyUser(doAs, UserGroupInformation.getLoginUser());
  }
  return UserGroupInformation.getCurrentUser();
}
{code}
 

getCurrentUGI() therefore returns a *new* proxy user instance, and this UGI is set on the background thread.

When the background thread later tries to get the Hive db, the UGIs do not compare equal (see the equals code below), so the background thread opens a new connection, which is never closed.

See line 318 in Hive.java:

 
{code:java}
private static Hive getInternal(HiveConf c, boolean needsRefresh, boolean isFastCheck,
    boolean doRegisterAllFns) throws HiveException {
  Hive db = hiveDB.get();
  if (db == null || !db.isCurrentUserOwner() || needsRefresh
      || (c != null && !isCompatible(db, c, isFastCheck))) {
    db = create(c, false, db, doRegisterAllFns);
  }
  if (c != null) {
    db.conf = c;
  }
  return db;
}

private boolean isCurrentUserOwner() throws HiveException {
  try {
    return owner == null || owner.equals(UserGroupInformation.getCurrentUser());
  } catch (IOException e) {
    throw new HiveException("Error getting current user: " + e.getMessage(), e);
  }
}

// equals() as defined in UserGroupInformation
/**
 * Compare the subjects to see if they are equal to each other.
 */
@Override
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
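
To make the equality problem concrete, here is a minimal sketch that uses only the JDK. FakeUgi is a hypothetical stand-in for UserGroupInformation, not the real class: it assumes that createProxyUser builds a new Subject on every call and that equals compares subjects by reference, as in the code quoted above. The rewrap() helper is also an assumption, used to imitate a new wrapper around an existing subject.

```java
import javax.security.auth.Subject;

/** Hypothetical stand-in for UserGroupInformation; illustration only. */
final class FakeUgi {
  private final String user;
  private final Subject subject;

  private FakeUgi(String user, Subject subject) {
    this.user = user;
    this.subject = subject;
  }

  /** Imitates createProxyUser: every call builds a brand-new Subject. */
  static FakeUgi createProxyUser(String user) {
    return new FakeUgi(user, new Subject());
  }

  /** Imitates wrapping an already existing Subject in a new UGI object. */
  FakeUgi rewrap() {
    return new FakeUgi(user, subject);
  }

  String getUser() {
    return user;
  }

  @Override
  public boolean equals(Object o) {
    if (o == this) {
      return true;
    } else if (o == null || getClass() != o.getClass()) {
      return false;
    } else {
      // Reference comparison of subjects: the user name plays no role.
      return subject == ((FakeUgi) o).subject;
    }
  }

  @Override
  public int hashCode() {
    return System.identityHashCode(subject);
  }
}

final class UgiEqualityDemo {
  public static void main(String[] args) {
    FakeUgi session = FakeUgi.createProxyUser("alice");
    FakeUgi background = FakeUgi.createProxyUser("alice");

    // Same user name, different Subject instances.
    System.out.println("two proxy users equal: " + session.equals(background));
    // Different wrapper objects, same Subject instance.
    System.out.println("rewrapped equal: " + session.equals(session.rewrap()));
  }
}
```

In this model, two proxy users created for the same user name never compare equal, while a second wrapper around the same subject does. That is the distinction the owner check in Hive.java depends on.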
 

Solution:

When we assign *currentUGI* to the background thread, we should call UserGroupInformation.getCurrentUser() (see below) instead of the *getUGI* method listed above. getUGI creates a new proxy user instance with a new subject, so isCurrentUserOwner always returns false (both the subject and the UGI instances differ) and a new MS connection is created.

 
{code:java}
/**
 * Return the current user, including any doAs in the current stack.
 */
public synchronized
static UserGroupInformation getCurrentUser() throws IOException {
  AccessControlContext context = AccessController.getContext();
  Subject subject = Subject.getSubject(context);
  if (subject == null || subject.getPrincipals(User.class).isEmpty()) {
    return getLoginUser();
  } else {
    return new UserGroupInformation(subject);
  }
}
{code}
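
The effect of the proposed change can be sketched with a toy model that uses only the JDK. Every name here (OwnerCheckedCache, Owner, Connection, simulate) is hypothetical and not part of Hive or Hadoop. The model only counts how many connections the owner check causes to be opened; it does not model closing. It assumes one connection is opened at compile time with the session subject, and that each query then runs either with a fresh subject (current behaviour) or with the session subject (proposed behaviour).

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a per-thread connection cache guarded by an owner check. */
final class OwnerCheckedCache {

  /** Stand-in for a UGI: two owners match only if they share the same subject object. */
  static final class Owner {
    final Object subject;

    Owner(Object subject) {
      this.subject = subject;
    }

    boolean sameSubject(Owner other) {
      return other != null && subject == other.subject;
    }
  }

  /** Stand-in for a Metastore connection that remembers who opened it. */
  static final class Connection {
    final Owner owner;

    Connection(Owner owner) {
      this.owner = owner;
    }
  }

  private final ThreadLocal<Connection> current = new ThreadLocal<>();
  private final AtomicInteger opened = new AtomicInteger();

  /** Reuses the cached connection only when the caller's subject matches its owner. */
  Connection get(Owner caller) {
    Connection c = current.get();
    if (c == null || !c.owner.sameSubject(caller)) {
      c = new Connection(caller);
      opened.incrementAndGet();
      current.set(c);
    }
    return c;
  }

  int openedCount() {
    return opened.get();
  }

  /** Returns how many connections are opened for one session running the given number of queries. */
  static int simulate(boolean freshSubjectPerQuery, int queries) {
    OwnerCheckedCache cache = new OwnerCheckedCache();
    Object sessionSubject = new Object();
    cache.get(new Owner(sessionSubject)); // connection opened at compile time
    for (int i = 0; i < queries; i++) {
      Object subject = freshSubjectPerQuery ? new Object() : sessionSubject;
      cache.get(new Owner(subject));
    }
    return cache.openedCount();
  }
}

final class OwnerCheckedCacheDemo {
  public static void main(String[] args) {
    System.out.println("fresh subject per query: " + OwnerCheckedCache.simulate(true, 3));
    System.out.println("session subject reused:  " + OwnerCheckedCache.simulate(false, 3));
  }
}
```

Under these assumptions, three queries with a fresh subject each open four connections in total, while reusing the session subject keeps the count at one.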
 

 

 


> Leaking Metastore connections when HADOOP_USER_NAME environmental variable is 
> set
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-20819
>                 URL: https://issues.apache.org/jira/browse/HIVE-20819
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Roohi Syeda
>            Assignee: Roohi Syeda
>            Priority: Minor
>         Attachments: HIVE-20819.1.patch
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
