[
https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271789#comment-14271789
]
Daryn Sharp edited comment on HDFS-7597 at 1/9/15 8:02 PM:
-----------------------------------------------------------
The problem stems from the delegation token identifier generating a distinct
UGI every time getUser() is called. The simple solution is to cache the token
identifier to UGI mapping. The webhdfs servlet extracts the token
identifier's UGI so it can wrap the operation in a doAs. With caching, the
same token identifier (from multiple connections) will use a single UGI
context on the client side, which can then reuse the possibly cached RPC
connection.

Caching will also benefit the server-side RPC layer. Multiple token-based
connections will share a common UGI, thus reducing the garbage generated,
especially for short-lived connections.

This change should ideally be in common, but I'm concerned that other
non-NN Hadoop services (e.g. YARN) may be doing server-side manipulation of
the UGI.
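
As a rough illustration of the caching idea in this comment, here is a minimal, self-contained sketch. It is not the patch and does not use Hadoop classes: SketchUgi, SketchTokenId and SketchUgiCache are hypothetical stand-ins, and the assumption that a UGI compares by object identity is built into the stand-in rather than taken from this thread.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a UGI. It deliberately keeps Object's
// identity-based equals/hashCode, so two instances created for the same
// user never compare equal (an assumption of this sketch).
final class SketchUgi {
    final String userName;

    SketchUgi(String userName) {
        this.userName = userName;
    }
}

// Hypothetical stand-in for a delegation token identifier with value equality.
final class SketchTokenId {
    final String owner;
    final int sequenceNumber;

    SketchTokenId(String owner, int sequenceNumber) {
        this.owner = owner;
        this.sequenceNumber = sequenceNumber;
    }

    // Models the uncached behaviour: a brand new UGI on every call.
    SketchUgi getUser() {
        return new SketchUgi(owner);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SketchTokenId)) {
            return false;
        }
        SketchTokenId other = (SketchTokenId) o;
        return sequenceNumber == other.sequenceNumber && owner.equals(other.owner);
    }

    @Override
    public int hashCode() {
        return Objects.hash(owner, sequenceNumber);
    }
}

// Token identifier to UGI cache: equal identifiers always yield the same UGI.
final class SketchUgiCache {
    private final ConcurrentMap<SketchTokenId, SketchUgi> cache =
        new ConcurrentHashMap<>();

    SketchUgi getUser(SketchTokenId id) {
        // Create the UGI once per distinct identifier, then reuse it.
        return cache.computeIfAbsent(id, SketchTokenId::getUser);
    }
}

class SketchUgiCacheDemo {
    public static void main(String[] args) {
        // The same token presented on two separate connections.
        SketchTokenId first = new SketchTokenId("daryn", 42);
        SketchTokenId second = new SketchTokenId("daryn", 42);

        System.out.println(first.getUser() == second.getUser()); // false

        SketchUgiCache cache = new SketchUgiCache();
        System.out.println(cache.getUser(first) == cache.getUser(second)); // true
    }
}
```

The map in this sketch is unbounded; a real cache would presumably need to bound or expire entries, for example when tokens expire or are cancelled.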
was (Author: daryn):
The problem stems from the delegation token identifier generating a distinct
UGI every time getUser() is called. The simple solution is to cache the token
identifier to UGI mapping. The webhdfs servlet extracts the token
identifier's UGI so it can wrap the operation in a doAs. With caching, the
same token identifier (from multiple connections) will use a single UGI
context on the client side, which can then reuse the possibly cached RPC
connection.

Caching will also benefit the server-side RPC layer. Multiple token-based
connections will share a common UGI, thus reducing the garbage generated,
especially for short-lived connections.

This change should ideally be in common, but I'm concerned that other
non-NN Hadoop services (e.g. YARN) may be doing server-side manipulation of
the UGI. I'll post a patch early next week after a POC clears QA.
> Clients seeking over webhdfs may crash the NN
> ---------------------------------------------
>
> Key: HDFS-7597
> URL: https://issues.apache.org/jira/browse/HDFS-7597
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: webhdfs
> Affects Versions: 2.0.0-alpha
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Priority: Critical
>
> Webhdfs seeks involve closing the current connection and issuing a new
> open request with the new offset. The RPC layer caches connections, so the DN
> keeps a lingering connection open to the NN. Connection caching is based in
> part on UGI. Although the client uses the same token for the new offset
> request, the UGI is different, which forces the DN to open another,
> unnecessary connection to the NN.
> A job that performs many seeks can easily crash the NN through fd exhaustion.
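
The failure mode in the description above can be modelled with a toy connection cache. This is only a sketch under stated assumptions: SeekUgi, SeekConnectionKey and SeekConnectionPool are hypothetical names, the key is assumed to consist of just the address and the UGI, the UGI is assumed to compare by identity, and connections are never closed. The real RPC layer has more key fields and idle timeouts.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a UGI that compares by object identity.
final class SeekUgi {
    final String userName;

    SeekUgi(String userName) {
        this.userName = userName;
    }
}

// Hypothetical cache key: the connection is looked up by address and UGI.
final class SeekConnectionKey {
    final String address;
    final SeekUgi ugi;

    SeekConnectionKey(String address, SeekUgi ugi) {
        this.address = address;
        this.ugi = ugi;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SeekConnectionKey)) {
            return false;
        }
        SeekConnectionKey other = (SeekConnectionKey) o;
        // The UGI is compared by identity, not by user name.
        return address.equals(other.address) && ugi == other.ugi;
    }

    @Override
    public int hashCode() {
        return Objects.hash(address, System.identityHashCode(ugi));
    }
}

// Toy connection cache: one lingering connection per distinct key.
final class SeekConnectionPool {
    private final Map<SeekConnectionKey, Object> connections = new HashMap<>();

    void connect(String address, SeekUgi ugi) {
        // Reuse the cached connection if the key matches, else open a new one.
        connections.computeIfAbsent(new SeekConnectionKey(address, ugi), k -> new Object());
    }

    int openConnections() {
        return connections.size();
    }
}

class SeekDemo {
    public static void main(String[] args) {
        // A fresh UGI per seek: every request misses the cache.
        SeekConnectionPool perSeekUgi = new SeekConnectionPool();
        for (int i = 0; i < 1000; i++) {
            perSeekUgi.connect("nn:8020", new SeekUgi("daryn"));
        }
        System.out.println(perSeekUgi.openConnections()); // 1000

        // One shared UGI: every request after the first hits the cache.
        SeekConnectionPool sharedUgi = new SeekConnectionPool();
        SeekUgi ugi = new SeekUgi("daryn");
        for (int i = 0; i < 1000; i++) {
            sharedUgi.connect("nn:8020", ugi);
        }
        System.out.println(sharedUgi.openConnections()); // 1
    }
}
```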