[ 
https://issues.apache.org/jira/browse/HADOOP-11506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288362#comment-14288362
 ] 

Dmitriy V. Ryaboy commented on HADOOP-11506:
--------------------------------------------

Most property values contain no variables to substitute, so substituteVars 
returns at the following check:

{code}
if (!match.find()) {
  return eval;
}
{code}

Getting there requires creating a matcher, allocating a HashSet, and evaluating 
the regex:
{code}
private static final Pattern VAR_PATTERN =
    Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}");
{code}

It is far simpler to bail out early and skip the expensive regex evaluation 
in the majority of cases by adding a simple check:

{code}
if (expr == null) {
  return null;
}
if (!expr.contains("$")) {
  return expr;
}
{code}

(The new check is the second if condition above.)
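
To make the effect of the early return concrete, here is a minimal, 
self-contained sketch of a substitution routine with the fast path in place. 
It is not the actual Configuration code: the class name SubstituteSketch, the 
Map-backed property store, and the depth limit of 20 are assumptions made for 
illustration. Only the pattern and the two guard conditions are taken from 
the comment above.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the substitution logic; not Hadoop's implementation.
public class SubstituteSketch {
    // Same pattern as quoted above: ${name}, where name has no '}', '$' or space.
    private static final Pattern VAR_PATTERN =
        Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}");

    // Assumed cap on substitution rounds so that cyclic definitions terminate.
    private static final int MAX_SUBST = 20;

    public static String substitute(String expr, Map<String, String> props) {
        if (expr == null) {
            return null;
        }
        // Fast path: without a '$' there can be no ${...} reference,
        // so no Matcher is created and no regex is evaluated.
        if (!expr.contains("$")) {
            return expr;
        }
        String eval = expr;
        for (int i = 0; i < MAX_SUBST; i++) {
            Matcher match = VAR_PATTERN.matcher(eval);
            if (!match.find()) {
                return eval; // nothing left to substitute
            }
            // Strip the leading "${" and the trailing "}" to get the name.
            String name = eval.substring(match.start() + 2, match.end() - 1);
            String value = props.get(name);
            if (value == null) {
                return eval; // unknown variable: leave the text as it is
            }
            eval = eval.substring(0, match.start()) + value
                + eval.substring(match.end());
        }
        throw new IllegalStateException(
            "Variable substitution depth too large: " + expr);
    }
}
```

With this shape, a value such as "plain" returns after a single substring 
scan, while "/home/${user}/data" still goes through the regex loop.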

Many users assume that Configuration.get() is a Map lookup, and call it inside 
map / reduce functions, which adds up to non-trivial overhead when the m/r 
functions are simple.
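
Independently of the fix proposed here, callers can avoid the per-record 
cost by reading the value once per task. The sketch below illustrates that 
pattern without depending on Hadoop: the Function argument stands in for 
Configuration.get(), and the class names and the key "example.separator" are 
hypothetical.

```java
import java.util.function.Function;

// Hypothetical illustration of reading a setting once instead of per record.
public class CachedLookupSketch {

    /** Stand-in for a map task that handles many records. */
    public static final class RecordProcessor {
        private final String separator;

        public RecordProcessor(Function<String, String> conf) {
            // Single lookup per task, analogous to reading the value in a
            // setup step rather than inside the map function itself.
            String configured = conf.apply("example.separator");
            this.separator = (configured != null) ? configured : "\t";
        }

        /** Per-record work touches only the cached field. */
        public String process(String key, String value) {
            return key + separator + value;
        }
    }
}
```

The lookup cost is then paid once per task instead of once per record, 
whatever the cost of the underlying get() call.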

> Configuration.get() is unnecessarily slow
> -----------------------------------------
>
>                 Key: HADOOP-11506
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11506
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Dmitriy V. Ryaboy
>
> Profiling several large Hadoop jobs, we discovered that a surprising amount 
> of time was spent inside Configuration.get, more specifically, in regex 
> matching caused by the substituteVars call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
