[
https://issues.apache.org/jira/browse/HADOOP-11506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288362#comment-14288362
]
Dmitriy V. Ryaboy commented on HADOOP-11506:
--------------------------------------------
Most property values contain no variables to substitute, so substituteVars
returns at the following check:
{code}
if (!match.find()) {
  return eval;
}
{code}
Getting there requires creating a matcher, allocating a HashSet, and evaluating
the regex:
{code}
private static final Pattern VAR_PATTERN =
    Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}");
{code}
It is far cheaper to bail out early and skip the expensive regex evaluation in
the majority of cases by adding a simple check:
{code}
if (expr == null) {
  return null;
}
if (!expr.contains("$")) {
  return expr;
}
{code}
(The new check is the second if condition above.)
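For illustration, here is a minimal, self-contained sketch of how the fast path
sits in front of the regex. The class name VarSubstitutionSketch, the
substitute method and the vars map are hypothetical and are not the actual
Configuration code; the sketch also assumes a single substitution pass and
leaves unknown variables untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the substitution logic; not Hadoop's Configuration.
final class VarSubstitutionSketch {

  // Same pattern as quoted above: "${name}" with no '}', '$' or space in name.
  private static final Pattern VAR_PATTERN =
      Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}");

  static String substitute(String expr, Map<String, String> vars) {
    if (expr == null) {
      return null;
    }
    // Fast path: without a '$' there can be no "${...}", so skip the regex.
    if (!expr.contains("$")) {
      return expr;
    }
    Matcher match = VAR_PATTERN.matcher(expr);
    StringBuffer out = new StringBuffer();
    while (match.find()) {
      String token = match.group();                        // e.g. "${user}"
      String name = token.substring(2, token.length() - 1); // e.g. "user"
      String value = vars.get(name);
      // Assumption: an unknown variable is left as-is rather than failing.
      String replacement = (value != null) ? value : token;
      match.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    match.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, String> vars = new HashMap<>();
    vars.put("user", "alice");
    System.out.println(substitute("plain-value", vars));        // fast path
    System.out.println(substitute("/home/${user}/data", vars)); // regex path
  }
}
```

A value with no '$' is returned as the same String instance, so the common
case allocates neither a Matcher nor a buffer.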
Many users assume that Configuration.get() is a Map lookup, and call it inside
map / reduce functions, which adds up to non-trivial overhead when the m/r
functions are simple.
> Configuration.get() is unnecessarily slow
> -----------------------------------------
>
> Key: HADOOP-11506
> URL: https://issues.apache.org/jira/browse/HADOOP-11506
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Dmitriy V. Ryaboy
>
> Profiling several large Hadoop jobs, we discovered that a surprising amount
> of time was spent inside Configuration.get, more specifically, in regex
> matching caused by the substituteVars call.