[
https://issues.apache.org/jira/browse/HADOOP-11506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297807#comment-14297807
]
Andrew Wang commented on HADOOP-11506:
--------------------------------------
Happy to, thanks for working on this everyone. Review comments below, only one
substantial one. Core logic looks good though.
* The new recursion detection only handles a two-var loop, but the old one
handles bigger loops with the hash set. For instance, {{FOO1=FOO2, FOO2=FOO3,
FOO3=FOO1}}. The behavior is a little different thus; before, we'd return the
start of the loop, here, we'd hit the IllegalStateException at the end. I feel
like the exception is arguably better, but compatibility dictates we preserve
the existing behavior.
Nits:
* Not caused by your patch, but do you mind adding javadoc to substituteVars
about the expected input/output? Same for findSubVariable would be nice.
* Also good to document why we go through this dance rather than using a regex
(can mention this JIRA #)
* Did you use a return parameter for performance reasons? Eden allocation is
pretty fast, so just returning seems fine. With all the {{substring}} and
string concat flying around, we're doing a lot of allocation anyway.
Spacing is a bit off here:
{code}
eval = eval.substring(0, dollar)
{code}
remove "the"
{code}
// that can occur in the nested class names
{code}
I think this should be a right brace rather than left for ultimate clarity:
{code}
int afterRightBrace = varBounds[SUB_END_IDX] + "{".length();
{code}
Similar, I think {{"\{c"}} should be {{"c\}"}} for ultimate clarity:
{code}
bracePos > 0 && bracePos + "{c".length() < eval.length();
{code}
> Configuration.get() is unnecessarily slow
> -----------------------------------------
>
> Key: HADOOP-11506
> URL: https://issues.apache.org/jira/browse/HADOOP-11506
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Dmitriy V. Ryaboy
> Assignee: Gera Shegalov
> Attachments: HADOOP-11506.001.patch, HADOOP-11506.002.patch
>
>
> Profiling several large Hadoop jobs, we discovered that a surprising amount
> of time was spent inside Configuration.get, more specifically, in regex
> matching caused by the substituteVars call.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)