[ https://issues.apache.org/jira/browse/HADOOP-12436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959085#comment-14959085 ]
Allen Wittenauer commented on HADOOP-12436:
-------------------------------------------
So a few things:
a) we are clearly missing test coverage in common, since this issue wasn't
detected there. Whichever tests did catch it should probably be moved, or at
least replicated, into common for more complete testing.
b) we're hitting a (documented!) incompatibility between
com.google.re2j.PatternSyntaxException and
java.util.regex.PatternSyntaxException.
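A minimal sketch of why the two exception types don't interoperate. It assumes,
as in the re2j sources, that com.google.re2j.PatternSyntaxException extends
RuntimeException rather than java.util.regex.PatternSyntaxException. To keep the
sketch JDK-only, the re2j class is replaced by a hypothetical stand-in
(Re2jLikePatternSyntaxException), and compileWithRe2jLikeEngine is a
hypothetical placeholder for a pattern compiler. A catch clause written against
the java.util.regex type lets the stand-in exception escape.

```java
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-in for com.google.re2j.PatternSyntaxException, assuming
// it extends RuntimeException and NOT java.util.regex.PatternSyntaxException.
class Re2jLikePatternSyntaxException extends RuntimeException {
  Re2jLikePatternSyntaxException(String message) {
    super(message);
  }
}

class ExceptionMismatchDemo {
  // Hypothetical compiler backed by an re2j-style engine: it reports a bad
  // pattern with the stand-in exception, not the java.util.regex one.
  static void compileWithRe2jLikeEngine(String glob) {
    if (glob.contains("[") && !glob.contains("]")) {
      throw new Re2jLikePatternSyntaxException("missing closing ]: " + glob);
    }
  }

  // Caller written against java.util.regex. Returns "caught" if its
  // PatternSyntaxException handler fires, "escaped" if the exception
  // slipped past that handler, and "ok" if nothing was thrown.
  static String callerExpectingJdkException(String glob) {
    try {
      try {
        compileWithRe2jLikeEngine(glob);
        return "ok";
      } catch (PatternSyntaxException e) {
        return "caught";
      }
    } catch (RuntimeException e) {
      return "escaped";
    }
  }

  public static void main(String[] args) {
    System.out.println(callerExpectingJdkException("file[abc"));  // escaped
    System.out.println(callerExpectingJdkException("file*name")); // ok
  }
}
```

In other words, any existing `catch (java.util.regex.PatternSyntaxException e)`
around a GlobPattern call silently stops matching once GlobPattern throws the
re2j type, and the exception propagates as an unchecked RuntimeException.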
c) GlobPattern is Private, Evolving. GlobFilter is Public, Evolving, but it
converts the PatternSyntaxException to an IOException, so even though this is
an incompatibility, no deprecation should be required. That said, we should
definitely scan the source for any other callers of GlobPattern that handle
PatternSyntaxException.
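The reason GlobFilter's public callers are insulated can be sketched as
follows. WrappingGlobFilter is a simplified, hypothetical stand-in, not
GlobFilter's actual source, and it compiles a plain java.util.regex pattern
only to stay JDK-only. The constructor catches the engine's own
PatternSyntaxException and rethrows an IOException, so a public caller's
`catch (IOException e)` keeps working when the engine changes, provided the
internal catch clause is updated to the new engine's exception type.

```java
import java.io.IOException;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical, simplified stand-in for a public filter wrapping a private
// pattern class. Only IOException crosses the public API boundary.
class WrappingGlobFilter {
  private final Pattern pattern;

  WrappingGlobFilter(String regex) throws IOException {
    try {
      pattern = Pattern.compile(regex);
    } catch (PatternSyntaxException e) {
      // When the engine is swapped (e.g. to re2j), this catch clause must
      // name the new engine's exception type; callers still see IOException.
      throw new IOException("Illegal file pattern: " + e.getMessage(), e);
    }
  }

  boolean accept(String name) {
    return pattern.matcher(name).matches();
  }
}
```

A caller only ever handles IOException, e.g. `new WrappingGlobFilter("file[abc")`
fails with "Illegal file pattern: ..." regardless of which regex engine sits
underneath.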
> GlobPattern regex library has performance issues with wildcard characters
> -------------------------------------------------------------------------
>
> Key: HADOOP-12436
> URL: https://issues.apache.org/jira/browse/HADOOP-12436
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 2.2.0, 2.7.1
> Reporter: Matthew Paduano
> Assignee: Matthew Paduano
> Fix For: 3.0.0
>
> Attachments: HADOOP-12436.01.patch, HADOOP-12436.02.patch,
> HADOOP-12436.03.patch, HADOOP-12436.04.patch
>
>
> java.util.regex classes have performance problems with certain wildcard
> patterns. Namely, consecutive * characters in a file name (not escaped as
> literals) cause commands such as "hadoop fs -ls file******name" to consume
> 100% CPU and probably never return in a reasonable time (run time grows
> rapidly with the number of *'s).
> Here is an example:
> {noformat}
> hadoop fs -touchz
> /user/mattp/job_1429571161900_4222-1430338332599-tda%2D%2D\\\+\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\*\\\+\\\+\\\+...%270%27%28Stage-1430338580443-39-2000-SUCCEEDED-production%2Dhigh-1430338340360.jhist
> hadoop fs -ls
> /user/mattp/job_1429571161900_4222-1430338332599-tda%2D%2D+******************************+++...%270%27%28Stage-1430338580443-39-2000-SUCCEEDED-production%2Dhigh-1430338340360.jhist
> {noformat}
> causes:
> {noformat}
> PID COMMAND %CPU TIME
> 14526 java 100.0 01:18.85
> {noformat}
> Not every string of *'s causes this, but the above filename reproduces this
> reliably.
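A minimal sketch of where the CPU goes. globToRegex below is a hypothetical,
simplified translation (NOT Hadoop's GlobPattern, which also handles {a,b},
[...] and escapes): '*' becomes ".*", '?' becomes ".", and every other
non-alphanumeric character is backslash-escaped. Each unescaped * adds another
".*", so a failing match against a backtracking engine such as java.util.regex
has to try every way of splitting the name among the ".*" groups, and that
count grows combinatorially with the number of stars. re2j, a port of RE2,
guarantees matching in time linear in the input instead.

```java
import java.util.Collections;
import java.util.regex.Pattern;

// Hypothetical, simplified glob-to-regex translation used only to illustrate
// backtracking; it is not Hadoop's GlobPattern.
class GlobBacktrackDemo {
  static String globToRegex(String glob) {
    StringBuilder regex = new StringBuilder();
    for (char c : glob.toCharArray()) {
      if (c == '*') {
        regex.append(".*");
      } else if (c == '?') {
        regex.append('.');
      } else {
        // Backslash before a non-alphanumeric char is always a literal in
        // java.util.regex; letters and digits are left unescaped.
        if (!Character.isLetterOrDigit(c)) {
          regex.append('\\');
        }
        regex.append(c);
      }
    }
    return regex.toString();
  }

  // Times a full match of "a" + n stars + "b" against a name that starts
  // with 'a' but never ends with 'b', so every split must be tried.
  static long timeFailingMatchNanos(int stars, String name) {
    String glob = "a" + String.join("", Collections.nCopies(stars, "*")) + "b";
    Pattern p = Pattern.compile(globToRegex(glob));
    long start = System.nanoTime();
    boolean matched = p.matcher(name).matches();
    long elapsed = System.nanoTime() - start;
    if (matched) {
      throw new IllegalStateException("expected no match for " + glob);
    }
    return elapsed;
  }

  public static void main(String[] args) {
    String name = "a" + String.join("", Collections.nCopies(40, "x"));
    for (int stars = 1; stars <= 5; stars++) {
      System.out.printf("%d stars: %d us%n", stars,
          timeFailingMatchNanos(stars, name) / 1000);
    }
  }
}
```

The timings depend on the JVM and machine, so no figures are given here; the
point is the trend as stars are added, which mirrors the "time scales with
number of *'s" behaviour reported above.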
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)