[
https://issues.apache.org/jira/browse/HIVE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047281#comment-17047281
]
Krisztian Kasa edited comment on HIVE-22929 at 2/28/20 7:47 AM:
----------------------------------------------------------------
[~gopalv]
The String.replace implementation (JDK 8) is:
{code}
public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL)
            .matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
{code}
So it also calls Pattern.compile on *target* every time it is called.
The difference between replace and replaceAll is:
{code}
replace
Pattern.compile(target.toString(), Pattern.LITERAL)
{code}
{code}
replaceAll
Pattern.compile(regex)
{code}
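To illustrate what the Pattern.LITERAL flag changes in practice, here is a minimal sketch. The class and method names are hypothetical and only serve the example; the input string is chosen so that the search text is a regex metacharacter.

```java
public class ReplaceVsReplaceAll {
    // String.replace treats "." as a literal dot (the Pattern.LITERAL case).
    static String literal(String input) {
        return input.replace(".", "-");
    }

    // String.replaceAll treats "." as a regex that matches any character.
    static String regex(String input) {
        return input.replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b")); // a-b
        System.out.println(regex("a.b"));   // ---
    }
}
```

For a search string without metacharacters, such as "am" in the test below, both calls produce the same result, so only the compile cost differs.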
I did some testing:
{code}
@Test
public void testReplacePerf() {
    long count = 10000000;
    long start = System.currentTimeMillis();
    for (int i = 0; i < count; ++i) {
        String s = "sample sample".replaceAll("am", "b");
    }
    System.out.println("String.replaceAll: " + (System.currentTimeMillis() - start));
    start = System.currentTimeMillis();
    for (int i = 0; i < count; ++i) {
        String s = "sample sample".replace("am", "b");
    }
    System.out.println("String.replace: " + (System.currentTimeMillis() - start));
    start = System.currentTimeMillis();
    final Pattern REGEX = Pattern.compile("am", Pattern.LITERAL);
    for (int i = 0; i < count; ++i) {
        String s = RegExUtils.replaceAll("sample sample", REGEX, "b");
    }
    System.out.println("Precompiled regex + RegExUtils.replaceAll:" +
        (System.currentTimeMillis() - start));
}
{code}
{code}
String.replaceAll: 4037
String.replace: 3072
Precompiled regex + RegExUtils.replaceAll:2216
{code}
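For reference, the precompiled variant can also be sketched with the JDK alone, without RegExUtils. This is only an illustration of the compile-once idea, not the code from the attached patch; the class, field, and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrecompiledReplace {
    // Compiled once when the class is loaded and reused on every call.
    private static final Pattern AM = Pattern.compile("am", Pattern.LITERAL);
    // Quoted once so that '$' and '\' in the replacement would stay literal.
    private static final String REPLACEMENT = Matcher.quoteReplacement("b");

    static String replaceAm(String input) {
        // Only a Matcher is created per call; the Pattern is shared.
        return AM.matcher(input).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        System.out.println(replaceAm("sample sample")); // sbple sbple
    }
}
```

Pattern instances are immutable and safe to share between threads, so a static final field is a common place to keep one.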
Please share your thoughts.
was (Author: kkasa):
[~gopalv]
The String.replace implementation (JDK 8) is:
{code}
public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL)
            .matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
{code}
So it also calls Pattern.compile on *target* every time it is called.
The difference between replace and replaceAll is:
{code}
replace
Pattern.compile(target.toString(), Pattern.LITERAL)
{code}
{code}
replaceAll
Pattern.compile(regex)
{code}
I did some testing:
{code}
public static final Pattern REGEX = Pattern.compile("am", Pattern.LITERAL);

@Test
public void testReplacePerf() {
    long count = 10000000;
    long start = System.currentTimeMillis();
    for (int i = 0; i < count; ++i) {
        String s = "sample sample".replaceAll("am", "b");
    }
    System.out.println("String.replaceAll: " + (System.currentTimeMillis() - start));
    start = System.currentTimeMillis();
    for (int i = 0; i < count; ++i) {
        String s = "sample sample".replace("am", "b");
    }
    System.out.println("String.replace: " + (System.currentTimeMillis() - start));
    start = System.currentTimeMillis();
    for (int i = 0; i < count; ++i) {
        String s = RegExUtils.replaceAll("sample sample", REGEX, "b");
    }
    System.out.println("Precompiled regex + RegExUtils.replaceAll:" +
        (System.currentTimeMillis() - start));
}
{code}
{code}
String.replaceAll: 3997
String.replace: 3028
Precompiled regex + RegExUtils.replaceAll:2164
{code}
Please share your thoughts.
> Performance: quoted identifier parsing uses throwaway Regex via
> String.replaceAll()
> -----------------------------------------------------------------------------------
>
> Key: HIVE-22929
> URL: https://issues.apache.org/jira/browse/HIVE-22929
> Project: Hive
> Issue Type: Bug
> Reporter: Gopal Vijayaraghavan
> Assignee: Krisztian Kasa
> Priority: Major
> Attachments: HIVE-22929.1.patch, String.replaceAll.png
>
>
> !String.replaceAll.png!
> https://github.com/apache/hive/blob/master/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g#L530
> {code}
> '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, getText().length() -1 ).replaceAll("``", "`")); }
> {code}