[ https://issues.apache.org/jira/browse/HIVE-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175218#comment-15175218 ]
Gopal V commented on HIVE-13196:
--------------------------------
Wrote a JMH benchmark that explains this change:
https://github.com/t3rmin4t0r/regexbench
{code}
# Run complete. Total time: 00:00:41
Benchmark                       Mode  Cnt    Score    Error  Units
RegexBench.testGreedyRegexHit   avgt    5  340.991 ±  7.929  ns/op
RegexBench.testGreedyRegexMiss  avgt    5  466.184 ± 21.349  ns/op
RegexBench.testLazyRegexHit     avgt    5   72.456 ± 16.156  ns/op
RegexBench.testLazyRegexMiss    avgt    5  366.955 ± 49.159  ns/op
{code}
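For readers who want to see what "greedy" and "lazy" refer to in the benchmark names above, here is a minimal sketch of turning a LIKE pattern into a java.util.regex pattern with either quantifier style. It is an illustration only, not the code in the attached patch or in the linked benchmark: the class LikeRegexSketch and the helpers likeToRegex and like are hypothetical names, and escape characters in the LIKE pattern are deliberately not handled.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not taken from Hive's UDFLike or from the patch.
class LikeRegexSketch {

    // Translate a SQL LIKE pattern into a Java regex string.
    // '%' becomes ".*" (greedy) or ".*?" (reluctant), '_' becomes ".",
    // and every other run of characters is quoted as a literal.
    // Assumption: no LIKE escape character is supported in this sketch.
    public static String likeToRegex(String like, boolean reluctant) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%' || c == '_') {
                flushLiteral(regex, literal);
                if (c == '_') {
                    regex.append('.');
                } else {
                    regex.append(reluctant ? ".*?" : ".*");
                }
            } else {
                literal.append(c);
            }
        }
        flushLiteral(regex, literal);
        return regex.toString();
    }

    // Append any pending literal text, quoted so regex metacharacters are inert.
    private static void flushLiteral(StringBuilder regex, StringBuilder literal) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    // Full-string match, as LIKE requires; DOTALL lets wildcards span newlines.
    public static boolean like(String input, String likePattern, boolean reluctant) {
        Pattern p = Pattern.compile(likeToRegex(likePattern, reluctant), Pattern.DOTALL);
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        String pattern = "%foo%bar%";
        System.out.println("greedy:    " + likeToRegex(pattern, false));
        System.out.println("reluctant: " + likeToRegex(pattern, true));
        // Both forms accept and reject the same inputs; only the search order differs.
        System.out.println(like("xx foo yy bar zz", pattern, false));
        System.out.println(like("xx foo yy bar zz", pattern, true));
        System.out.println(like("xx foo yy baz zz", pattern, true));
    }
}
```

Because matches() anchors the whole input, the two forms agree on every result. What changes is how much input the engine consumes before it tries the next literal, which is the cost the hit and miss cases in the benchmark measure.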
> UDFLike: reduce Regex NFA sizes
> -------------------------------
>
> Key: HIVE-13196
> URL: https://issues.apache.org/jira/browse/HIVE-13196
> Project: Hive
> Issue Type: Improvement
> Components: UDF
> Affects Versions: 1.3.0, 1.2.1, 2.0.0, 2.1.0
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Minor
> Attachments: HIVE-13196.1.patch
>
>
> The NFAs built from complex regexes in UDFLike are extremely complex and
> spend a lot of time doing simple expression matching with no backtracking.
> Prevent NFA -> DFA explosion by using reluctant regex matches instead of
> greedy matches.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)