[
https://issues.apache.org/jira/browse/HIVE-14573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843690#comment-15843690
]
Gopal V commented on HIVE-14573:
--------------------------------
Wrote a small benchmark of the expected approach at
https://github.com/t3rmin4t0r/boyermoore/blob/master/src/main/java/org/notmysock/benchmark/BoyerMooreHorspool.java
> Vectorization: Implement StringExpr::find()
> --------------------------------------------
>
> Key: HIVE-14573
> URL: https://issues.apache.org/jira/browse/HIVE-14573
> Project: Hive
> Issue Type: Bug
> Reporter: Gopal V
> Assignee: Teddy Choi
>
> Currently, the LIKE expression implementation is a dumb StringExpr::equals()
> loop.
> For an input of N bytes and a pattern of M bytes, this has a complexity of
> O((N-M)*M), which is not an issue with small patterns or small inputs.
> The pattern matching is currently optimized for rows that match, while in
> clickstream data most rows generally do not match.
> From the common crawl data, the following run will go through the same
> {code}
> select count(1) from uservisits_orc_data where useragent like "%Opera%" and
> searchword LIKE "%fruit%";
> {code}
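
To make the complexity argument in the description concrete, here is a minimal sketch contrasting a naive compare-at-every-offset scan with a Boyer-Moore-Horspool search, the algorithm named in the benchmark link above. The class and method names (`HorspoolSketch`, `naiveFind`, `horspoolFind`, `buildShiftTable`) are hypothetical and chosen for illustration only. This is neither Hive's StringExpr code nor the code from the linked benchmark.

```java
import java.util.Arrays;

/** Illustrative sketch only; hypothetical names, not Hive's StringExpr. */
final class HorspoolSketch {

    /** Naive scan: re-compares the pattern at every offset, O((N-M)*M) worst case. */
    static int naiveFind(byte[] input, byte[] pattern) {
        int n = input.length, m = pattern.length;
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            while (j < m && input[i + j] == pattern[j]) {
                j++;
            }
            if (j == m) {
                return i; // full match at offset i
            }
        }
        return -1;
    }

    /** Bad-character table: how far the window may slide for each byte value. */
    static int[] buildShiftTable(byte[] pattern) {
        int m = pattern.length;
        int[] shift = new int[256];
        Arrays.fill(shift, m); // bytes absent from the pattern allow a full-length skip
        for (int j = 0; j < m - 1; j++) {
            shift[pattern[j] & 0xFF] = m - 1 - j;
        }
        return shift;
    }

    /** Boyer-Moore-Horspool: compares right to left, then skips using the table. */
    static int horspoolFind(byte[] input, byte[] pattern) {
        int n = input.length, m = pattern.length;
        if (m == 0) {
            return 0; // empty pattern matches at offset 0
        }
        int[] shift = buildShiftTable(pattern);
        int i = 0;
        while (i + m <= n) {
            int j = m - 1;
            while (j >= 0 && input[i + j] == pattern[j]) {
                j--;
            }
            if (j < 0) {
                return i; // every byte of the window matched
            }
            // Slide by the shift for the last byte of the current window.
            i += shift[input[i + m - 1] & 0xFF];
        }
        return -1;
    }
}
```

Both methods return the byte offset of the first match, or -1 if there is none. A `%Opera%` check would then reduce to `horspoolFind(value, pattern) >= 0`. The worst case of Horspool is still proportional to N*M, but on inputs that mostly do not contain the pattern it can skip up to M bytes per step, which is the non-matching case the description says dominates in clickstream data.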