[ 
https://issues.apache.org/jira/browse/HIVE-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682299#comment-13682299
 ] 

Teddy Choi commented on HIVE-4642:
----------------------------------

[~ehans], I read some books and papers about regular expressions. According to 
http://en.wikipedia.org/wiki/Regular_expression#Implementations_and_running_times
 , DFA construction time and backtracking NFA running time are exponential, so 
they are not good solutions.

According to http://swtch.com/~rsc/regexp/regexp1.html and 
http://www.cs.ucdavis.edu/~green/papers/techrept02.pdf , lazy DFA could be a 
good choice. It's construction time is O(n ), and its running time is O(m^2*n) 
at most. If there are enough target strings and they share a similar pattern, 
then the average running time will become O(n ). So it looks promising.

It's hard to find a Java lazy DFA regular expression engine. java.util.regex is 
traditional NFA, and JRegex is DFA. Jarkta Regex is retired, so is Jarkarta 
ORO. And others are not updated for years. I'm surprised that it's hard to find 
one. So I think it will be good to implement one by myself. Fortunately, it 
doesn't seem hard to implement.

If you know an alternative solution, please let me know it.
                
> Implement vectorized RLIKE and REGEXP filter expressions
> --------------------------------------------------------
>
>                 Key: HIVE-4642
>                 URL: https://issues.apache.org/jira/browse/HIVE-4642
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
>            Assignee: Teddy Choi
>
> See title. I will add more details next week. The goal is (a) make this work 
> correctly and (b) optimize it as well as possible, at least for the common 
> cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to