[ 
https://issues.apache.org/jira/browse/PHOENIX-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360263#comment-14360263
 ] 

Shuxiong Ye edited comment on PHOENIX-1287 at 3/13/15 3:59 PM:
---------------------------------------------------------------

Hi [~jamestaylor], 

Most of the code in ByteBasedLikeExpression would be the same as in 
LikeExpression, and I think this is bad for future maintenance and development. 
For example, if we want to change the logic of LikeExpression, we would have to 
update ByteBasedLikeExpression as well.

My plan is:

1. Pass a USE_BYTE_BASED_REGEX option to the regex expressions (LikeExpression, 
RegexpReplaceFunction, RegexpSplitFunction, RegexpSubstrFunction).
2. Each of these expressions uses the proper pattern matcher according to 
the option.
3. A pattern matcher factory produces either a j.u.regex-based matcher or a 
byte-based one.
4. The interface of the base pattern matcher looks like:
{code:java}
Pattern compile(ImmutableBytesWritable ptr);
Matcher matcher(ImmutableBytesWritable ptr);
void replace(ImmutableBytesWritable ptr, ImmutableBytesWritable outputPtr);
{code}
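
To make points 2 and 3 concrete, here is a minimal sketch of how the factory could pick an engine from the option. All names here (BytePatternMatcher, JavaRegexMatcher, LiteralByteMatcher, PatternMatcherFactory) are hypothetical and not existing Phoenix classes. It uses plain byte[] instead of ImmutableBytesWritable to stay self-contained, and the byte-based side is only a literal-matching placeholder standing in for joni.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical common interface; the names are illustrative, not Phoenix API. */
interface BytePatternMatcher {
    /** Returns true if the pattern is found anywhere in the input bytes. */
    boolean find(byte[] input);
}

/** Decodes the bytes to a String and delegates to java.util.regex. */
final class JavaRegexMatcher implements BytePatternMatcher {
    private final Pattern pattern;

    JavaRegexMatcher(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean find(byte[] input) {
        return pattern.matcher(new String(input, StandardCharsets.UTF_8)).find();
    }
}

/**
 * Placeholder for a byte-based engine such as joni. To stay self-contained it
 * supports literal patterns only, scanning the bytes without decoding them.
 */
final class LiteralByteMatcher implements BytePatternMatcher {
    private final byte[] needle;

    LiteralByteMatcher(String literal) {
        this.needle = literal.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public boolean find(byte[] input) {
        for (int i = 0; i + needle.length <= input.length; i++) {
            int j = 0;
            while (j < needle.length && input[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return true;
            }
        }
        return false;
    }
}

/** Picks the implementation from one flag, so the expressions share one code path. */
final class PatternMatcherFactory {
    private PatternMatcherFactory() {}

    static BytePatternMatcher create(String pattern, boolean useByteBasedRegex) {
        return useByteBasedRegex
                ? new LiteralByteMatcher(pattern)
                : new JavaRegexMatcher(pattern);
    }
}
```

With something like this, an expression would only call PatternMatcherFactory.create(pattern, option) and would not need to know which engine is behind the interface.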

How about this?

Thanks.

-----------------------

I checked the code and found that it is hard to pass context from the parser 
to the expression.

Another way is:
1. Add a USE_BYTE_BASED_REGEX option to each expression (e.g. 
RegexpReplaceFunction), turned off by default so that the byte-based engine is 
not used.
2. Add a wrapper, ByteBasedRegexExpression (e.g. 
ByteBasedRegexpReplaceFunction), that inherits from the expression (e.g. 
RegexpReplaceFunction) but turns the USE_BYTE_BASED_REGEX option on.
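
A rough sketch of this wrapper idea, assuming the option is exposed as an overridable method. The class names and the useByteBasedRegex() hook are hypothetical stand-ins, not the real Phoenix classes:

```java
/** Stand-in for an existing expression such as RegexpReplaceFunction. */
class RegexpFunctionSketch {
    /** Hypothetical hook for the option; j.u.regex is used by default. */
    protected boolean useByteBasedRegex() {
        return false;
    }

    /** Shared logic lives here once and consults the option to pick an engine. */
    String engineName() {
        return useByteBasedRegex() ? "byte-based" : "java.util.regex";
    }
}

/** Wrapper that inherits all the logic and only turns the option on. */
final class ByteBasedRegexpFunctionSketch extends RegexpFunctionSketch {
    @Override
    protected boolean useByteBasedRegex() {
        return true;
    }
}
```

The subclass carries no logic of its own, so a change to the base expression applies to both variants.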

How about this?

Thanks.



> Use the joni byte[] regex engine in place of j.u.regex
> ------------------------------------------------------
>
>                 Key: PHOENIX-1287
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1287
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>              Labels: gsoc2015
>
> See HBASE-11907. We'd get a 2x perf benefit plus it's driven off of byte[] 
> instead of strings. Thanks for the pointer, [~apurtell].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
