[
https://issues.apache.org/jira/browse/PHOENIX-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360263#comment-14360263
]
Shuxiong Ye edited comment on PHOENIX-1287 at 3/13/15 3:59 PM:
---------------------------------------------------------------
Hi [~jamestaylor],
Most of the code in ByteBasedLikeExpression would be the same as in
LikeExpression, which I think is bad for future maintenance and development.
For example, if we want to change the logic of LikeExpression, we have to
update ByteBasedLikeExpression too.
My plan is:
1. Pass a USE_BYTE_BASED_REGEX option to the regex expressions (LikeExpression,
RegexpReplaceFunction, RegexpSplitFunction, RegexpSubstrFunction).
2. Each of these expressions uses the appropriate pattern matcher according to
the option.
3. A pattern matcher factory produces either a j.u.regex-based matcher or a
byte-based one.
4. The interface of the base pattern matcher looks like:
{code:java}
Pattern compile(ImmutableBytesWritable ptr);
Matcher matcher(ImmutableBytesWritable ptr);
void replace(ImmutableBytesWritable ptr, ImmutableBytesWritable outputPtr);
{code}
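To make steps 3 and 4 more concrete, here is a rough sketch of how the factory
could look. It is only an illustration under assumptions: every class and
method name in it (BasePatternSketch, JavaPatternSketch,
ByteBasedPatternSketch, PatternFactorySketch) is hypothetical, plain byte[]
ranges stand in for ImmutableBytesWritable so that it compiles without HBase,
and the byte-based class is left as a stub rather than guessing at the joni
API.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// All names below are hypothetical; none of them is existing Phoenix API.
// Plain byte[] ranges stand in for ImmutableBytesWritable.
interface BasePatternSketch {
    // True if the whole byte range matches the pattern.
    boolean matches(byte[] input, int offset, int length);

    // Replaces every match in the byte range; returns the result as UTF-8 bytes.
    byte[] replaceAll(byte[] input, int offset, int length, byte[] replacement);
}

// j.u.regex-based variant: decodes the bytes to a String before matching.
final class JavaPatternSketch implements BasePatternSketch {
    private final Pattern pattern;

    JavaPatternSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean matches(byte[] input, int offset, int length) {
        String s = new String(input, offset, length, StandardCharsets.UTF_8);
        return pattern.matcher(s).matches();
    }

    @Override
    public byte[] replaceAll(byte[] input, int offset, int length, byte[] replacement) {
        String s = new String(input, offset, length, StandardCharsets.UTF_8);
        String r = new String(replacement, StandardCharsets.UTF_8);
        return pattern.matcher(s).replaceAll(r).getBytes(StandardCharsets.UTF_8);
    }
}

// Byte-based variant: a stub only. A real one would delegate to joni.
final class ByteBasedPatternSketch implements BasePatternSketch {
    ByteBasedPatternSketch(String regex) {
        // Compiling the regex with the byte-based engine would happen here.
    }

    @Override
    public boolean matches(byte[] input, int offset, int length) {
        throw new UnsupportedOperationException("byte-based matching not sketched");
    }

    @Override
    public byte[] replaceAll(byte[] input, int offset, int length, byte[] replacement) {
        throw new UnsupportedOperationException("byte-based replace not sketched");
    }
}

// Factory: picks the implementation from the USE_BYTE_BASED_REGEX-style flag.
final class PatternFactorySketch {
    private PatternFactorySketch() { }

    static BasePatternSketch compile(String regex, boolean useByteBasedRegex) {
        return useByteBasedRegex
                ? new ByteBasedPatternSketch(regex)
                : new JavaPatternSketch(regex);
    }
}
```

With this shape, an expression such as LikeExpression would only call
PatternFactorySketch.compile(regex, flag) and never reference a concrete
engine, so the matching logic stays in one place.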
How about this?
Thanks.
-----------------------
I checked the code and found that it is hard to pass context from the parser
to the expression.
Another way is:
1. Add a USE_BYTE_BASED_REGEX option to the expression (e.g.
RegexpReplaceFunction), which is off by default.
2. Add a wrapper ByteBasedRegexExpression (e.g. ByteBasedRegexpReplaceFunction)
that inherits from the expression (e.g. RegexpReplaceFunction) but turns the
USE_BYTE_BASED_REGEX option on.
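A minimal sketch of this wrapper idea, assuming the option is exposed as an
overridable method. The class names RegexpReplaceSketch and
ByteBasedRegexpReplaceSketch are hypothetical stand-ins for
RegexpReplaceFunction and ByteBasedRegexpReplaceFunction, and engineName() only
exists here to show which engine would be selected.

```java
// Hypothetical, simplified stand-in for RegexpReplaceFunction.
class RegexpReplaceSketch {
    // Corresponds to the proposed USE_BYTE_BASED_REGEX option; off by default.
    protected boolean useByteBasedRegex() {
        return false;
    }

    // Illustrative only: reports which engine the shared logic would pick.
    String engineName() {
        return useByteBasedRegex() ? "byte-based" : "java.util.regex";
    }
}

// Wrapper that inherits all logic and only turns the option on.
class ByteBasedRegexpReplaceSketch extends RegexpReplaceSketch {
    @Override
    protected boolean useByteBasedRegex() {
        return true;
    }
}
```

The parser would then only need to choose which class to instantiate, and no
context has to be passed down into the expression.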
How about this?
Thanks.
> Use the joni byte[] regex engine in place of j.u.regex
> ------------------------------------------------------
>
> Key: PHOENIX-1287
> URL: https://issues.apache.org/jira/browse/PHOENIX-1287
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Labels: gsoc2015
>
> See HBASE-11907. We'd get a 2x perf benefit plus it's driven off of byte[]
> instead of strings. Thanks for the pointer, [~apurtell].