[
https://issues.apache.org/jira/browse/PHOENIX-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145102#comment-14145102
]
James Taylor commented on PHOENIX-1273:
---------------------------------------
Patch looks good - thanks, [~jmspaggi]. There's a few complications that I
hadn't considered before that need to be addressed:
- Some of the optimizations we do for the right-hand-side being a
LiteralExpression in ExpressionCompiler can't be done if
likeType==LikeType.CASE_INSENSITIVE. See TODO below on this:
{code}
@Override
public Expression visitLeave(LikeParseNode node, List<Expression> children)
throws SQLException {
ParseNode lhsNode = node.getChildren().get(0);
ParseNode rhsNode = node.getChildren().get(1);
Expression lhs = children.get(0);
Expression rhs = children.get(1);
if ( rhs.getDataType() != null && lhs.getDataType() != null &&
!lhs.getDataType().isCoercibleTo(rhs.getDataType()) &&
!rhs.getDataType().isCoercibleTo(lhs.getDataType())) {
throw TypeMismatchException.newException(lhs.getDataType(),
rhs.getDataType(), node.toString());
}
if (lhsNode instanceof BindParseNode) {
context.getBindManager().addParamMetaData((BindParseNode)lhsNode,
rhs);
}
if (rhsNode instanceof BindParseNode) {
context.getBindManager().addParamMetaData((BindParseNode)rhsNode,
lhs);
}
if (rhs instanceof LiteralExpression) {
String pattern = (String)((LiteralExpression)rhs).getValue();
if (pattern == null || pattern.length() == 0) {
return LiteralExpression.newConstant(null,
rhs.isDeterministic());
}
// TODO: for pattern of '%' optimize to strlength(lhs) > 0
// We can't use lhs IS NOT NULL b/c if lhs is NULL we need
// to return NULL.
int index = LikeExpression.indexOfWildcard(pattern);
// Can't possibly be as long as the constant, then FALSE
Integer lhsMaxLength = lhs.getMaxLength();
if (lhsMaxLength != null && lhsMaxLength < index) {
return LiteralExpression.newConstant(false,
rhs.isDeterministic());
}
if (index == -1) {
String rhsLiteral = LikeExpression.unescapeLike(pattern);
if (lhsMaxLength != null && lhsMaxLength !=
rhsLiteral.length()) {
return LiteralExpression.newConstant(false,
rhs.isDeterministic());
}
// TODO: if (likeType == LikeType.CASE_SENSITIVE) {
CompareOp op = node.isNegate() ? CompareOp.NOT_EQUAL :
CompareOp.EQUAL;
if (pattern.equals(rhsLiteral)) {
return new ComparisonExpression(op, children);
} else {
rhs = LiteralExpression.newConstant(rhsLiteral,
PDataType.CHAR, rhs.isDeterministic());
return new ComparisonExpression(op, Arrays.asList(lhs,rhs));
}
}
}
{code}
- There's an edge case where the pattern isn't a constant. For example, you
could have the pattern as the value of a column and do this: WHERE foo ILIKE
bar. So you need to serialize the LikeType and react to an EOFException since
the an older client wouldn't be sending you the LikeType (i.e. we upgrade the
cluster first, followed by the clients). I'd also recommend adding a protected
compilePattern method to LikeExpression to consolidate the logic:
{code}
private void init() {
Expression e = getPatternExpression();
if (e instanceof LiteralExpression) {
LiteralExpression patternExpression = (LiteralExpression)e;
String value = (String)patternExpression.getValue();
pattern = compilePattern(value);
}
}
protected Pattern compilePattern (String value) {
return Pattern.compile(toPattern(value)); // TODO: consider LikeType
}
@Override
public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
Pattern pattern = this.pattern;
if (pattern == null) { // TODO: don't allow? this is going to be
slooowwww
if (!getPatternExpression().evaluate(tuple, ptr)) {
if (logger.isDebugEnabled()) {
logger.debug("LIKE is FALSE: pattern is null");
}
return false;
}
String value = (String)PDataType.VARCHAR.toObject(ptr,
getPatternExpression().getSortOrder());
pattern = compilePattern(value);
if (logger.isDebugEnabled()) {
logger.debug("LIKE pattern is expression: " +
pattern.pattern());
}
}
if (!getStrExpression().evaluate(tuple, ptr)) {
if (logger.isDebugEnabled()) {
logger.debug("LIKE is FALSE: child expression is null");
}
return false;
}
if (ptr.getLength() == 0) {
return true;
}
String value = (String)PDataType.VARCHAR.toObject(ptr,
getStrExpression().getSortOrder());
boolean matched = pattern.matcher(value).matches();
ptr.set(matched ? PDataType.TRUE_BYTES : PDataType.FALSE_BYTES);
if (logger.isDebugEnabled()) {
logger.debug("LIKE(value='" + value + "'pattern='" +
pattern.pattern() + "' is " + matched);
}
return true;
}
@Override
public void readFields(DataInput input) throws IOException {
super.readFields(input);
init();
if (pattern == null) {
try {
likeType = LikeType.values()[WritableUtils.readVInt(input)];
} catch (EOFException e) {
likeType = LikeType.CASE_SENSITIVE;
}
}
}
@Override
public void write(DataOutput output) throws IOException {
super.write(output);
if (pattern == null) {
WritableUtils.writeVInt(output, likeType.ordinal());
}
}
{code}
- There's an optimization we do for LikeExpression in WhereOptimizer. This
looks at the pattern and if it starts with non wildcard characters, we can form
a key range for the column you're matching against. For example, with WHERE pk
LIKE 'foo%', we know that the range will be from "foo" to "fop" exclusive which
ends up allowing us to set the scan start/stop or form a skip scan filter,
depending on the rest of the WHERE clause. We could do something similar for
ILIKE, but we'd need to generate two ranges (which is fine, as our skip scan
will handle that without an issue). Your call - it can either be done now, or
in a follow on JIRA, but here's the place that needs to change:
{code}
@Override
public Iterator<Expression> visitEnter(LikeExpression node) {
// TODO: can we optimize something that starts with '_' like this:
foo LIKE '_a%' ?
if (node.getLikeType() == LIkeType.CASE_INSENSITIVE || // TODO:
remove this when we optimize ILIKE
! (node.getChildren().get(1) instanceof LiteralExpression) ||
node.startsWithWildcard()) {
return Iterators.emptyIterator();
}
return Iterators.singletonIterator(node.getChildren().get(0));
}
@Override
public KeySlots visitLeave(LikeExpression node, List<KeySlots>
childParts) {
// TODO: optimize ILIKE by creating two ranges for the literal
prefix: one with lower case, one with upper case
{code}
- Minor nits: declare likeType as private in LikeExpression and uppercase the
LikeType enums.
> Phoenix should provide ILIKE keyword
> ------------------------------------
>
> Key: PHOENIX-1273
> URL: https://issues.apache.org/jira/browse/PHOENIX-1273
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.2
> Reporter: Jean-Marc Spaggiari
> Assignee: Jean-Marc Spaggiari
> Priority: Minor
> Attachments: PHOENIX-1273-v1.patch, PHOENIX-1273-v2.patch,
> PHOENIX-1273.patch
>
>
> Phoenix provides LIKE keyword but not ILIKE which is a case insensitive LIKE.
> Goal of this patch is to add this keyword.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)