[ 
https://issues.apache.org/jira/browse/PHOENIX-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145102#comment-14145102
 ] 

James Taylor commented on PHOENIX-1273:
---------------------------------------

Patch looks good - thanks, [~jmspaggi]. There's a few complications that I 
hadn't considered before that need to be addressed:
- Some of the optimizations we do for the right-hand-side being a 
LiteralExpression in ExpressionCompiler can't be done if 
likeType==LikeType.CASE_INSENSITIVE. See TODO below on this:
{code}
    @Override
    public Expression visitLeave(LikeParseNode node, List<Expression> children) 
throws SQLException {
        ParseNode lhsNode = node.getChildren().get(0);
        ParseNode rhsNode = node.getChildren().get(1);
        Expression lhs = children.get(0);
        Expression rhs = children.get(1);
        if ( rhs.getDataType() != null && lhs.getDataType() != null && 
                !lhs.getDataType().isCoercibleTo(rhs.getDataType())  && 
                !rhs.getDataType().isCoercibleTo(lhs.getDataType())) {
            throw TypeMismatchException.newException(lhs.getDataType(), 
rhs.getDataType(), node.toString());
        }
        if (lhsNode instanceof BindParseNode) {
            context.getBindManager().addParamMetaData((BindParseNode)lhsNode, 
rhs);
        }
        if (rhsNode instanceof BindParseNode) {
            context.getBindManager().addParamMetaData((BindParseNode)rhsNode, 
lhs);
        }
        if (rhs instanceof LiteralExpression) {
            String pattern = (String)((LiteralExpression)rhs).getValue();
            if (pattern == null || pattern.length() == 0) {
                return LiteralExpression.newConstant(null, 
rhs.isDeterministic());
            }
            // TODO: for pattern of '%' optimize to strlength(lhs) > 0
            // We can't use lhs IS NOT NULL b/c if lhs is NULL we need
            // to return NULL.
            int index = LikeExpression.indexOfWildcard(pattern);
            // Can't possibly be as long as the constant, then FALSE
            Integer lhsMaxLength = lhs.getMaxLength();
            if (lhsMaxLength != null && lhsMaxLength < index) {
                return LiteralExpression.newConstant(false, 
rhs.isDeterministic());
            }
            if (index == -1) {
                String rhsLiteral = LikeExpression.unescapeLike(pattern);
                if (lhsMaxLength != null && lhsMaxLength != 
rhsLiteral.length()) {
                    return LiteralExpression.newConstant(false, 
rhs.isDeterministic());
                }
                // TODO: if (likeType == LikeType.CASE_SENSITIVE) {
                CompareOp op = node.isNegate() ? CompareOp.NOT_EQUAL : 
CompareOp.EQUAL;
                if (pattern.equals(rhsLiteral)) {
                    return new ComparisonExpression(op, children);
                } else {
                    rhs = LiteralExpression.newConstant(rhsLiteral, 
PDataType.CHAR, rhs.isDeterministic());
                    return new ComparisonExpression(op, Arrays.asList(lhs,rhs));
                }
            }
        }
{code}
- There's an edge case where the pattern isn't a constant. For example, you 
could have the pattern as the value of a column and do this: WHERE foo ILIKE 
bar. So you need to serialize the LikeType and react to an EOFException since 
the an older client wouldn't be sending you the LikeType (i.e. we upgrade the 
cluster first, followed by the clients). I'd also recommend adding a protected 
compilePattern method to LikeExpression to consolidate the logic:
{code}
    private void init() {
        Expression e = getPatternExpression();
        if (e instanceof LiteralExpression) {
            LiteralExpression patternExpression = (LiteralExpression)e;
            String value = (String)patternExpression.getValue();
            pattern = compilePattern(value);
        }
    }

    protected Pattern compilePattern (String value) {
        return Pattern.compile(toPattern(value)); // TODO: consider LikeType
    }

    @Override
    public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
        Pattern pattern = this.pattern;
        if (pattern == null) { // TODO: don't allow? this is going to be 
slooowwww
            if (!getPatternExpression().evaluate(tuple, ptr)) {
                if (logger.isDebugEnabled()) {
                    logger.debug("LIKE is FALSE: pattern is null");
                }
                return false;
            }
            String value = (String)PDataType.VARCHAR.toObject(ptr, 
getPatternExpression().getSortOrder());
            pattern = compilePattern(value);
            if (logger.isDebugEnabled()) {
                logger.debug("LIKE pattern is expression: " + 
pattern.pattern());
            }
        }
        
        if (!getStrExpression().evaluate(tuple, ptr)) {
            if (logger.isDebugEnabled()) {
                logger.debug("LIKE is FALSE: child expression is null");
            }
            return false;
        }
        if (ptr.getLength() == 0) {
            return true;
        }
        
        String value = (String)PDataType.VARCHAR.toObject(ptr, 
getStrExpression().getSortOrder());
        boolean matched = pattern.matcher(value).matches();
        ptr.set(matched ? PDataType.TRUE_BYTES : PDataType.FALSE_BYTES);
        if (logger.isDebugEnabled()) {
            logger.debug("LIKE(value='" + value + "'pattern='" + 
pattern.pattern() + "' is " + matched);
        }
        return true;
    }

    @Override
    public void readFields(DataInput input) throws IOException {
        super.readFields(input);
        init();
        if (pattern == null) {
            try {
                likeType = LikeType.values()[WritableUtils.readVInt(input)];
            } catch (EOFException e) {
                likeType = LikeType.CASE_SENSITIVE;
            }
        }
    }

    @Override
    public void write(DataOutput output) throws IOException {
        super.write(output);
        if (pattern == null) {
            WritableUtils.writeVInt(output, likeType.ordinal());
        }
    }

{code}
- There's an optimization we do for LikeExpression in WhereOptimizer. This 
looks at the pattern and if it starts with non wildcard characters, we can form 
a key range for the column you're matching against. For example, with WHERE pk 
LIKE 'foo%', we know that the range will be from "foo" to "fop" exclusive which 
ends up allowing us to set the scan start/stop or form a skip scan filter, 
depending on the rest of the WHERE clause. We could do something similar for 
ILIKE, but we'd need to generate two ranges (which is fine, as our skip scan 
will handle that without an issue). Your call - it can either be done now, or 
in a follow on JIRA, but here's the place that needs to change:
{code}
        @Override
        public Iterator<Expression> visitEnter(LikeExpression node) {
            // TODO: can we optimize something that starts with '_' like this: 
foo LIKE '_a%' ?
            if (node.getLikeType() == LIkeType.CASE_INSENSITIVE || // TODO: 
remove this when we optimize ILIKE
               ! (node.getChildren().get(1) instanceof LiteralExpression) || 
node.startsWithWildcard()) {
                return Iterators.emptyIterator();
            }

            return Iterators.singletonIterator(node.getChildren().get(0));
        }

        @Override
        public KeySlots visitLeave(LikeExpression node, List<KeySlots> 
childParts) {
            // TODO: optimize ILIKE by creating two ranges for the literal 
prefix: one with lower case, one with upper case
{code}
- Minor nits: declare likeType as private in LikeExpression and uppercase the 
LikeType enums.


> Phoenix should provide ILIKE keyword
> ------------------------------------
>
>                 Key: PHOENIX-1273
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1273
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.2
>            Reporter: Jean-Marc Spaggiari
>            Assignee: Jean-Marc Spaggiari
>            Priority: Minor
>         Attachments: PHOENIX-1273-v1.patch, PHOENIX-1273-v2.patch, 
> PHOENIX-1273.patch
>
>
> Phoenix provides LIKE keyword but not ILIKE which is a case insensitive LIKE. 
> Goal of this patch is to add this keyword.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to