[jira] [Comment Edited] (PHOENIX-10) Push projection of a single ARRAY element to the server

James Taylor (JIRA) Fri, 07 Feb 2014 12:43:47 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895002#comment-13895002
 ]


James Taylor edited comment on PHOENIX-10 at 2/7/14 8:41 PM:
-------------------------------------------------------------

Thanks, [~ram_krish]. You're doing a fantastic job coming up to speed on this. 
Appreciate all the contributions.

Would you mind spinning up a pull request on our github mirror 
(https://github.com/apache/incubator-phoenix) for this change? We're trying to 
follow the same methodology as the apache jcloud guys documented here: 
http://wiki.apache.org/jclouds/Committers%20Guide

Essentially, you just need to clone the github mirror, apply your patch to your 
cloned version, and then send a pull request back to our github mirror. This 
will generate an email to our dev list. Then I can make more detailed comments 
on your patch.

For now, here's some quick feedback:
- good point for GROUP BY queries. We don't want to do this for aggregate 
queries at all, because they execute on the server side already so there's no 
need. In your visitLeave(FunctionParseNode) call, add a SelectStatement member 
variable to ProjectionCompiler and add this check:
            if (!select.isAggregate() && 
ArrayIndexFunction.NAME.equals(node.getName())) {
- you shouldn't need that instanceof check for if(func instanceof 
ArrayIndexFunction) a few lines below this
- on the server-side changes in ScanRegionObserver, create the KeyValueSchema 
once in the deserializeArrayPostionalExpressionInfoFromScan and add a member 
variable for it. Also, add a member variable for the ValueBitSet created to go 
with it and an ImmutableBytesWritable ptr so you don't need to create a new one 
each time.
- change you replaceArrayIndexElement only remove the KVs in the loop (and use 
remove(idx) as you have the list element position - I don't think there's a 
need to allocate a new list of KVs).
- at the end of the loop, evaluate the arrayFuncRefs like this (make 
arrayFuncRefs an array instead of a list in 
deserializeArrayPostionalExpressionInfoFromScan):
{code}
    Tuple tuple = new MultiKeyValueTuple(result);
    // Evaluate the array index func references against the current tuple
    byte[] value = keyValueSchema.toBytes(tuple, arrayFuncRefs, valueBitSet, 
ptr);
    result.add(new KeyValue(QueryConstants.ARRAY_DUMMY_ROW,
                                ScanProjector.VALUE_COLUMN_FAMILY, 
ScanProjector.VALUE_COLUMN_QUALIFIER,
                                value));
{code}




was (Author: jamestaylor):
Thanks, [~ram_krish]. You're doing a fantastic job coming up to speed on this. 
Appreciate all the contributions.

Would you mind spinning up a pull request on our github mirror 
(https://github.com/apache/incubator-phoenix) for this change? We're trying to 
follow the same methodology as the apache jcloud guys documented here: 
http://wiki.apache.org/jclouds/Committers%20Guide

Essentially, you just need to clone the github mirror, apply your patch to your 
cloned version, and then send a pull request back to our github mirror. This 
will generate an email to our dev list. Then I can make more detailed comments 
on your patch.

For now, here's some quick feedback:
- good point for GROUP BY queries. We don't want to do this for aggregate 
queries at all, because they execute on the server side already so there's no 
need. In your visitLeave(FunctionParseNode) call, add a SelectStatement member 
variable to ProjectionCompiler and add this check:
            if (!select.isAggregate() && 
ArrayIndexFunction.NAME.equals(node.getName())) {
- you shouldn't need that instanceof check for if(func instanceof 
ArrayIndexFunction) a few lines below this
- on the server-side changes in ScanRegionObserver, create the KeyValueSchema 
once in the deserializeArrayPostionalExpressionInfoFromScan and add a member 
variable for it. Also, add a member variable for the ValueBitSet created to go 
with it and an ImmutableBytesWritable ptr so you don't need to create a new one 
each time.
- change you replaceArrayIndexElement only remove the KVs in the loop (and use 
remove(idx) as you have the list element position - I don't think there's a 
need to allocate a new list of KVs).
- at the end of the loop, evaluate the arrayFuncRefs like this:
{code}
    Tuple tuple = new MultiKeyValueTuple(result);
    // Evaluate the array index func references against the current tuple
    byte[] value = keyValueSchema.toBytes(tuple, arrayFuncRefs, valueBitSet, 
ptr);
    result.add(new KeyValue(QueryConstants.ARRAY_DUMMY_ROW,
                                ScanProjector.VALUE_COLUMN_FAMILY, 
ScanProjector.VALUE_COLUMN_QUALIFIER,
                                value));
{code}



> Push projection of a single ARRAY element to the server
> -------------------------------------------------------
>
>                 Key: PHOENIX-10
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-10
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: Phoenix-10_1.patch, Phoenix-10_2.patch
>
>
> If only a single array element is selected, we'll still return the entire 
> array back to the client. Instead, we should push this to the server and only 
> return the single array element. The same goes for the reference to an ARRAY 
> in the WHERE clause. There's a general HBase fix for this (i.e. the ability 
> to define a separate set of key values that will be returned versus key 
> values available to filters) that has a patch here, but is deemed not 
> possible to pull into the 0.94 branch by @lhofhansl.
> My thought is that we can add a Filter at the end our our filter chain that 
> filters out any KeyValues that aren't in the SELECT expressions (i.e. filter 
> out if a column is referenced in the WHERE clause, but not in the SELECT 
> expressions). This same Filter could handle returning only the elements of 
> the array that are referenced in the SELECT expression rather than the entire 
> array.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Comment Edited] (PHOENIX-10) Push projection of a single ARRAY element to the server

Reply via email to