[ 
https://issues.apache.org/jira/browse/PHOENIX-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032152#comment-16032152
 ] 

James Taylor commented on PHOENIX-3773:
---------------------------------------

Thanks for the updated patch, [~singamteja]. It's looking very good. Here's 
some feedback:

Instead of deserializing the byte arrays to objects (a relatively expensive 
operation) in the evaluate method of FirstLastValueBaseClientAggregator, store 
the bytes pointer in a List<ImmutableBytesWritable>.
{code}
+                    if (getMultipleValues) {
+                        
multipleValuesResult.add(PDataType.fromTypeId(dataType.getSqlType() - 
PDataType.ARRAY_TYPE_BASE).toObject(it.next()));
+                        if (++counter == offset) {
+                            break;
+                        }
+                    } else {
{code}
Then, once you have all the values, combine them together using the 
PArrayDataType.appendItemToArray() method outside the loop.
{code}
-            //not enought values to return Nth
+            if (getMultipleValues && multipleValuesResult.size() > 0) {
+                
ptr.set(dataType.toBytes(PArrayDataType.instantiatePhoenixArray(PDataType.fromTypeId(dataType.getSqlType()
 - PDataType.ARRAY_TYPE_BASE), multipleValuesResult.toArray())));
+                return true;
+            }
+
+            //not enough values to return Nth or top N'values or trying to 
retreive (N+1)th value
{code}
I think the handling of the case in which there are no values may not be 
handled correctly. You need to move this check to the top of the aggregate 
method:
{code}
        if (topValue == null) {
            return false;
        }
{code}
And this would be the only case that would return false. Otherwise, if there 
aren't enough values, do this:
{code}
    ptr.set(ByteUtil.EMPTY_BYTE_ARRAY);
    return true;
{code}
Probably a good idea to have a test that asks for the top 3 values when there 
are only 2 values to make sure that case works too (if you don't have that 
already).

Also, one minor nit. How about instead of calling the new member variable 
getMultipleValues you call it isArrayReturnType?

> Implement FIRST_VALUES aggregate function
> -----------------------------------------
>
>                 Key: PHOENIX-3773
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3773
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: James Taylor
>            Assignee: Loknath Priyatham Teja Singamsetty 
>              Labels: SFDC
>             Fix For: 4.11.0
>
>         Attachments: PHOENIX-3773_4.x-HBase-0.98.patch, 
> PHOENIX-3773_master.patch, PHOENIX-3773.patch, PHOENIX-3773.v2.patch, 
> PHOENIX-3773.v3.patch
>
>
> Similar to FIRST_VALUE, but would allow the user to specify how many values 
> to keep. This could use a MinMaxPriorityQueue under the covers and be much 
> more efficient than using multiple NTH_VALUE calls to do the same like this:
> {code}
> SELECT entity_id,
>        NTH_VALUE(user_id,1) WITHIN GROUP (ORDER BY last_read_date DESC) as 
> nth1_user_id,
>        NTH_VALUE(user_id,2) WITHIN GROUP (ORDER BY last_read_date DESC) as 
> nth2_user_id,
>        NTH_VALUE(user_id,3) WITHIN GROUP (ORDER BY last_read_date DESC) as 
> nth3_user_id,
>        count(*)
> FROM  MY_TABLE 
> WHERE tenant_id='00Dx0000000XXXX'
> AND entity_id in ('0D5x000000ABCD','0D5x000000ABCE')
> GROUP BY entity_id;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to