[ 
https://issues.apache.org/jira/browse/PHOENIX-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139166#comment-15139166
 ] 

James Taylor commented on PHOENIX-2665:
---------------------------------------

As suspected, this can occur with any aggregate that groups by the primary 
key columns (look for {{SERVER AGGREGATE INTO ORDERED DISTINCT ROWS}} in the 
query plan). It's not a regression, and it's not limited to queries that use 
an index. 

FWIW, it'd be unusual to group by the entire PK, since you might as well skip 
the group by when every row is its own group. Grouping by the leading part of 
the PK would be more common, though (which would use the same optimization).
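
To make the point about group size concrete, here is a minimal Python sketch. It is not Phoenix code: the rows, the two-column key {{(k1, k2)}}, and the helper {{ordered_group_count}} are all hypothetical. It only assumes that rows arrive sorted by primary key, so groups can be formed in a single pass over adjacent rows.

```python
from itertools import groupby

# Hypothetical rows (k1, k2, value), already sorted by the assumed
# two-column primary key (k1, k2), as a scan over sorted data would return them.
rows = [
    ("a", 1, 10),
    ("a", 2, 20),
    ("b", 1, 30),
    ("b", 2, 40),
]


def ordered_group_count(sorted_rows, key_width):
    """Count rows per group in one pass, relying on the input being sorted.

    key_width is the number of leading key columns to group by.
    """
    return [
        (key, sum(1 for _ in group))
        for key, group in groupby(sorted_rows, key=lambda r: r[:key_width])
    ]


# Grouping by the entire key: every row is its own group.
full_pk = ordered_group_count(rows, 2)

# Grouping by the leading key column only: adjacent rows collapse into groups.
leading_pk = ordered_group_count(rows, 1)

print(full_pk)
print(leading_pk)
```

With the full key the result has as many groups as there are rows, so the group by adds nothing. With only the leading column, the same single-pass approach yields one row per distinct {{k1}}.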

Do you have a patch in the works, [~rajeshbabu]?

> index split while running group by query is returning duplicate results
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-2665
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2665
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Blocker
>             Fix For: 4.7.0
>
>
> When an index region splits while a group by query is running, the query 
> returns duplicate results.
> Instead of returning 500,000 records it's returning 729,500 records.
> {noformat}
> +------------------------------------------+------------------------------------------+
> | 4999                                     | 499999                                   |
> +------------------------------------------+------------------------------------------+
> 500,000 rows selected (11.996 seconds)
> {noformat}
> {noformat}
> +------------------------------------------+------------------------------------------+
> | 4999                                     | 499999                                   |
> +------------------------------------------+------------------------------------------+
> 729,500 rows selected (15.291 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
