[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

chenglei (JIRA) Wed, 16 Nov 2016 03:45:32 -0800

    [ https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670234#comment-15670234 ]


chenglei edited comment on PHOENIX-3451 at 11/16/16 11:44 AM:
--------------------------------------------------------------

[~jamestaylor], I have a question about your patch:
why did you remove the following lines? Or maybe you intend to fix this in another,
almost-ready JIRA?
{code:borderStyle=solid}
-                        /*
-                         * When a GROUP BY is not order preserving, we cannot do a reverse
-                         * scan to eliminate the ORDER BY since our server-side scan is not
-                         * ordered in that case.
-                         */
-                        if (!groupBy.isEmpty() && !groupBy.isOrderPreserving()) {
-                            isOrderPreserving = false;
-                            isReverse = false;
-                            return;
-                        }
{code}

On the current master branch, removing these lines appears to cause a problem,
which can be reproduced as follows:

{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST (
    ORGANIZATION_ID INTEGER NOT NULL,
    CONTAINER_ID INTEGER NOT NULL,
    SCORE INTEGER NOT NULL,
    ENTITY_ID INTEGER NOT NULL,
    CONSTRAINT TEST_PK PRIMARY KEY (
        ORGANIZATION_ID,
        CONTAINER_ID,
        SCORE,
        ENTITY_ID
    )) SPLIT ON (4);

UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);
UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5);
UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6);

SELECT ORGANIZATION_ID, SCORE FROM ORDERBY_TEST
GROUP BY ORGANIZATION_ID, SCORE
ORDER BY ORGANIZATION_ID DESC, SCORE DESC;
{code}

The expected results are:
{code:borderStyle=solid}
      6,6
      5,5
      4,4
      3,3
      2,2
      1,1
{code}

but the actual results are:
{code:borderStyle=solid}
    4,4
    5,5
    6,6
    1,1
    2,2
    3,3
{code}

The problem is in AggregatePlan. Once the above code is removed, OrderByCompiler
decides that the OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, so the ORDER BY is
compiled out and the scan is reversed. However, because the GroupBy's
"isOrderPreserving" is false, AggregatePlan sorts the aggregated key
[ORGANIZATION_ID, SCORE] on the client side after getting the results from the
RegionServers, and that sort is in ASC order. The sorted results of the two
regions are therefore [1,1 2,2 3,3] and [4,4 5,5 6,6]. The following code then
merges them with MergeSortRowKeyResultIterator in reverse mode, which expects each
input to be in DESC order and therefore always takes the larger head first; the
merged result is [4,4 5,5 6,6 1,1 2,2 3,3]. Because the ORDER BY has been compiled
out (replaced by OrderBy.REV_ROW_KEY_ORDER_BY), nothing re-sorts this output, so
the final result is incorrect.

{code:borderStyle=solid}
// AggregatePlan, lines 232-233
aggResultIterator = new GroupedAggregatingResultIterator(
        new MergeSortRowKeyResultIterator(iterators, 0,
                this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),
        aggregators);
{code}

So unless AggregatePlan is also modified, simply removing the lines quoted at the
top of this comment may produce these wrong results. If no JIRA exists for this
yet, I can open a new one to fix it, because it is unrelated to PHOENIX-3451.
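To make the merge step concrete, here is a minimal Java sketch under a simplified, hypothetical model: each group key [ORGANIZATION_ID, SCORE] is represented by ORGANIZATION_ID alone (the two are equal in the test data), each region's output is a plain list, and the made-up `merge` method only mimics the idea behind the reverse flag of MergeSortRowKeyResultIterator. It is not Phoenix code. It shows that a reverse merge of ascending per-region lists yields 4,5,6,1,2,3, whereas descending lists would have produced the expected order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a k-way merge over per-region results.
// It is not Phoenix's MergeSortRowKeyResultIterator; it only mimics the idea that
// a reverse merge trusts every input to already be sorted in DESCENDING order.
public class ReverseMergeDemo {

    // Merges already-sorted inputs. A forward merge always emits the smallest head;
    // a reverse merge always emits the largest head.
    static List<Integer> merge(List<List<Integer>> inputs, boolean reverse) {
        int[] pos = new int[inputs.size()]; // next unread index of each input
        List<Integer> out = new ArrayList<>();
        while (true) {
            int best = -1;
            for (int i = 0; i < inputs.size(); i++) {
                if (pos[i] >= inputs.get(i).size()) {
                    continue; // this input is exhausted
                }
                int head = inputs.get(i).get(pos[i]);
                if (best == -1) {
                    best = i;
                    continue;
                }
                int bestHead = inputs.get(best).get(pos[best]);
                if (reverse ? head > bestHead : head < bestHead) {
                    best = i;
                }
            }
            if (best == -1) {
                return out; // every input has been drained
            }
            out.add(inputs.get(best).get(pos[best]));
            pos[best]++;
        }
    }

    public static void main(String[] args) {
        // Region 1 holds keys below the split point 4, region 2 the rest; the
        // client-side sort leaves each region's groups in ASCENDING order.
        List<List<Integer>> ascending = List.of(List.of(1, 2, 3), List.of(4, 5, 6));
        System.out.println(merge(ascending, true)); // [4, 5, 6, 1, 2, 3]

        // Had each region been sorted DESCENDING, the same reverse merge would be correct.
        List<List<Integer>> descending = List.of(List.of(3, 2, 1), List.of(6, 5, 4));
        System.out.println(merge(descending, true)); // [6, 5, 4, 3, 2, 1]
    }
}
```

The reverse merge never looks past the current head of each input, so it cannot repair inputs that are sorted in the wrong direction; that is why the client-side ASC sort and the compiled-out ORDER BY combine into the wrong final order.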



was (Author: comnetwork): identical to the comment above, except that its final sentence ended at "...if the JIRA does not exist."


> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -----------------------------------------------------------------------------
>
>                 Key: PHOENIX-3451
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3451
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.8.0
>            Reporter: Joel Palmert
>            Assignee: chenglei
>         Attachments: PHOENIX-3451_v1.patch
>
>
> This may be related to PHOENIX-3452, but the behavior is different, so I am filing
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
>     ORGANIZATION_ID CHAR(15) NOT NULL,
>     CONTAINER_ID CHAR(15) NOT NULL,
>     ENTITY_ID CHAR(15) NOT NULL,
>     SCORE DOUBLE,
>     CONSTRAINT TEST_PK PRIMARY KEY (
>         ORGANIZATION_ID,
>         CONTAINER_ID,
>         ENTITY_ID
>     )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be:
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT, the
> ordering is not correct. However, the first 2 results in that ordering are
> still not the ones returned by the LIMIT clause, which makes me think there
> are multiple issues here, which is why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix first
> gets the rows from the first container and, when it finds that to be enough,
> it stops the scan. What it should do instead is get 2 results for each
> container, merge them, and then apply the limit again.
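The reporter's suggested strategy (take the top 2 rows of each container, merge them, then apply the LIMIT again) can be sketched in Java as follows. This is a hypothetical, in-memory illustration over the repro data, not how Phoenix actually executes the query: the `Row` class and `topN` method are made up for this example, and DISTINCT and the ORGANIZATION_ID filter are left out because every row in the data is already distinct and belongs to 'org2'.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PerContainerLimitDemo {

    // Hypothetical row type holding the columns used by the repro query.
    static final class Row {
        final String containerId;
        final String entityId;
        final double score;

        Row(String containerId, String entityId, double score) {
            this.containerId = containerId;
            this.entityId = entityId;
            this.score = score;
        }
    }

    // ORDER BY score DESC
    static final Comparator<Row> BY_SCORE_DESC =
            Comparator.comparingDouble((Row r) -> r.score).reversed();

    // Top `limit` rows per container, merged, then limited again; returns entity ids.
    static List<String> topN(List<Row> rows, int limit) {
        Map<String, List<Row>> byContainer =
                rows.stream().collect(Collectors.groupingBy(r -> r.containerId));
        List<Row> candidates = new ArrayList<>();
        for (List<Row> group : byContainer.values()) {
            // Partial limit inside each container.
            group.stream().sorted(BY_SCORE_DESC).limit(limit).forEach(candidates::add);
        }
        // Merge all per-container candidates and apply the limit once more.
        return candidates.stream()
                .sorted(BY_SCORE_DESC)
                .limit(limit)
                .map(r -> r.entityId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("container2", "entityId6", 1.1),
                new Row("container1", "entityId5", 1.2),
                new Row("container2", "entityId4", 1.3),
                new Row("container1", "entityId3", 1.4),
                new Row("container3", "entityId7", 1.35),
                new Row("container3", "entityId8", 1.45));
        System.out.println(topN(rows, 2)); // [entityId8, entityId3]
    }
}
```

With a limit of 2 this returns entityId8 and entityId3, matching the expected output in the description. The per-container limit is safe because any row in the global top N must also be in the top N of its own container.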



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
