[
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670234#comment-15670234
]
chenglei edited comment on PHOENIX-3451 at 11/16/16 11:44 AM:
--------------------------------------------------------------
[~jamestaylor], I have a problem with your patch:
Why did you remove out the following lines,or may be you want to fix another
almost ready JIRA?
{code:borderStyle=solid}
- /*
- * When a GROUP BY is not order preserving, we cannot
do a reverse
- * scan to eliminate the ORDER BY since our
server-side scan is not
- * ordered in that case.
- */
- if (!groupBy.isEmpty() &&
!groupBy.isOrderPreserving()) {
- isOrderPreserving = false;
- isReverse = false;
- return;
- }
{code}
It seems for current master branch, removing theses lines may cause some
problem , which can be reproduced as follows :
{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST (
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL,
CONSTRAINT TEST_PK PRIMARY KEY (
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);
UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5);
UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6);
SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST group by
ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC
{code}
expecting results are:
{code:borderStyle=solid}
6,6
5,5
4,4
3,3
2,2
1,1
{code}
but the actual results are:
{code:borderStyle=solid}
4,4
5,5
6,6
1,1
2,2
3,3
{code}
The problem is caused by the AggregatePlan, when the above code was removed,
the OrderByCompiler thinks OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, and because
the GroupBy's "isOrderPreserving" is false, so although the Scan is reverse,but
AggregatePlan will sorts the aggregated Key [ORGANIZATION_ID, SCORE] after
geting results from RegionServer at the client side, which is a ASC order, the
sorted results are [1,1 2,2 3,3] and [4,4 5,5 6,6] , after executeing the
following code , the result is :[4,4 5,5 6,6 1,1 2,2 3,3], and because the
OrderBy is compiled out(which is OrderBy.REV_ROW_KEY_ORDER_BY),so the final
result is incorrect.
{code:borderStyle=solid}
232 aggResultIterator = new GroupedAggregatingResultIterator(
233 new MergeSortRowKeyResultIterator(iterators, 0,
this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),aggregators);
{code}
So if the AggregatePlan is not modified, just remove out the above code may
cause problem. Maybe I can open a new JIRA to fix this
problem if the JIRA does not exist,because it is irrelevant to PHOENIX-3451
was (Author: comnetwork):
[~jamestaylor], I have a problem with your patch:
Why did you remove out the following lines,or may be you want to fix another
almost ready JIRA?
{code:borderStyle=solid}
- /*
- * When a GROUP BY is not order preserving, we cannot
do a reverse
- * scan to eliminate the ORDER BY since our
server-side scan is not
- * ordered in that case.
- */
- if (!groupBy.isEmpty() &&
!groupBy.isOrderPreserving()) {
- isOrderPreserving = false;
- isReverse = false;
- return;
- }
{code}
It seems for current master branch, removing theses lines may cause some
problem , which can be reproduced as follows :
{code:borderStyle=solid}
CREATE TABLE ORDERBY_TEST (
ORGANIZATION_ID INTEGER NOT NULL,
CONTAINER_ID INTEGER NOT NULL,
SCORE INTEGER NOT NULL,
ENTITY_ID INTEGER NOT NULL,
CONSTRAINT TEST_PK PRIMARY KEY (
ORGANIZATION_ID,
CONTAINER_ID,
SCORE,
ENTITY_ID
)) split on(4);
UPSERT INTO ORDERBY_TEST VALUES (1,1,1,1);
UPSERT INTO ORDERBY_TEST VALUES (2,2,2,2);
UPSERT INTO ORDERBY_TEST VALUES (3,3,3,3);
UPSERT INTO ORDERBY_TEST VALUES (4,4,4,4);
UPSERT INTO ORDERBY_TEST VALUES (5,5,5,5);
UPSERT INTO ORDERBY_TEST VALUES (6,6,6,6);
SELECT ORGANIZATION_ID,SCORE FROM ORDERBY_TEST group by
ORGANIZATION_ID, SCORE ORDER BY ORGANIZATION_ID DESC, SCORE DESC
{code}
expecting results are:
{code:borderStyle=solid}
6,6
5,5
4,4
3,3
2,2
1,1
{code}
but the actual results are:
{code:borderStyle=solid}
4,4
5,5
6,6
1,1
2,2
3,3
{code}
The problem is caused by the AggregatePlan, when the above code was removed,
the OrderByCompiler thinks OrderBy is OrderBy.REV_ROW_KEY_ORDER_BY, and because
the GroupBy's "isOrderPreserving" is false, so although the Scan is reverse,but
AggregatePlan will sorts the aggregated Key [ORGANIZATION_ID, SCORE] after
geting results from RegionServer at the client side, which is a ASC order, the
sorted results are [1,1 2,2 3,3] and [4,4 5,5 6,6] , after executeing the
following code , the result is :[4,4 5,5 6,6 1,1 2,2 3,3], and because the
OrderBy is compiled out(which is OrderBy.REV_ROW_KEY_ORDER_BY),so the final
result is incorrect.
{code:borderStyle=solid}
232 aggResultIterator = new GroupedAggregatingResultIterator(
233 new MergeSortRowKeyResultIterator(iterators, 0,
this.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY),aggregators);
{code}
So if the AggregatePlan is not modified, just remove out the above code may
cause problem. Maybe I can open a new JIRA to fix this
problem if the JIRA does not exist.
> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -----------------------------------------------------------------------------
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.8.0
> Reporter: Joel Palmert
> Assignee: chenglei
> Attachments: PHOENIX-3451_v1.patch
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC,
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5 1.2
> entityId3 1.4
> The expected out out would be
> entityId8 1.45
> entityId3 1.4
> You will get the expected output if you remove the secondary index from the
> table or remove distinct from the query.
> As described in PHOENIX-3452 if you run the query without the LIMIT the
> ordering is not correct. However, the 2first results in that ordering is
> still not the onces returned by the limit clause, which makes me think there
> are multiple issues here and why I filed both separately. The rows being
> returned are the ones assigned to container1. It looks like Phoenix is first
> getting the rows from the first container and when it finds that to be enough
> it stops the scan. What it should be doing is getting 2 results for each
> container and then merge then and then limit again.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)