[
https://issues.apache.org/jira/browse/PHOENIX-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241957#comment-17241957
]
Xinyi Yan commented on PHOENIX-4504:
------------------------------------
Hi [~wangchao316], can you reproduce this issue? Would you mind posting the steps
here? Thanks!
> Subquery with ORDER BY on salted table gives wrong results
> ----------------------------------------------------------
>
> Key: PHOENIX-4504
> URL: https://issues.apache.org/jira/browse/PHOENIX-4504
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.11.0
> Environment: amazon emr phoenix 4.11.0 hbase 1.3
> Reporter: Sokolov Yura
> Priority: Major
>
> This may already be fixed, but a quick search didn't turn up an exact
> description of the problem.
> I have a table:
> {code:sql}
> create immutable table product_history_v3 (
> ts bigint not null,
> id varchar not null,
> product varchar,
> merchantid varchar,
> storeid varchar,
> constraint pk primary key (ts, id)
> ) compression=LZ4,max_filesize=150000000,memstore_flushsize=70000000,
> versions=1,update_cache_frequency=1000,append_only_schema=true,
> guid_posts_width=10000000,
> SALT_BUCKETS=20;
> create local index product_history_v3_id_ts on product_history_v3 (id, ts)
> compression=LZ4;
> create local index product_history_v3_merchantid_ts on product_history_v3
> (merchantid, ts) include (id) compression=LZ4;
> create local index product_history_v3_storeid_ts on product_history_v3
> (storeid, ts) include (id) compression=LZ4;
> {code}
> A simple select by merchantid, ordered by id, ts, returns correct results:
> {code:sql}
> 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts from
> product_history_v3 where merchantid =
> '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts >
> 1498867200000 order by id, ts limit 30;
> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | PLAN |
> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] |
> | SERVER FILTER BY FIRST KEY ONLY |
> | SERVER TOP 30 ROWS SORTED BY ["ID", "TS"] |
> | CLIENT MERGE SORT |
> | CLIENT LIMIT 30 |
> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> 5 rows selected (0,019 seconds)
> {code}
> It runs very fast until I add {{product}} to the selected fields (because the
> average length of {{product}} is 10 KB).
> So I tried fetching id, ts in a subquery and product in the outer query. It runs
> fast but returns incorrect results: the set of rows doesn't match the set of rows
> returned by the query above.
> {code}
> 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts,
> substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts
> from product_history_v3 where merchantid =
> '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts >
> 1498867200000 order by id, ts limit 30) order by id, ts limit 30;
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | PLAN |
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3 |
> | SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS] |
> | CLIENT MERGE SORT |
> | CLIENT LIMIT 30 |
> | SKIP-SCAN-JOIN TABLE 0 |
> | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] |
> | SERVER FILTER BY FIRST KEY ONLY |
> | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] LIMIT 30 GROUPS |
> | CLIENT MERGE SORT |
> | CLIENT 30 ROW LIMIT |
> | DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($470.$473, $470.$472)) |
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> 11 rows selected (0,021 seconds)
> {code}
> However, if I change the ordering slightly so that the planner is forced to
> re-sort, the set of rows matches the original query:
> {code}
> 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts,
> substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts
> from product_history_v3 where merchantid =
> '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts >
> 1498867200000 order by id||'-', ts limit 30) order by id, ts limit 30;
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | PLAN |
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3 |
> | SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS] |
> | CLIENT MERGE SORT |
> | CLIENT LIMIT 30 |
> | SKIP-SCAN-JOIN TABLE 0 |
> | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] |
> | SERVER FILTER BY FIRST KEY ONLY |
> | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] |
> | CLIENT MERGE SORT |
> | CLIENT TOP 30 ROWS SORTED BY [("ID" || '-'), "TS"] |
> | DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($494.$497, $494.$496)) |
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
> 11 rows selected (0,02 seconds)
> 12 rows selected (0,021 seconds)
> {code}
> The table certainly needs to contain a lot of rows to trigger this behaviour.
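
Since the report says a large number of rows is needed, one possible way to attempt a reproduction is to bulk-load synthetic data into the table defined above. The following is only a minimal sketch: the helper name make_upserts, the id and storeid formats, the placeholder product payload, and the row count are all assumptions for illustration, not values from the original report. Only the table name, column names, merchant id, and timestamp bounds are taken from the description.

```python
import random


def make_upserts(n_rows, merchant_ids, ts_min, ts_max, seed=0):
    """Yield Phoenix UPSERT statements for product_history_v3.

    All generated values are synthetic (hypothetical); only the table and
    column names come from the issue description.
    """
    rng = random.Random(seed)  # seeded so repeated runs produce the same data
    for i in range(n_rows):
        ts = rng.randint(ts_min, ts_max)      # inclusive on both ends
        merchant = rng.choice(merchant_ids)
        row_id = f"id-{i:08d}"                # assumed id format
        product = "x" * 32                    # placeholder; the report mentions ~10 KB values
        store = f"store-{i % 10}"             # assumed storeid format
        yield (
            "UPSERT INTO product_history_v3 "
            "(ts, id, product, merchantid, storeid) "
            f"VALUES ({ts}, '{row_id}', '{product}', '{merchant}', '{store}');"
        )


if __name__ == "__main__":
    # Timestamp bounds and merchant id copied from the queries in the report.
    merchants = ["1479114284851799852-2-11-118-1577502676", "other-merchant"]
    for stmt in make_upserts(1000, merchants, 1498867200000, 1499472000000):
        print(stmt)
```

The printed statements could be saved to a file and run through a Phoenix SQL client after the create statements; whether this data shape is enough to trigger the wrong results is not confirmed.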