[
https://issues.apache.org/jira/browse/PHOENIX-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388161#comment-16388161
]
James Taylor commented on PHOENIX-4336:
---------------------------------------
The only difference in processing of these two queries is that:
- the first query will do a point get for each local index row on the server to
find the data columns to return back to the client.
- the second query would initiate client-side batched point lookups (using
SkipScanFilter) to retrieve the data rows based on the rows it gets back from
the inner index query.
Any thoughts, [~rajeshbabu] or [~sergey.soldatov]?
> SELECT + ORDER BY performance much worse than self-join
> -------------------------------------------------------
>
> Key: PHOENIX-4336
> URL: https://issues.apache.org/jira/browse/PHOENIX-4336
> Project: Phoenix
> Issue Type: Bug
> Reporter: Marcin Januszkiewicz
> Priority: Major
>
> We have a wide table with 100M records created with the following DDL:
> {code:sql}
> CREATE TABLE traces (
> rowkey VARCHAR PRIMARY KEY,
> time VARCHAR,
> number VARCHAR,
> +40 more columns)
> CREATE LOCAL INDEX ix_0 ON traces (UPPER(number)) INCLUDE (time, + some other
> columns used for filtering)
> {code}
> We want to select into a large (~30M records) subset of this data with the
> query:
> {code:sql}
> SELECT *all columns*
> FROM traces
> WHERE (UPPER(number) LIKE 'PO %')
> ORDER BY time DESC, ROWKEY
> LIMIT 101;
> {code}
> {noformat}
> +-------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN
> | EST_BYTES_READ |
> EST_ROWS_READ | EST_INFO_TS |
> +-------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 234-CHUNK 39850892 ROWS 73610048115 BYTES PARALLEL 234-WAY RANGE
> SCAN OVER TRACES [1,'PO '] - [1,'PO!'] | 73610048115
> | 39850892 | 1509102519122 |
> | SERVER TOP 101 ROWS SORTED BY [cf."time" DESC, "ROWKEY"]
> | 73610048115 |
> 39850892 | 1509102519122 |
> | CLIENT MERGE SORT
> | 73610048115 |
> 39850892 | 1509102519122 |
> | CLIENT LIMIT 101
> | 73610048115 |
> 39850892 | 1509102519122 |
> +-------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> {noformat}
> This times out after 15 minutes and puts a huge load on our cluster.
> We have an alternate way of selecting this data:
> {code:sql}
> SELECT t.rowkey, *all columns*
> FROM TRACES t
> JOIN (
> SELECT rowkey
> FROM TRACES
> WHERE (UPPER(number) LIKE 'PO %')
> ORDER BY time DESC, ROWKEY
> LIMIT 101
> ) ix
> ON t.ROWKEY = ix.ROWKEY
> order by t.ROWKEY;
> {code}
> {noformat}
> +---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | PLAN
> |
> EST_BYTES_READ | EST_ROWS_READ | EST_INFO |
> +---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | CLIENT 1500-CHUNK 97154640 ROWS 462422573830 BYTES PARALLEL 5-WAY FULL SCAN
> OVER TRACES | 73610048115
> | 39850892 | 150910251 |
> | CLIENT MERGE SORT
> | 73610048115
> | 39850892 | 150910251 |
> | PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)
> | 73610048115
> | 39850892 | 150910251 |
> | CLIENT 234-CHUNK 39850892 ROWS 73610048115 BYTES PARALLEL 234-WAY
> RANGE SCAN OVER TRACES [1,'PO '] - [1,'PO!'] |
> 73610048115 | 39850892 | 150910251 |
> | SERVER TOP 101 ROWS SORTED BY [cf."time" DESC, "ROWKEY"]
> | 73610048115
> | 39850892 | 150910251 |
> | CLIENT MERGE SORT
> | 73610048115
> | 39850892 | 150910251 |
> | CLIENT LIMIT 101
> | 73610048115
> | 39850892 | 150910251 |
> | DYNAMIC SERVER FILTER BY T.ROWKEY IN (IX.ROWKEY)
> | 73610048115
> | 39850892 | 150910251 |
> +---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> {noformat}
> Which completes in just under a minute.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)