[ https://issues.apache.org/jira/browse/PHOENIX-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387719#comment-16387719 ]
Sokolov Yura commented on PHOENIX-4336: --------------------------------------- catched with same issue. > SELECT + ORDER BY performance much worse than self-join > ------------------------------------------------------- > > Key: PHOENIX-4336 > URL: https://issues.apache.org/jira/browse/PHOENIX-4336 > Project: Phoenix > Issue Type: Bug > Reporter: Marcin Januszkiewicz > Priority: Major > > We have a wide table with 100M records created with the following DDL: > {code:sql} > CREATE TABLE traces ( > rowkey VARCHAR PRIMARY KEY, > time VARCHAR, > number VARCHAR, > +40 more columns) > CREATE LOCAL INDEX ix_0 ON traces (UPPER(number)) INCLUDE (time, + some other > columns used for filtering) > {code} > We want to select into a large (~30M records) subset of this data with the > query: > {code:sql} > SELECT *all columns* > FROM traces > WHERE (UPPER(number) LIKE 'PO %') > ORDER BY time DESC, ROWKEY > LIMIT 101; > {code} > {noformat} > +-------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+ > | PLAN > | EST_BYTES_READ | > EST_ROWS_READ | EST_INFO_TS | > +-------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+ > | CLIENT 234-CHUNK 39850892 ROWS 73610048115 BYTES PARALLEL 234-WAY RANGE > SCAN OVER TRACES [1,'PO '] - [1,'PO!'] | 73610048115 > | 39850892 | 1509102519122 | > | SERVER TOP 101 ROWS SORTED BY [cf."time" DESC, "ROWKEY"] > | 73610048115 | > 39850892 | 1509102519122 | > | CLIENT MERGE SORT > | 73610048115 | > 39850892 | 1509102519122 | > | CLIENT LIMIT 101 > | 73610048115 | > 39850892 | 1509102519122 | > +-------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+ > {noformat} > This times out after 15 minutes and puts a huge load on our cluster. > We have an alternate way of selecting this data: > {code:sql} > SELECT t.rowkey, *all columns* > FROM TRACES t > JOIN ( > SELECT rowkey > FROM TRACES > WHERE (UPPER(number) LIKE 'PO %') > ORDER BY time DESC, ROWKEY > LIMIT 101 > ) ix > ON t.ROWKEY = ix.ROWKEY > order by t.ROWKEY; > {code} > {noformat} > +---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-----------+ > | PLAN > | > EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-----------+ > | CLIENT 1500-CHUNK 97154640 ROWS 462422573830 BYTES PARALLEL 5-WAY FULL SCAN > OVER TRACES | 73610048115 > | 39850892 | 150910251 | > | CLIENT MERGE SORT > | 73610048115 > | 39850892 | 150910251 | > | PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) > | 73610048115 > | 39850892 | 150910251 | > | CLIENT 234-CHUNK 39850892 ROWS 73610048115 BYTES PARALLEL 234-WAY > RANGE SCAN OVER TRACES [1,'PO '] - [1,'PO!'] | > 73610048115 | 39850892 | 150910251 | > | SERVER TOP 101 ROWS SORTED BY [cf."time" DESC, "ROWKEY"] > | 73610048115 > | 39850892 | 150910251 | > | CLIENT MERGE SORT > | 73610048115 > | 39850892 | 150910251 | > | CLIENT LIMIT 101 > | 73610048115 > | 39850892 | 150910251 | > | DYNAMIC SERVER FILTER BY T.ROWKEY IN (IX.ROWKEY) > | 73610048115 > | 39850892 | 150910251 | > +---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-----------+ > {noformat} > Which completes in just under a minute. -- This message was sent by Atlassian JIRA (v7.6.3#76005)