[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

Lars Hofhansl (Jira) Mon, 07 Mar 2022 13:24:06 -0800


    [ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502556#comment-17502556
 ]


Lars Hofhansl commented on PHOENIX-6501:
----------------------------------------

That might be a bit tricky. I loaded the TPC-H lineitem table (scale factor 3) 
into Phoenix via the Trino connector.

{code}
CREATE TABLE phoenix.default.lineitem (
orderkey bigint NOT NULL,
partkey bigint,
suppkey bigint,
linenumber integer NOT NULL,
quantity double,
extendedprice double,
discount double,
tax double,
returnflag varchar(1),
linestatus varchar(1),
shipdate date,
commitdate date,
receiptdate date,
shipinstruct varchar(25),
shipmode varchar(10),
comment varchar(44)
)
WITH (
compression = 'ZSTD',
data_block_encoding = 'ROW_INDEX_V1',
disable_wal = true,
immutable_rows = true,
rowkeys = 'ORDERKEY,LINENUMBER'
)
{code}

(I disable the WAL everywhere because it is not what I am testing, and it 
speeds up loading/creating.)

Then I created a global index on the tax column:
{{create index g_l_tax on lineitem(tax) DISABLE_WAL=true;}}

Then I ran {{select /*+ INDEX(lineitem g_l_tax) */ count(suppkey) from lineitem 
where tax = 0.08}}

Let me connect with you offline and see if I can send you a CSV with the 
lineitem data.


> Use batching when joining data table rows with uncovered index rows
> -------------------------------------------------------------------
>
>                 Key: PHOENIX-6501
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6501
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.1.2
>            Reporter: Kadir Ozdemir
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>         Attachments: PHOENIX-6501.master.001.patch
>
>
> PHOENIX-6458 extends the existing uncovered local index support to global 
> indexes. The current solution uses HBase get operations to join data table 
> rows with uncovered index rows on the server side. Doing a separate RPC call 
> for every data table row can be expensive. Instead, we can buffer many data 
> row keys in memory, then use a skip scan filter, and even multiple threads, 
> to issue a separate scan for each data table region in parallel. This will 
> reduce the cost of the join and improve performance.
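
The batching idea in the description above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the code in the attached patch: the data table is an in-memory dict, regions are modeled by a sorted list of start keys, and every name here (REGION_STARTS, region_of, scan_region, batched_join) is hypothetical. It shows how buffering row keys and grouping them by region replaces one lookup per row with one scan per region per batch.

```python
from bisect import bisect_right
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a data table split into three regions.
# REGION_STARTS holds the first row key of each region, in sorted order.
REGION_STARTS = [0, 100, 200]
DATA_TABLE = {key: {"suppkey": key % 7} for key in range(300)}


def region_of(row_key):
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(REGION_STARTS, row_key) - 1


def scan_region(region, row_keys):
    """Simulate one skip scan: a single call returns all requested keys
    of one region (the region argument is only informational here)."""
    return [(k, DATA_TABLE[k]) for k in sorted(row_keys) if k in DATA_TABLE]


def batched_join(index_row_keys, batch_size=1000, threads=4):
    """Join buffered index row keys to data rows.

    Returns the joined rows and the number of simulated scan calls,
    which is one per region touched in each batch.
    """
    results, calls = [], 0
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for start in range(0, len(index_row_keys), batch_size):
            batch = index_row_keys[start:start + batch_size]
            # Group the buffered keys by the region that owns them.
            by_region = defaultdict(list)
            for key in batch:
                by_region[region_of(key)].append(key)
            # Issue the per-region scans in parallel.
            futures = [pool.submit(scan_region, region, keys)
                       for region, keys in by_region.items()]
            calls += len(futures)
            for future in futures:
                results.extend(future.result())
    return results, calls
```

With five index row keys spread over three regions, for example batched_join([5, 150, 250, 7, 120]), the sketch issues three simulated scans where a per-row approach would issue five separate lookups. The real savings depend on batch size, region layout, and RPC cost, none of which this toy model measures.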



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
