[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505073#comment-17505073
 ] 

Lars Hofhansl commented on PHOENIX-6501:
----------------------------------------

With the latest version of this patch the query finishes, but it still takes a 
very long time.

Same scenario: 18m rows, count(*) matching 2m rows. Without an index the query 
takes about 7s on my system, with a local index it takes 10s, and with the 
global index and this patch it takes about 4 minutes. A cursory look in the 
profiler reveals that most of the time is spent in ClientScanner.next()...

From the code it's not immediately clear what the problem is.

> Use batching when joining data table rows with uncovered global index rows
> --------------------------------------------------------------------------
>
>                 Key: PHOENIX-6501
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6501
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.1.2
>            Reporter: Kadir Ozdemir
>            Assignee: Lars Hofhansl
>            Priority: Major
>         Attachments: PHOENIX-6501.master.001.patch
>
>
> PHOENIX-6458 extends the existing uncovered local index support to global 
> indexes. The current solution uses HBase get operations to join data table 
> rows with uncovered index rows on the server side. Doing a separate RPC call 
> for every data table row can be expensive. Instead, we can buffer many data 
> row keys in memory, use a skip scan filter, and even use multiple threads to 
> issue a separate scan for each data table region in parallel. This will 
> reduce the cost of the join and improve performance.
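
To make the batching idea in the description concrete, here is a minimal 
sketch in plain Java. It is not the Phoenix implementation and uses no HBase 
API: the class BatchedJoinSketch and its methods groupByRegion, scanRegion and 
batchedJoin are hypothetical names, the data table is modelled as an in-memory 
sorted map, and region boundaries are assumed to be given as a map from region 
start key to region name. It only shows the shape of the approach: buffer the 
data row keys taken from the index rows, group them by region, and issue one 
lookup task per region in parallel instead of one call per row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names exist in Phoenix or HBase.
class BatchedJoinSketch {

    // Group buffered data row keys by the region that holds them.
    // Assumption: regionStartKeys maps each region's start key to a region
    // name, and the first region starts at the empty string.
    static Map<String, List<String>> groupByRegion(
            NavigableMap<String, String> regionStartKeys, List<String> rowKeys) {
        Map<String, List<String>> batches = new TreeMap<>();
        for (String rowKey : rowKeys) {
            String region = regionStartKeys.floorEntry(rowKey).getValue();
            batches.computeIfAbsent(region, r -> new ArrayList<>()).add(rowKey);
        }
        return batches;
    }

    // Stand-in for a single scan over one region. A real implementation would
    // run one scan with a skip scan filter over the batched keys; here it is
    // just a lookup in an in-memory map.
    static Map<String, String> scanRegion(
            NavigableMap<String, String> dataTable, List<String> keys) {
        Map<String, String> rows = new TreeMap<>();
        for (String key : keys) {
            String value = dataTable.get(key);
            if (value != null) {
                rows.put(key, value);
            }
        }
        return rows;
    }

    // Join a buffer of data row keys against the data table using one
    // parallel task per region rather than one call per row.
    static Map<String, String> batchedJoin(
            NavigableMap<String, String> regionStartKeys,
            NavigableMap<String, String> dataTable,
            List<String> bufferedRowKeys,
            int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, String>>> futures = new ArrayList<>();
            for (List<String> batch
                    : groupByRegion(regionStartKeys, bufferedRowKeys).values()) {
                Callable<Map<String, String>> task = () -> scanRegion(dataTable, batch);
                futures.add(pool.submit(task));
            }
            Map<String, String> joined = new TreeMap<>();
            for (Future<Map<String, String>> future : futures) {
                joined.putAll(future.get());
            }
            return joined;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the number of tasks is bounded by the number of regions 
touched, not by the number of rows, which is the saving the description is 
after. How large the buffer should be and how many threads to use are left 
open here, as they are in the description.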



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
