[ https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505073#comment-17505073 ]
Lars Hofhansl commented on PHOENIX-6501:
----------------------------------------

With the latest version of this patch the query finishes, but it still takes a very long time.

Same scenario: 18m rows, count(*) matching 2m rows.
Without an index the query takes about 7s on my system, with a local index it takes 10s, and with the global index and this patch it takes about 4 minutes.

A cursory look in the profiler reveals that most of the time is spent in ClientScanner.next()...
From the code it's not immediately clear what the problem is.

> Use batching when joining data table rows with uncovered global index rows
> --------------------------------------------------------------------------
>
>                 Key: PHOENIX-6501
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6501
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.1.2
>            Reporter: Kadir Ozdemir
>            Assignee: Lars Hofhansl
>            Priority: Major
>         Attachments: PHOENIX-6501.master.001.patch
>
>
> PHOENIX-6458 extends the existing uncovered local index support to global
> indexes. The current solution uses HBase get operations to join data table
> rows with uncovered index rows on the server side. Doing a separate RPC call
> for every data table row can be expensive. Instead, we can buffer many
> data row keys in memory, then use a skip scan filter, and even multiple
> threads, to issue a separate scan for each data table region in parallel.
> This will reduce the cost of the join and improve performance.
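
The batching idea in the issue description can be sketched roughly as follows. This is only an illustration and not the code in the attached patch: it uses plain Java collections instead of the HBase client, the class and method names (BatchedIndexJoinSketch, groupByRegion, scanRegion, joinBatched) are hypothetical, and the sorted in-memory map is a stand-in for the data table. The assumption is that every region is identified by its start key and that the first region has the empty start key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from Phoenix or HBase.
class BatchedIndexJoinSketch {

    // Group the buffered data row keys by the start key of the region holding them.
    // Assumes regionStartKeys contains "" (start of the first region), so floor() never returns null.
    static NavigableMap<String, List<String>> groupByRegion(
            List<String> bufferedRowKeys, NavigableSet<String> regionStartKeys) {
        NavigableMap<String, List<String>> batches = new TreeMap<>();
        for (String rowKey : bufferedRowKeys) {
            String regionStart = regionStartKeys.floor(rowKey);
            batches.computeIfAbsent(regionStart, k -> new ArrayList<>()).add(rowKey);
        }
        return batches;
    }

    // Stand-in for one scan over a single region: visit the wanted keys in sorted
    // order and collect the rows that exist. A real skip scan would seek between keys.
    static Map<String, String> scanRegion(List<String> batch, NavigableMap<String, String> dataTable) {
        List<String> sorted = new ArrayList<>(batch);
        Collections.sort(sorted);
        Map<String, String> found = new TreeMap<>();
        for (String rowKey : sorted) {
            String row = dataTable.get(rowKey);
            if (row != null) {
                found.put(rowKey, row);
            }
        }
        return found;
    }

    // Issue one scan task per region in parallel instead of one lookup per row.
    static Map<String, String> joinBatched(
            List<String> bufferedRowKeys,
            NavigableSet<String> regionStartKeys,
            NavigableMap<String, String> dataTable,
            int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, String>>> pending = new ArrayList<>();
            for (List<String> batch : groupByRegion(bufferedRowKeys, regionStartKeys).values()) {
                Callable<Map<String, String>> task = () -> scanRegion(batch, dataTable);
                pending.add(pool.submit(task));
            }
            Map<String, String> joined = new TreeMap<>();
            for (Future<Map<String, String>> result : pending) {
                joined.putAll(result.get());
            }
            return joined;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("region scan failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the number of scan tasks equals the number of regions that hold at least one buffered key, rather than the number of buffered keys, which is the saving the description is after.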