[ 
https://issues.apache.org/jira/browse/PHOENIX-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625342#comment-16625342
 ] 

Lars Hofhansl commented on PHOENIX-4594:
----------------------------------------

Can you explain your patch in a few words, or - even better - place a few more 
comments in the code?

The main problem - as I see it - with the current guideposts is that the number 
of scans that are generated and executed by a query is locked to the number of 
guideposts that exist.
Rather, we should look at the guideposts as a source of information, and use 
that to plan the query at compile time.

As an example... Why can't we have 10MB guideposts and a 1PB table? Currently 
we'd generate 1024^5 / 10 / 1024^2 ~= 100m scans. Instead we should have a 
parallelism target, and combine guideposts as needed.
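
To make the arithmetic above concrete, here is a minimal sketch. It is not 
Phoenix code: the class name, the method names, and the parallelism target of 
1000 are all hypothetical, chosen only for illustration. It counts the scans 
produced when every guidepost becomes its own scan, then derives how many 
consecutive guideposts would have to be merged per scan to stay within the 
target.

```java
// Illustrative only: none of these names exist in Phoenix.
final class GuidepostPlanSketch {

    /** Scans generated when each guidepost-sized chunk becomes one scan. */
    static long scansPerGuidepost(long tableBytes, long guidepostWidthBytes) {
        return ceilDiv(tableBytes, guidepostWidthBytes);
    }

    /** Consecutive guideposts to merge per scan so that the scan count <= target. */
    static long guidepostsPerScan(long guidepostCount, long targetParallelism) {
        return Math.max(1L, ceilDiv(guidepostCount, targetParallelism));
    }

    /** Ceiling division for positive longs. */
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        long tableBytes = 1L << 50;              // 1 PB = 1024^5 bytes
        long guidepostWidth = 10L * (1L << 20);  // 10 MB = 10 * 1024^2 bytes
        long targetParallelism = 1000L;          // assumed target, not a Phoenix default

        long scans = scansPerGuidepost(tableBytes, guidepostWidth);
        long perScan = guidepostsPerScan(scans, targetParallelism);

        System.out.println("scans, one per guidepost: " + scans);   // 107374183
        System.out.println("guideposts merged per scan: " + perScan); // 107375
    }
}
```

With these assumed numbers, roughly 107 million one-guidepost scans collapse 
into 1000 scans of about 107 thousand guideposts each.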

I think your patch does that by going through the guideposts in a moving 
window, but after only a brief glance, I am not sure.

The other problem is that all clients are required to cache all guideposts. 
Here, too, we should look at the guideposts as data and group as needed.

A super simple solution would be to always group N guideposts together, where 
N might be configurable. That's pretty crude, but would be a first step to 
disentangle the guidepost data from the actually materialized scans.
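
The fixed-N grouping could be sketched as follows. This is an assumption-laden 
illustration, not the patch under review and not an existing Phoenix API: the 
class and method names are hypothetical, and guidepost keys are shown as plain 
strings rather than row-key bytes. Only every N-th guidepost is kept as a scan 
boundary, so each materialized scan spans up to N guidepost chunks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: none of these names exist in Phoenix.
final class GuidepostGroupingSketch {

    /**
     * Keeps every n-th guidepost (the entries at indices n-1, 2n-1, ...) as a
     * scan boundary. The input must already be sorted in key order.
     */
    static <K> List<K> groupEveryN(List<K> guideposts, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<K> boundaries = new ArrayList<>();
        for (int i = n - 1; i < guideposts.size(); i += n) {
            boundaries.add(guideposts.get(i));
        }
        return boundaries;
    }

    public static void main(String[] args) {
        // Seven guideposts split the key space into eight chunks.
        List<String> guideposts = Arrays.asList("b", "c", "d", "e", "f", "g", "h");

        // With N = 3 only "d" and "g" remain as boundaries, giving three scans:
        // [start, d), [d, g), [g, end).
        System.out.println(groupEveryN(guideposts, 3)); // prints [d, g]
    }
}
```

Making N a parameter keeps the stored guidepost data unchanged while letting 
the number of materialized scans vary.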


> Perform binary search on guideposts during query compilation
> ------------------------------------------------------------
>
>                 Key: PHOENIX-4594
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4594
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Bin Shi
>            Priority: Major
>         Attachments: PHOENIX-4594-0913.patch, PHOENIX-4594_0917.patch, 
> PHOENIX-4594_0918.patch
>
>
> If there are many guideposts, performance will suffer during query 
> compilation because we do a linear search of the guideposts to find the 
> intersection with the scan ranges. Instead, in 
> BaseResultIterators.getParallelScans() we should populate an array of 
> guideposts and perform a binary search. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
