James Taylor commented on PHOENIX-1556:

bq. Yes. stripSkipScanFilter() also aims to eliminate things like PageFilter 
and looks to keep only boolean expression filters that cannot be pushed into PK.
One thing with PageFilter is that it represents the limit pushed down to the 
server. Since the limit cannot always be pushed down (it depends on the query: 
for example, an aggregate query can push down the limit only if it's aggregating 
on the leading part of the PK), should we take that into account? Or do you think 
we can reliably get the limit that's pushed to the server from the query plan?

bq. A probably more realistic approach here might be to set a configurable 
"limit" for specific operators
That's a good idea. I'll file a JIRA and copy/paste your explanation there.

+1 to the patch (assuming tests pass locally for you -- FYI, test against the 
4.x-HBase-1.3 branch, as there are test failures in master). Great work!

> Base hash versus sort merge join decision on cost
> -------------------------------------------------
>                 Key: PHOENIX-1556
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1556
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>            Priority: Major
>              Labels: CostBasedOptimization
>         Attachments: PHOENIX-1556.patch
> At compile time, we know how many guideposts (i.e. how many bytes) will be 
> scanned for the RHS table. We should, by default, base the decision of using 
> the hash-join versus many-to-many join on this information.
> Another criterion (as we've seen in PHOENIX-4508) is whether or not the tables 
> being joined are already ordered by the join key. In that case, it's better 
> to always use the sort merge join.
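
For illustration only, the decision described above could be sketched roughly 
as follows. This is not the code in the attached patch: the class, method, and 
threshold names are hypothetical, and a fixed byte threshold stands in for 
whatever cost comparison the optimizer actually performs.

```java
// Hypothetical sketch; none of these names are Phoenix APIs.
final class JoinStrategySketch {

    enum Strategy { HASH_JOIN, SORT_MERGE_JOIN }

    // Assumed cutoff: the largest RHS scan size (in bytes) for which we are
    // still willing to build an in-memory hash table. The value is arbitrary.
    static final long ASSUMED_MAX_HASH_BYTES = 100L * 1024 * 1024;

    /**
     * @param rhsBytesToScan            estimated bytes to scan for the RHS table,
     *                                  e.g. derived from guideposts at compile time
     * @param bothSidesOrderedByJoinKey true if both inputs are already ordered
     *                                  by the join key
     */
    static Strategy choose(long rhsBytesToScan, boolean bothSidesOrderedByJoinKey) {
        // Inputs already ordered by the join key: no sort is needed, so the
        // sort merge join is preferred regardless of size.
        if (bothSidesOrderedByJoinKey) {
            return Strategy.SORT_MERGE_JOIN;
        }
        // Otherwise use the hash join only when the RHS is small enough.
        return rhsBytesToScan <= ASSUMED_MAX_HASH_BYTES
                ? Strategy.HASH_JOIN
                : Strategy.SORT_MERGE_JOIN;
    }
}
```

Under these assumptions, choose(1024L, false) picks the hash join, while any 
call with the ordered flag set picks the sort merge join regardless of size.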
