[ 
https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105520#comment-14105520
 ] 

Maryann Xue commented on PHOENIX-852:
-------------------------------------

bq. What about the case where the RHS has been filtered down a lot and you have 
a fully qualified key? Then a full scan over the LHS will be much worse than a 
skip scan driven by the keys formed through the RHS rows. I think this may be 
the most common case.

I don't think there's a silver bullet for this problem before we have stats, 
and I assume the goal at this stage is to be relatively conservative. So why 
don't we just check in now and go with "by default we do BETWEEN-AND for a 
full key match (e.g. c1,c2,c3 matched in c1,c2,c3), but only use an IN clause 
if the SKIP_SCAN_HASH_JOIN hint is on"? At least people can start using this 
feature to optimize their queries, and in some cases they'll have to be aware 
of the hints to do even better.
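
To make the proposed default concrete, here is a minimal sketch of the 
decision rule and of the two filter shapes it would produce. It is not taken 
from the patch: the class, enum, and method names are hypothetical, the key is 
reduced to a single string column for brevity, and the behavior for a partial 
key match without the hint (no extra filter) is an assumption, since the 
proposal above does not spell that case out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the Phoenix code base.
final class JoinKeyFilterSketch {
    enum Strategy { NONE, BETWEEN_AND, IN_LIST }

    // fullKeyMatch: the join columns cover the whole LHS primary key.
    // skipScanHint: the query carries the SKIP_SCAN_HASH_JOIN hint.
    static Strategy choose(boolean fullKeyMatch, boolean skipScanHint) {
        if (skipScanHint) {
            return Strategy.IN_LIST;      // opt-in: one point lookup per RHS key
        }
        if (fullKeyMatch) {
            return Strategy.BETWEEN_AND;  // conservative default: one key range
        }
        return Strategy.NONE;             // assumption: leave the scan unfiltered
    }

    // Renders the filter that the RHS keys would contribute to the LHS scan.
    static String render(Strategy strategy, String column, List<String> rhsKeys) {
        if (strategy == Strategy.NONE || rhsKeys.isEmpty()) {
            return "";
        }
        TreeSet<String> sorted = new TreeSet<>(rhsKeys); // sorted, de-duplicated
        if (strategy == Strategy.BETWEEN_AND) {
            return column + " BETWEEN '" + sorted.first()
                    + "' AND '" + sorted.last() + "'";
        }
        return column + " IN ('" + String.join("', '", sorted) + "')";
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("k7", "k2", "k5");
        System.out.println(render(choose(true, false), "pk", keys));
        System.out.println(render(choose(true, true), "pk", keys));
    }
}
```

The range form only bounds the scan between the smallest and largest RHS key, 
so it stays cheap to build even when the RHS is large, while the IN form 
enumerates every key and pays off mainly when the RHS has been filtered down 
to a few rows.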

> Optimize child/parent foreign key joins
> ---------------------------------------
>
>                 Key: PHOENIX-852
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-852
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>         Attachments: 852.patch, PHOENIX-852.patch
>
>
> Often a join will occur from a child to a parent. Our current algorithm 
> would do a full scan of one side or the other. We can do much better than 
> that if the HashCache contains the PK (or even part of the PK) from the table 
> being joined to. In these cases, we should drive the second scan through a 
> skip scan on the server side.



