Re: About the IN condition push down.

Julian Hyde Tue, 15 Nov 2016 21:32:54 -0800

I’m not sure what you mean by “notified about the IN condition”. You either 
have to convert it to a semi-join or not.


I don’t know how expensive hash-joins are in Drill. If they’re expensive, you 
could hand-write a UDF that builds a java hash-map, and see whether it performs 
better.

Julian


> On Nov 15, 2016, at 7:38 PM, WeiWan <[email protected]> wrote:
> 
> Hi!
> 
> I’m working on a new data format using Apache Drill as query engine. It has 
> very nice scan speed, high compression ratio, and contains indexes to filter 
> out irrelevant parts. Even more interesting is it supports extremely fast 
> messages realtime ingestion. It is not yet open sourced but soon.
> 
> But right now we run into a issue and need your help. We know Drill can push 
> down predicates to GroupScan. But it may transform the large IN condition 
> into an HashJoin to improve performance. Like this:
> 
> 0: jdbc:drill:drillbit=localhost> explain plan for select user_id from test 
> where user_id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20);
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      UnionExchange
> 01-01        Project(user_id=[$0])
> 01-02          HashJoin(condition=[=($1, $2)], joinType=[inner])
> 01-04            Project($f15=[$0], $f69=[$0])
> 01-05              
> Scan(groupscan=[TestGroupScan@4013af3d{Spec=TestScanSpec@73ccb9dc{table:test, 
> rsFilter:null}, columns=[`user_id`]}])
> 01-03            BroadcastExchange
> 02-01              HashAgg(group=[{0}])
> 02-02                Values
> 
> 
> The problem is we need to know there is a IN condition, either by predicates 
> push down or other ways, so that we can do the optimization job. 
> 
> Drill has an option “planner.in_subquery_threshold” can prevent the planner 
> from transforming the IN condition, but we would like to keep that untouched 
> because a large  “or (equals … )” can also slow down things.
> 
> So our question is, if we keep the “in_subquery” planner optimization, is 
> there any ways we can be notified about the IN condition?
> 
> Looking forward to your reply!
> 
> Regards
> Flow Wei
> 
> 
>

Re: About the IN condition push down.

Reply via email to