> On May 9, 2015, 5:57 a.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java,
> >  line 53
> > <https://reviews.apache.org/r/34006/diff/2/?file=954198#file954198line53>
> >
> >     It is not clear why Scan should have a 'distinct' row count of 10% ?  
> > What does 'distinct' row count mean for scan ? (since we are not 
> > considering filter push-down or partition pruning here).
> 
> Jinfeng Ni wrote:
>     The distinct row count is for a set of columns. It's more like column 
> cardinality.  For instance, GROUP BY c1, c2, c3. The ImmutableBitSet 
> groupKey indicates the column set. However, since Drill does not have any 
> column cardinality statistics, we simply use the row count to estimate it.
>     
>     As a matter of fact, the current VolcanoPlanner uses the default 
> RelMetadataProvider, which will return 10% of row count for AggregateRel. In 
> this sense, this DistinctRowCount MetadataProvider estimates it in the same 
> way.

Returning 10% as the distinct row count for AggregateRel seems reasonable since 
it is doing group-by.  I suppose for ScanRel it will be called for individual 
columns, not a set of columns... in other words if I want the NDV(c1), NDV(c2), 
NDV(c3), I will call this method 3 times with different columns in the groupKey.
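
To make the concern concrete, here is a minimal sketch of an estimator that 
behaves the way the comment above describes. It is not the code in 
DrillRelMdDistinctRowCount.java: the class and method names are hypothetical, 
java.util.BitSet stands in for Calcite's ImmutableBitSet, and the 0.1 factor is 
simply the 10% figure from this thread. Because the group key is ignored, a 
single-column key and a three-column key get the same estimate.

```java
import java.util.BitSet;

public class DistinctRowCountSketch {
    // Assumption: the flat 10% fraction discussed in this review thread.
    static final double DISTINCT_FRACTION = 0.1;

    // Hypothetical stand-in for a distinct-row-count metadata handler.
    // The group key is accepted but ignored, since no per-column
    // cardinality statistics are assumed to be available.
    static double estimateDistinctRowCount(double rowCount, BitSet groupKey) {
        return rowCount * DISTINCT_FRACTION;
    }

    // Builds a group key from column ordinals (helper for this sketch only).
    static BitSet keyOf(int... columns) {
        BitSet key = new BitSet();
        for (int c : columns) {
            key.set(c);
        }
        return key;
    }

    public static void main(String[] args) {
        double rowCount = 1000.0;
        // One call per column, as in NDV(c1), NDV(c2), NDV(c3).
        for (int col = 0; col < 3; col++) {
            double ndv = estimateDistinctRowCount(rowCount, keyOf(col));
            System.out.println("NDV(c" + (col + 1) + ") estimate: " + ndv);
        }
        // One call with all three columns in the group key.
        double combined = estimateDistinctRowCount(rowCount, keyOf(0, 1, 2));
        System.out.println("NDV(c1, c2, c3) estimate: " + combined);
    }
}
```

With a flat fraction like this, calling the method three times with different 
columns gives no more information than calling it once.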


> On May 9, 2015, 5:57 a.m., Aman Sinha wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java,
> >  line 529
> > <https://reviews.apache.org/r/34006/diff/2/?file=954207#file954207line529>
> >
> >     Is the criterion for using the Lopt optimizer (the number of tables 
> > exceeding a certain threshold) applied internally? We should have a 
> > Drill-specific setting for it beyond just a true/false setting.
> 
> Jinfeng Ni wrote:
>     I think it probably makes sense to avoid the LOPT planner for 
> single-table queries. For any query with a JOIN, since the current planner 
> does not enable the SwapJoin rule in the logical planning phase, it may not 
> find the optimal plan, and it relies on a post-planning step to swap the join 
> inputs based on row count. In that sense, I feel the LOPT planner might be a 
> better choice even for 2- or 3-table joins.
>     
>     For a single-table query, since there is no join, there seems to be no 
> difference between LOPT and the current planner. That's why I did not add a 
> threshold.
> 
> Jinfeng Ni wrote:
>     Another reason that I did not add a threshold option is that we have 
> added a new option to switch between the existing planner and the new 
> planner. Users can simply turn that option on or off to switch completely to 
> one of them. Having both the switch option and a threshold option would 
> probably cause more confusion, IMHO.

Ok, let's see how the TPC-DS testing goes and see if more control is needed.


- Aman


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34006/#review83134
-----------------------------------------------------------


On May 9, 2015, 12:20 a.m., Jinfeng Ni wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34006/
> -----------------------------------------------------------
> 
> (Updated May 9, 2015, 12:20 a.m.)
> 
> 
> Review request for drill and Aman Sinha.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Drill currently uses VolcanoPlanner for join planning. This planner has two 
> known issues:
> 
> 1. The search space grows exponentially with the number of tables joined. If 
> a query joins more than 10 tables, the planning time itself could be minutes, 
> if not longer.
> 
> 2. Drill does not enable a rule to swap the two sides of a join, due to the 
> search space problem. We only swap join inputs afterwards. See DRILL-2236. 
> This means the join order chosen by Drill's VolcanoPlanner might not be 
> optimal.
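
For a rough sense of how quickly the join-order space grows, the sketch below 
prints two standard combinatorial counts for n tables: n! left-deep orders and 
(2n-2)!/(n-1)! bushy trees. These are textbook upper bounds, not a measurement 
of what VolcanoPlanner actually explores, which depends on the rules enabled. 
The class and method names are made up for this example.

```java
import java.math.BigInteger;

public class JoinOrderCount {
    // Computes n! with arbitrary precision.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Number of left-deep join orders over n tables: n!
    static BigInteger leftDeepOrders(int n) {
        return factorial(n);
    }

    // Number of bushy join trees over n tables, counting every tree shape
    // and every assignment of tables to leaves: (2n-2)! / (n-1)!
    static BigInteger bushyOrders(int n) {
        return factorial(2 * n - 2).divide(factorial(n - 1));
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 3, 5, 10}) {
            System.out.println(n + " tables: left-deep=" + leftDeepOrders(n)
                    + ", bushy=" + bushyOrders(n));
        }
    }
}
```

At 10 tables the left-deep count is already in the millions and the bushy 
count in the billions, which is consistent with the planning times described 
in item 1.
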
> 
> To address the above two issues, we are going to provide another planner for 
> join ordering. This planner will use a different set of optimization rules, 
> and its search space does not grow exponentially with the number of tables. 
> 
> The main logic of this new planner:
> 1) Let VolcanoPlanner do all the rule transformations, the same as in the 
> current planner's logical planning, except for the join permutation rule.
> 2) After that, pass the result to HepPlanner with the Calcite LOPT 
> optimization rule and let it do the join ordering. Feed the HepPlanner with 
> Drill's RelMetadataProvider to leverage the statistics (row count) available 
> in Drill's tables/files. 
> 3) Continue with the same physical planning as before.
> 
> With the limited statistics available in Drill, the new planner seems to 
> produce better query plans than the current one for several TPCH queries. 
> 
> Preliminary performance results show that this planner runs faster than the 
> existing one, and the join plan seems to be the same as or better than the 
> plan chosen by the existing planner. 
> 
> I will post a more detailed comparison later.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillJoinRelBase.java 5ab416c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java 42ef6ac 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillDefaultRelMetadataProvider.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterRel.java dbd08f4 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillJoinRel.java dcccdb0 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillProjectRel.java 6e132aa 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjIntoScan.java 2981de8 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRelFactories.java PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java 53e1bff 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java 7d8dd97 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java 3c78c08 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java eda1b5f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java 4d8b034 
> 
> Diff: https://reviews.apache.org/r/34006/diff/
> 
> 
> Testing
> -------
> 
> Unit test / Regression suite.
> 
> 
> Thanks,
> 
> Jinfeng Ni
> 
>
