[ https://issues.apache.org/jira/browse/DRILL-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha updated DRILL-2553:
------------------------------
Fix Version/s: (was: 1.2.0)
1.3.0
> Cost calculation fails to properly choose single file scan in favor of a
> multi-file scan when files are small
> -------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-2553
> URL: https://issues.apache.org/jira/browse/DRILL-2553
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 0.8.0
> Reporter: Jason Altekruse
> Assignee: Aman Sinha
> Fix For: 1.3.0
>
>
> There is a failing test case in the patch for constant folding that should be
> checked in soon. The test attempts to prune one directory out of a scan after
> a constant expression returning the name of a directory is folded, but the
> files being read from both directories are very small. Our current method of
> calculating cost makes the pruned and unpruned plans report the same cost.
>
> This could be fixed in a few different places. EasyGroupScan.getScanStats(),
> which is used here, could factor the file count into its calculation of the
> total row count. We could also move to a two-part metric that tracks the
> number of files in addition to the estimated row count. That would require
> some changes to the cost calculation of the scan rels themselves, which use
> the information from the scan stats.
>
> In general, I think we should consider solving this as high up as possible,
> because we want cost estimates to be as accurate as possible even when the
> information provided by storage plugins is not completely accurate. For
> example, even disregarding the row count reported by EasyGroupScan, the rel
> nodes know the number of partitions. At this level we should be able to
> avoid picking the plan that reads a superset of the partitions of the other
> candidate plan.
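
The two options described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Drill code: the class ScanCostSketch, its fields, and the perFileOverhead constant are all hypothetical names invented for this example. It shows (1) folding the file count into a single cost number and (2) a two-part comparison that uses the file count to break ties when the estimated row counts are equal.

```java
// Hypothetical two-part scan cost: estimated row count plus file count.
// None of these names come from Drill; they only illustrate the idea.
final class ScanCostSketch {
    final double rowCount;  // estimated rows, as a scan-stats call might report
    final int fileCount;    // number of files (or partitions) the scan reads

    ScanCostSketch(double rowCount, int fileCount) {
        this.rowCount = rowCount;
        this.fileCount = fileCount;
    }

    // Option 1: fold the file count into a single number.
    // perFileOverhead is an assumed tuning constant, in "row equivalents".
    double combinedCost(double perFileOverhead) {
        return rowCount + perFileOverhead * fileCount;
    }

    // Option 2: compare row counts first, then break ties on file count,
    // so a scan over fewer files wins when the row estimates are equal.
    boolean isCheaperThan(ScanCostSketch other) {
        int byRows = Double.compare(rowCount, other.rowCount);
        if (byRows != 0) {
            return byRows < 0;
        }
        return fileCount < other.fileCount;
    }
}
```

With equal row estimates, a pruned scan over one file and an unpruned scan over two files would no longer tie under either option. Where such a comparison should live (in the group scan or in the scan rels) is the open question raised in the description.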
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)