[
https://issues.apache.org/jira/browse/DRILL-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pritesh Maker updated DRILL-6557:
---------------------------------
Labels: ready-to-commit (was: )
> Use size in bytes during Hive statistics calculation if present
> ---------------------------------------------------------------
>
> Key: DRILL-6557
> URL: https://issues.apache.org/jira/browse/DRILL-6557
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.13.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> Drill considers Hive statistics valid only if they contain both the number of
> rows and the size in bytes. If at least one of them is absent, statistics are
> calculated from the size in bytes of the input splits. This means that we
> fetch all input splits even though some may not be needed after planning
> optimizations (e.g. partition pruning).
> However, if the number of rows is missing but the size in bytes is present,
> there is no need to fetch all input splits, since their total size in bytes
> will be the same as in the statistics. Skipping the fetch would improve
> planning time, since fetching input splits is a rather costly operation.
> This Jira aims to:
> 1. check whether size in bytes is present in the stats before fetching input
> splits, and use it if present;
> 2. add a trace log message suggesting running the ANALYZE command before
> queries if statistics are unavailable and Drill had to fetch all input splits;
> 3. do minor refactoring / cleanup in the HiveMetadataProvider class.
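
The decision order described in items 1 and 2 could be sketched roughly as
below. This is only an illustration and not the actual HiveMetadataProvider
code: the class HiveStatsSketch, the Stats holder, the resolve method, the
"non-positive means absent" convention and the ASSUMED_ROW_SIZE constant are
all hypothetical names and assumptions introduced here. The input split fetch
is modelled as a LongSupplier so that it only runs when size in bytes is
missing from the stats.

```java
import java.util.function.LongSupplier;

/** Illustrative sketch only; every name here is hypothetical, not Drill's API. */
public class HiveStatsSketch {

    /** Assumed average row size in bytes, used only to estimate a row count. */
    public static final long ASSUMED_ROW_SIZE = 1024;

    /** Simple stats holder; a non-positive value is assumed to mean "absent". */
    public static final class Stats {
        public final long numRows;
        public final long sizeInBytes;

        public Stats(long numRows, long sizeInBytes) {
            this.numRows = numRows;
            this.sizeInBytes = sizeInBytes;
        }

        /** Stats are valid only when both values are present. */
        public boolean isValid() {
            return numRows > 0 && sizeInBytes > 0;
        }
    }

    /**
     * Resolves usable stats, calling the costly split fetch only when
     * size in bytes is absent from the metastore stats.
     */
    public static Stats resolve(Stats fromMetastore, LongSupplier splitSizeFetcher) {
        if (fromMetastore.isValid()) {
            return fromMetastore;
        }
        long sizeInBytes;
        if (fromMetastore.sizeInBytes > 0) {
            // Size is known: no need to fetch input splits.
            sizeInBytes = fromMetastore.sizeInBytes;
        } else {
            // Stand-in for a trace-level log message.
            System.err.println("Hive stats unavailable, fetching all input splits. "
                + "Consider running ANALYZE before querying.");
            sizeInBytes = splitSizeFetcher.getAsLong();
        }
        // Row count is estimated from the size (an assumption of this sketch).
        long numRows = Math.max(1, sizeInBytes / ASSUMED_ROW_SIZE);
        return new Stats(numRows, sizeInBytes);
    }

    public static void main(String[] args) {
        // Size present, row count missing: the fetcher is never invoked.
        Stats s = resolve(new Stats(0, 4096), () -> {
            throw new IllegalStateException("input splits should not be fetched");
        });
        System.out.println(s.numRows + " rows, " + s.sizeInBytes + " bytes");
    }
}
```

Passing the fetch as a supplier keeps the expensive call lazy, which is the
point of the proposed change: the cost is paid only when the stats carry no
size in bytes at all.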