[jira] [Created] (DRILL-6557) Use size in bytes during Hive statistics calculation if present

Arina Ielchiieva (JIRA) Fri, 29 Jun 2018 08:25:43 -0700

Arina Ielchiieva created DRILL-6557:
---------------------------------------


             Summary: Use size in bytes during Hive statistics calculation if 
present
                 Key: DRILL-6557
                 URL: https://issues.apache.org/jira/browse/DRILL-6557
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.13.0
            Reporter: Arina Ielchiieva
            Assignee: Arina Ielchiieva
             Fix For: 1.14.0


Drill considers Hive statistics valid only if they contain both the number of 
rows and the size in bytes. If at least one of them is absent, statistics are 
calculated from the size in bytes of the input splits. This means that we fetch 
all input splits, even though some of them might not be needed after planning 
optimizations (e.g. partition pruning). However, if the number of rows is 
missing but the size in bytes is present, there is no need to fetch all input 
splits, since their total size in bytes will be the same as in the statistics. 
Skipping the fetch would improve planning time, since fetching input splits is 
a rather costly operation.

This Jira aims to:
 1. check whether the size in bytes is present in the statistics before 
fetching input splits, and use it if present;
 2. add a debug log message suggesting to run the ANALYZE command before 
running queries if statistics are unavailable and Drill had to fetch all input 
splits;
 3. minor refactoring / cleanup in the HiveMetadataProvider class.
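
The first two items can be illustrated with a minimal sketch. This is not the 
actual HiveMetadataProvider code: the class name HiveStatsSketch, the method 
resolveSizeInBytes, and the use of java.util.logging in place of Drill's logger 
are all assumptions made to keep the example self-contained. It only shows the 
intended order of checks: use the size in bytes from the statistics when it is 
there, and fall back to the costly input split fetch (with a debug hint) only 
when it is not.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;
import java.util.logging.Logger;

// Hypothetical illustration class; not part of Drill.
class HiveStatsSketch {

    private static final Logger logger =
            Logger.getLogger(HiveStatsSketch.class.getName());

    /**
     * Returns the size in bytes to use for planning.
     *
     * @param statsSizeInBytes size in bytes taken from Hive statistics,
     *                         empty if the statistics do not contain it
     * @param splitSizeFetcher stand-in for the costly operation that fetches
     *                         all input splits and sums their sizes
     */
    static long resolveSizeInBytes(OptionalLong statsSizeInBytes,
                                   LongSupplier splitSizeFetcher) {
        if (statsSizeInBytes.isPresent()) {
            // Size is already known from statistics: no need to fetch splits.
            return statsSizeInBytes.getAsLong();
        }
        // Debug-level hint (FINE is the closest java.util.logging level).
        logger.fine("Hive statistics are unavailable, fetching all input splits. "
                + "Consider running the ANALYZE command before querying.");
        return splitSizeFetcher.getAsLong();
    }
}
```

Passing the split fetch as a supplier keeps it lazy, so the costly call is made 
only on the fallback path.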



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
