[ 
https://issues.apache.org/jira/browse/DRILL-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529465#comment-16529465
 ] 

ASF GitHub Bot commented on DRILL-6557:
---------------------------------------

arina-ielchiieva opened a new pull request #1357: DRILL-6557: Use size in bytes 
during Hive statistics calculation if present
URL: https://github.com/apache/drill/pull/1357
 
 
   1. Check whether size in bytes is present in the stats before fetching input 
splits and use it if present.
   2. Add a log trace suggesting to run the ANALYZE command before running queries 
if statistics are unavailable and Drill had to fetch all input splits.
   3. Minor refactoring / cleanup in the HiveMetadataProvider class.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Use size in bytes during Hive statistics calculation if present
> ---------------------------------------------------------------
>
>                 Key: DRILL-6557
>                 URL: https://issues.apache.org/jira/browse/DRILL-6557
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Drill considers Hive statistics valid only if they contain both the number of 
> rows and the size in bytes. If at least one of them is absent, statistics are 
> calculated based on the input splits' size in bytes. This means that we fetch 
> all input splits even though we might not need some of them after planning 
> optimizations (e.g. partition pruning). 
> However, if the number of rows is missing but the size in bytes is present, 
> there is no need to fetch all input splits, since their total size in bytes 
> will be the same as in the statistics. Skipping the fetch would improve 
> planning time, since fetching input splits is a rather costly operation.
> This Jira aims to:
>  1. check whether size in bytes is present in the stats before fetching input 
> splits and use it if present;
>  2. add a log trace suggesting to run the ANALYZE command before running 
> queries if statistics are unavailable and Drill had to fetch all input splits;
>  3. minor refactoring / cleanup in the HiveMetadataProvider class.
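
For illustration only, the decision described in items 1 and 2 could be sketched
roughly as below. This is not the code from the pull request: the class name
HiveStatsSketch, the method resolveSizeInBytes and the splitSizeFetcher
parameter are hypothetical names invented for this sketch, and the real logic
lives in HiveMetadataProvider.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical sketch: names are illustrative and are NOT Drill's actual API.
final class HiveStatsSketch {

    private static final Logger LOG = Logger.getLogger(HiveStatsSketch.class.getName());

    /**
     * Returns the table size in bytes, preferring the value already stored in
     * the Hive statistics and falling back to the costly input-split fetch
     * only when that value is absent.
     *
     * @param statsSizeInBytes size in bytes from Hive statistics, if present
     * @param splitSizeFetcher lazily fetches the size of every input split
     */
    static long resolveSizeInBytes(OptionalLong statsSizeInBytes,
                                   Supplier<List<Long>> splitSizeFetcher) {
        if (statsSizeInBytes.isPresent()) {
            // Item 1: the statistics already carry the size, so skip the fetch.
            return statsSizeInBytes.getAsLong();
        }

        // Item 2: statistics are unavailable, so hint at ANALYZE and fall back.
        LOG.finest("Hive statistics are unavailable, fetching all input splits. "
                + "Consider running the ANALYZE command before querying the table.");

        long total = 0L;
        for (long splitSize : splitSizeFetcher.get()) {
            total += splitSize;
        }
        return total;
    }
}
```

Passing the split fetch as a Supplier keeps it lazy in this sketch, so the
costly call happens only on the fallback path.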



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
