[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245953#comment-17245953
 ] 

Piotr Findeisen commented on HIVE-11266:
----------------------------------------

{quote}This is not just external tables - any tables where users are directly 
modifying the underlying data can be impacted by this.
{quote}
 
{quote}Yes, I agree with you, external table is just my personal use 
case.{quote}
 
[~tmgstev] [~simobatt] was there a follow-up issue to this?
From the attached patch (same as 
[https://github.com/apache/hive/commit/a2dff9e13acc62ecc0388b3b2e221f26c9184dbb]),
I see this was fixed for external tables only.
 

> count(*) wrong result based on table statistics for external tables
> -------------------------------------------------------------------
>
>                 Key: HIVE-11266
>                 URL: https://issues.apache.org/jira/browse/HIVE-11266
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Simone Battaglia
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Blocker
>             Fix For: 3.0.0
>
>         Attachments: HIVE-11266.01.patch, HIVE-11266.patch
>
>
> Hive returns a wrong count result on an external table with table statistics if 
> I change the table's data files.
> This is the scenario in detail:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(*) from my_table;
> In this case the count query doesn't launch an MR job and returns the result 
> based on table statistics. This result is wrong because it is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications made to the data files.
> Setting "hive.compute.query.using.stats" to FALSE avoids this problem, 
> but the default value of this property is TRUE.
> I think that this post on Stack Overflow, which shows another type of bug 
> in the case of multiple inserts, is also related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table
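
For reference, the scenario quoted above can be sketched in HiveQL roughly as follows. This is a minimal sketch and is not taken from the attached patch: the column list `(id INT)` and the path `/tmp/my_location` are hypothetical placeholders for the schema and location that the report elides.

```sql
-- Hypothetical schema and location; the original report elides both.
CREATE EXTERNAL TABLE my_table (id INT)
LOCATION '/tmp/my_location';

-- Store basic statistics (including the row count) in the metastore.
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- ... now add, change, or delete files under /tmp/my_location outside of Hive ...

-- With hive.compute.query.using.stats=true this may be answered from the
-- stored statistics, so it can return the stale row count.
SELECT COUNT(*) FROM my_table;

-- Workaround named in the report: disable stats-based answers for the session.
SET hive.compute.query.using.stats=false;
SELECT COUNT(*) FROM my_table;
```

Re-running `ANALYZE TABLE` after the files change should also refresh the stored row count, but it has to be repeated after every out-of-band modification.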



