[ 
https://issues.apache.org/jira/browse/HIVE-25588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441018#comment-17441018
 ] 

Zhihua Deng commented on HIVE-25588:
------------------------------------

Have you recorded the statistics (totalSize) per partition? Also, HIVE-19334 
uses the actual file size for external tables; that ticket may be helpful.
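
For reference, a minimal sketch of how the per-partition statistics could be 
checked and, if missing, recorded. The table name `my_db.events` and the 
partition column `dt` are hypothetical placeholders, not taken from the 
report; substitute the real table and partition spec.

```sql
-- Hypothetical table/partition names; adjust to the affected table.

-- Show the partition-level parameters (numFiles, totalSize, ...).
DESCRIBE FORMATTED my_db.events PARTITION (dt='2021-10-01');

-- If totalSize is absent for the partition, NOSCAN records the file count
-- and physical size without reading the data.
ANALYZE TABLE my_db.events PARTITION (dt='2021-10-01') COMPUTE STATISTICS NOSCAN;

-- Re-run the reproduction and inspect the plan to see whether a fetch task
-- is still chosen.
SET hive.fetch.task.conversion.threshold=1;
EXPLAIN SELECT * FROM my_db.events WHERE dt='2021-10-01' LIMIT 10;
```

Whether 2.3.3 consults these statistics or the file system when applying the 
threshold is not confirmed here; the sketch only helps rule out missing 
partition statistics as a factor.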

> Hive 2.3.3 Fetch Task threshold not respected
> ---------------------------------------------
>
>                 Key: HIVE-25588
>                 URL: https://issues.apache.org/jira/browse/HIVE-25588
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 2.3.3
>            Reporter: Nedzad Campara
>            Priority: Major
>              Labels: fetch, optimizer, simplefetchoptimizer
>
> It seems that "hive.fetch.task.conversion.threshold" is not respected in 
> Hive 2.3.3: a Fetch Task is always used, regardless of the input size, as 
> long as the conditions are met for either the "more" or "minimal" setting 
> of "hive.fetch.task.conversion".
> Apologies if this has been reported already, but I could not find any issues 
> which mention this specifically.
> To reproduce, set "hive.fetch.task.conversion.threshold=1", which to my 
> understanding should essentially always trigger an MR/Tez job; instead, a 
> fetch task is run.
> Tested on various tables ranging from dozens of GB to dozens of TB in size, 
> with hundreds and thousands of partitions, in ORC and Parquet format. Example 
> table size from statistics:
> | Table Parameters: | NULL | NULL |
> | | EXTERNAL | TRUE |
> | | numFiles | 234258 |
> | | numPartitions | 171898 |
> | | numRows | 1719836838331 |
> | | rawDataSize | 515766839727247 |
> | | totalSize | 189367471403333 | 
> Please let me know if any additional information is required.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
