[ https://issues.apache.org/jira/browse/HIVE-25588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441018#comment-17441018 ]
Zhihua Deng commented on HIVE-25588:
------------------------------------
Have you recorded the statistics (totalSize) per partition? Also, HIVE-19334
uses the actual file size for external tables, so that ticket may be helpful.
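
A rough sketch of how the per-partition statistics could be inspected and refreshed, assuming a hypothetical table `db.events` partitioned by a `dt` column (neither name comes from this issue):

```sql
-- Hypothetical table and partition names, for illustration only.
-- Show the partition's parameters, including totalSize if it has been recorded.
DESCRIBE FORMATTED db.events PARTITION (dt='2021-10-01');

-- Gather basic statistics (such as numFiles and totalSize) for the partition
-- without scanning the data files.
ANALYZE TABLE db.events PARTITION (dt='2021-10-01') COMPUTE STATISTICS NOSCAN;
```

If totalSize turns out to be missing for the partitions, that may be worth ruling out as a factor in the threshold check.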
> Hive 2.3.3 Fetch Task threshold not respected
> ---------------------------------------------
>
> Key: HIVE-25588
> URL: https://issues.apache.org/jira/browse/HIVE-25588
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 2.3.3
> Reporter: Nedzad Campara
> Priority: Major
> Labels: fetch, optimizer, simplefetchoptimizer
>
> It seems that "hive.fetch.task.conversion.threshold" is not respected in
> Hive 2.3.3: a Fetch Task is always used, regardless of the input size, as
> long as the conditions are met for either the "more" or "minimal" setting of
> "hive.fetch.task.conversion".
> Apologies if this has been reported already, but I could not find any issues
> which mention this specifically.
> To reproduce, set "hive.fetch.task.conversion.threshold=1", which to my
> understanding should essentially always trigger an MR/Tez job. It does not;
> a fetch task is used instead.
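> A minimal sketch of the reproduction, assuming a hypothetical table
> `db.events` with columns `id` and `dt` (placeholder names, not the real
> tables):
>
> ```sql
> -- Placeholder table and column names, for illustration only.
> SET hive.fetch.task.conversion=more;
> SET hive.fetch.task.conversion.threshold=1;
> -- EXPLAIN shows whether the plan consists only of a Fetch Operator
> -- or also contains an MR/Tez stage.
> EXPLAIN SELECT id FROM db.events WHERE dt='2021-10-01';
> ```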
> Tested on various tables ranging from dozens of GB to dozens of TB in size,
> with hundreds and thousands of partitions, in ORC and Parquet formats.
> Example table size from statistics:
> | Table Parameters: | NULL | NULL |
> | | EXTERNAL | TRUE |
> | | numFiles | 234258 |
> | | numPartitions | 171898 |
> | | numRows | 1719836838331 |
> | | rawDataSize | 515766839727247 |
> | | totalSize | 189367471403333 |
> Please let me know if any additional information is required.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)