[
https://issues.apache.org/jira/browse/HIVE-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421905#comment-17421905
]
katty he commented on HIVE-25557:
---------------------------------
count(*) on MR will be faster than on Tez. Normally, a count operation can read only the
Parquet metadata, but in this case it reads all the data and computes the count, so I am
confused. Here is the plan:
!image-2021-09-29-11-07-04-118.png!
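To make the metadata-only idea concrete: a Parquet file footer records the number of rows in each row group, so count(*) can in principle be answered by summing those counts without decoding any column data. The sketch below is a toy model under that assumption. The classes `RowGroup` and `ToyParquetFile` and both count functions are hypothetical and are not Hive, Tez, or Parquet APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RowGroup:
    # Row count stored in the (toy) footer metadata for this row group.
    num_rows: int
    # Row data; a real reader would have to decode column pages to get these.
    rows: List[tuple] = field(default_factory=list)


@dataclass
class ToyParquetFile:
    # Hypothetical stand-in for a Parquet file: just a list of row groups.
    row_groups: List[RowGroup]


def count_from_metadata(f: ToyParquetFile) -> int:
    """count(*) answered only from the per-row-group counts in the footer."""
    return sum(rg.num_rows for rg in f.row_groups)


def count_by_scan(f: ToyParquetFile) -> int:
    """count(*) answered by visiting every row, like a full table scan."""
    total = 0
    for rg in f.row_groups:
        for _ in rg.rows:
            total += 1
    return total


if __name__ == "__main__":
    f = ToyParquetFile([
        RowGroup(num_rows=3, rows=[(1,), (2,), (3,)]),
        RowGroup(num_rows=2, rows=[(4,), (5,)]),
    ])
    print(count_from_metadata(f))  # 5
    print(count_by_scan(f))        # 5
```

Both functions return the same answer, but the metadata path touches one integer per row group while the scan path touches every row, which is the difference the plan above suggests is being lost.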
> Hive 3.1.2 with Tez is slow to count data in parquet format
> -----------------------------------------------------------
>
> Key: HIVE-25557
> URL: https://issues.apache.org/jira/browse/HIVE-25557
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 3.1.2
> Environment: Tez *0.10.1*
> Reporter: katty he
> Priority: Major
> Attachments: image-2021-09-29-11-07-04-118.png
>
>
> Recently, I tested a query like select count(*) from table in Hive 3.1.2 with
> Tez, where the table is in Parquet format. Normally, when counting, the query
> engine can read the metadata instead of reading the full data, but in my case
> Tez cannot get the count from the metadata alone; it reads the data, so it is slow.
> When counting 2 billion rows, Tez takes 500s and spends 60s just initializing.
> Is that a problem?