[ https://issues.apache.org/jira/browse/HIVE-22495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Xu updated HIVE-22495:
----------------------------
Description:
Running a Hive query on a Parquet table:
select count(*) from test_table
The query reads all data (all columns) instead of just the metadata.
For comparison, Hive 0.13 and Spark read much less data from my test table:
||engine||HDFS data read||
|Hive 2.3.4| 452.9 MB|
|Hive 0.13| 22.5 KB|
|Spark| 41.6 KB|
The cause seems to be that the Parquet read support falls back to the full file
schema when indexColumnsWanted is empty, and this logic still exists in the master branch.
I don't know why this empty-list check was added; please advise whether removing it
could have any other impact.
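To make the suspected cause concrete, here is a minimal, hypothetical Java sketch of the projection decision described above. It is not Hive's actual read-support code and not the attached patch: schemas are modelled as plain lists of column names, and the class name, both method names and the readAllColumns flag are assumptions for illustration only. The first method mirrors the reported behaviour (an empty indexColumnsWanted falls back to the whole file schema); the second shows one possible alternative in which only an explicit read-all request returns the full schema, so an empty projection such as count(*) requests no columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the projection decision reported in
// HIVE-22495. Schemas are plain lists of column names; the real reader
// works with Parquet schemas. All names here are illustrative assumptions.
final class ProjectionSketch {

    private ProjectionSketch() {}

    // Reported behaviour: an empty indexColumnsWanted is treated like
    // "no projection", so the full file schema is requested.
    static List<String> requestedColumnsCurrent(List<String> fileSchema,
                                                List<Integer> indexColumnsWanted,
                                                boolean readAllColumns) {
        if (!readAllColumns && !indexColumnsWanted.isEmpty()) {
            return project(fileSchema, indexColumnsWanted);
        }
        // count(*) wants no columns, yet ends up here and reads everything.
        return new ArrayList<>(fileSchema);
    }

    // Possible alternative: fall back to the full schema only when all
    // columns are explicitly requested; an empty projection stays empty.
    static List<String> requestedColumnsProposed(List<String> fileSchema,
                                                 List<Integer> indexColumnsWanted,
                                                 boolean readAllColumns) {
        if (readAllColumns) {
            return new ArrayList<>(fileSchema);
        }
        return project(fileSchema, indexColumnsWanted);
    }

    // Keep only the columns at the requested positions, in request order.
    private static List<String> project(List<String> fileSchema,
                                        List<Integer> indexes) {
        List<String> result = new ArrayList<>();
        for (int i : indexes) {
            result.add(fileSchema.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> schema = List.of("id", "name", "payload");
        List<Integer> countStar = List.of(); // count(*) projects no columns

        System.out.println(requestedColumnsCurrent(schema, countStar, false));  // [id, name, payload]
        System.out.println(requestedColumnsProposed(schema, countStar, false)); // []
    }
}
```

Running it prints the full column list for the current behaviour and an empty list for the alternative. Whether an empty requested schema is safe for every caller of the read support is exactly the open question raised above.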
was:
Running a Hive query on a Parquet table:
select count(*) from t
The query reads all data (all columns) instead of just the metadata.
For comparison, Hive 0.13 and Spark read much less data:
||engine||HDFS data read||
|Hive 2.3.4| 452.9 MB|
|Hive 0.13| 22.5 KB|
|Spark| 41.6 KB|
The cause seems to be that the Parquet read support falls back to the full file
schema when indexColumnsWanted is empty, and this logic still exists in the master branch.
I don't know why this empty-list check was added; please advise whether removing it
could have any other impact.
> Parquet count(*) read in all data
> ---------------------------------
>
> Key: HIVE-22495
> URL: https://issues.apache.org/jira/browse/HIVE-22495
> Project: Hive
> Issue Type: Bug
> Components: Reader
> Reporter: Jason Xu
> Assignee: Jason Xu
> Priority: Major
> Attachments: HIVE-22495.patch
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)