Aihua Xu created HIVE-20079:
-------------------------------
Summary: Populate more accurate rawDataSize for parquet format
Key: HIVE-20079
URL: https://issues.apache.org/jira/browse/HIVE-20079
Project: Hive
Issue Type: Improvement
Components: File Formats
Affects Versions: 2.0.0
Reporter: Aihua Xu
Assignee: Aihua Xu
Run the following queries and you will see that rawDataSize for the table is
incorrectly reported as 4, which is just the number of fields. We need to
populate the correct data size so that data can be split properly.
{noformat}
SET hive.stats.autogather=true;
CREATE TABLE parquet_stats (id int,str string) STORED AS PARQUET;
INSERT INTO parquet_stats values(0, 'this is string 0'), (1, 'string 1');
DESC FORMATTED parquet_stats;
{noformat}
{noformat}
Table Parameters:
COLUMN_STATS_ACCURATE true
numFiles 1
numRows 2
rawDataSize 4
totalSize 373
transient_lastDdlTime 1530660523
{noformat}
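To illustrate the gap, here is a rough sketch contrasting the reported value (a count of fields) with a byte-based estimate for the two rows inserted above. The sizing rule (4 bytes per int, UTF-8 length per string) and the helper names are assumptions made for illustration only; they are not Hive's actual rawDataSize computation and not the proposed fix.

```python
# Sample rows from the INSERT statement in this issue.
rows = [(0, "this is string 0"), (1, "string 1")]


def estimate_value_size(value):
    """Hypothetical sizing rule (assumption, not Hive's implementation):
    an int counts as 4 bytes, a string as its UTF-8 encoded length."""
    if isinstance(value, int):
        return 4
    if isinstance(value, str):
        return len(value.encode("utf-8"))
    raise TypeError(f"unsupported type: {type(value).__name__}")


def count_fields(rows):
    """Number of field values across all rows (what the stats currently show)."""
    return sum(len(row) for row in rows)


def estimate_raw_data_size(rows):
    """Sum of the per-value size estimates across all rows."""
    return sum(estimate_value_size(value) for row in rows for value in row)


field_count = count_fields(rows)
estimated_size = estimate_raw_data_size(rows)
print("field count:", field_count)
print("estimated raw data size in bytes:", estimated_size)
```

Under these assumptions the field count matches the reported rawDataSize of 4, while even a simple byte estimate comes out several times larger, which is why a field count is a poor input for split sizing.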
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)