Aihua Xu created HIVE-20079:
-------------------------------

             Summary: Populate more accurate rawDataSize for parquet format
                 Key: HIVE-20079
                 URL: https://issues.apache.org/jira/browse/HIVE-20079
             Project: Hive
          Issue Type: Improvement
          Components: File Formats
    Affects Versions: 2.0.0
            Reporter: Aihua Xu
            Assignee: Aihua Xu


Run the following queries and you will see that rawDataSize for the table is 
incorrectly reported as 4 (which is just the number of fields). We need to 
populate the correct data size so that the data can be split properly.
{noformat}
SET hive.stats.autogather=true;
CREATE TABLE parquet_stats (id int,str string) STORED AS PARQUET;
INSERT INTO parquet_stats values(0, 'this is string 0'), (1, 'string 1');
DESC FORMATTED parquet_stats;
{noformat}

{noformat}
Table Parameters:
        COLUMN_STATS_ACCURATE   true
        numFiles                1
        numRows                 2
        rawDataSize             4
        totalSize               373
        transient_lastDdlTime   1530660523
{noformat}
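
To make the gap concrete, here is a rough sketch (not Hive's actual statistics code) that contrasts a plain field count with a naive uncompressed size estimate for the two rows inserted above. The helper names and the sizing rule (4 bytes per int, UTF-8 byte length per string) are assumptions made only for illustration, as is reading "number of fields" as rows times columns. The real value would depend on how Hive or the Parquet writer chooses to measure raw data.

```python
# Hypothetical illustration only; these helpers are not part of Hive.
# The rows mirror the INSERT statement in the reproduction above.
rows = [(0, "this is string 0"), (1, "string 1")]


def field_count(rows):
    """Count field values across all rows (2 rows x 2 columns here)."""
    return sum(len(row) for row in rows)


def estimated_raw_size(rows, int_width=4):
    """Naive uncompressed size estimate (an assumed rule, not Hive's):
    a fixed width per int plus the UTF-8 byte length of each string."""
    total = 0
    for row in rows:
        for value in row:
            if isinstance(value, int):
                total += int_width
            else:
                total += len(value.encode("utf-8"))
    return total


print(field_count(rows))         # 4, the same as the reported rawDataSize
print(estimated_raw_size(rows))  # 32 under the assumed sizing rule
```

Under these assumptions the field count is 4 while the size estimate is 32 bytes, which suggests how far a count-based rawDataSize can drift from the actual volume of data.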



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
