[ 
https://issues.apache.org/jira/browse/HIVE-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-6083:
---------------------------

    Description: 
I was trying to use a CTAS query to create a table stored as ORC with 
orc.compress set to SNAPPY. However, the data was still compressed with ZLIB 
(although the result of DESCRIBE still shows orc.compress as SNAPPY). For a 
CTAS query, SemanticAnalyzer.genFileSinkPlan uses the CreateTableDesc to 
generate the TableDesc for the FileSinkDesc by calling 
PlanUtils.getTableDesc. However, PlanUtils.getTableDesc does not appear to 
assign the user-provided table properties to the returned TableDesc 
(CreateTableDesc.getTblProps is not called in this method).

Note: I only checked the code of 0.12 and trunk.
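
To illustrate the missing step described above, here is a minimal, self-contained sketch. It is not Hive's code: the class TblPropsSketch and its methods withTableProperties and compressionFor are hypothetical stand-ins, and the ZLIB fallback is assumed from the orcfiledump output in this report. The sketch only models the idea that a writer sees the default codec unless the user-provided table properties are copied into the properties it reads.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in, NOT Hive's PlanUtils: it only models merging
// user-supplied TBLPROPERTIES into the Properties that a file writer reads.
final class TblPropsSketch {

    // Returns a new Properties holding serdeProps plus every user-provided
    // table property. User-provided values win on a key clash (an assumption).
    static Properties withTableProperties(Properties serdeProps,
                                          Map<String, String> tblProps) {
        Properties merged = new Properties();
        merged.putAll(serdeProps);
        if (tblProps != null) {
            merged.putAll(tblProps);
        }
        return merged;
    }

    // Models a writer that falls back to ZLIB when orc.compress is absent,
    // which matches the symptom shown by orcfiledump in this report.
    static String compressionFor(Properties props) {
        return props.getProperty("orc.compress", "ZLIB");
    }

    public static void main(String[] args) {
        Map<String, String> tblProps = new LinkedHashMap<>();
        tblProps.put("orc.compress", "SNAPPY");

        Properties base = new Properties();

        // Without the merge, the writer never sees the user's setting.
        System.out.println(compressionFor(base));
        // With the merge, the user's setting reaches the writer.
        System.out.println(compressionFor(withTableProperties(base, tblProps)));
    }
}
```

In this model the first call falls back to ZLIB and the second returns SNAPPY. An actual fix would have to be made where the TableDesc is built for the FileSinkDesc, and may look quite different.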

Two examples:
* Snappy compression
{code}
create table web_sales_wrong_orc_snappy
stored as orc tblproperties ("orc.compress"="SNAPPY")
as select * from web_sales;
{code}
{code}
describe formatted web_sales_wrong_orc_snappy;
....
Location:               hdfs://localhost:54310/user/hive/warehouse/web_sales_wrong_orc_snappy
Table Type:             MANAGED_TABLE            
Table Parameters:                
        COLUMN_STATS_ACCURATE   true                
        numFiles                1                   
        numRows                 719384              
        orc.compress            SNAPPY              
        rawDataSize             97815412            
        totalSize               40625243            
        transient_lastDdlTime   1387566015       
....   
{code}
{code}
bin/hive --orcfiledump /user/hive/warehouse/web_sales_wrong_orc_snappy/000000_0
Rows: 719384
Compression: ZLIB
Compression size: 262144
...
{code}
* No compression
{code}
create table web_sales_wrong_orc_none
stored as orc tblproperties ("orc.compress"="NONE")
as select * from web_sales;
{code}
{code}
describe formatted web_sales_wrong_orc_none;
....
Location:               hdfs://localhost:54310/user/hive/warehouse/web_sales_wrong_orc_none
Table Type:             MANAGED_TABLE            
Table Parameters:                
        COLUMN_STATS_ACCURATE   true                
        numFiles                1                   
        numRows                 719384              
        orc.compress            NONE                
        rawDataSize             97815412            
        totalSize               40625243            
        transient_lastDdlTime   1387566064       
....   
{code}
{code}
bin/hive --orcfiledump /user/hive/warehouse/web_sales_wrong_orc_none/000000_0
Rows: 719384
Compression: ZLIB
Compression size: 262144
...
{code}

  was:
I was trying to use a CTAS query to create a table stored as ORC with 
orc.compress set to SNAPPY. However, the data was still compressed with ZLIB 
(although the result of DESCRIBE still shows orc.compress as SNAPPY). For a 
CTAS query, SemanticAnalyzer.genFileSinkPlan uses the CreateTableDesc to 
generate the TableDesc for the FileSinkDesc by calling 
PlanUtils.getTableDesc. However, PlanUtils.getTableDesc does not appear to 
assign the user-provided table properties to the returned TableDesc 
(CreateTableDesc.getTblProps is not called in this method).

Note: I only checked the code of 0.12 and trunk.


> User provided table properties are not assigned to the TableDesc of the 
> FileSinkDesc in a CTAS query
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6083
>                 URL: https://issues.apache.org/jira/browse/HIVE-6083
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>         Attachments: HIVE-6083.1.patch.txt
>



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
