[ 
https://issues.apache.org/jira/browse/HADOOP-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated HADOOP-4169:
--------------------------------------

    Description: 
Hive produces two types of data files - flat files and sequencefiles. The DDL 
syntax should reflect this. Currently the 'compressed' keyword is used to 
choose the sequencefile format, but it does not actually compress the files. 
This is misleading. In addition, flat files can also be compressed.

The proposal is to replace 'compressed' with 'sequencefile'. Compression 
should be controlled the standard Hadoop way, through the setting that 
specifies whether output should be compressed ('mapred.output.compress'), 
i.e. as session options. (Session options will also define the codec etc.) 
The default file format and compression options can be specified in the conf 
file.

  was:
Hive two types of data files - flat files and sequencefiles. Syntax should 
reflect this. Currently the 'compressed' keyword is used to choose sequencefile 
format - but does not actually compress the files. this is misleading. In 
addition - flat files can also be compressed.

Proposal is to replace 'compressed' with 'sequencefile'. And compression 
options should be applied from standard hadoop way of specifying whether output 
should be compressed (''mapred.output.compress') - ie. session options. 
(session options will also define codec etc.). default file format and 
compression options can be specified in conf file.


> 'compressed' keyword in DDL syntax misleading and does not compress
> -------------------------------------------------------------------
>
>                 Key: HADOOP-4169
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4169
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hive
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>
> Hive produces two types of data files - flat files and sequencefiles. The DDL 
> syntax should reflect this. Currently the 'compressed' keyword is used to 
> choose the sequencefile format, but it does not actually compress the files. 
> This is misleading. In addition, flat files can also be compressed.
> The proposal is to replace 'compressed' with 'sequencefile'. Compression 
> should be controlled the standard Hadoop way, through the setting that 
> specifies whether output should be compressed ('mapred.output.compress'), 
> i.e. as session options. (Session options will also define the codec etc.) 
> The default file format and compression options can be specified in the conf 
> file.
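
For illustration only, the proposed split between file format and compression 
might look roughly like the sketch below. It assumes the new keyword is 
spelled STORED AS SEQUENCEFILE and that the Hive CLI passes SET options 
through to the job; the table and column names are hypothetical, and the 
codec and compression-type properties are the usual Hadoop companions to 
'mapred.output.compress', not something this issue specifies.

```sql
-- File format is chosen in the DDL; this says nothing about compression.
-- (Table and column names are made up for this example.)
CREATE TABLE page_views_example (user_id STRING, url STRING)
STORED AS SEQUENCEFILE;

-- Compression is chosen per session through standard Hadoop options.
SET mapred.output.compress=true;
-- Assumed companion settings: which codec to use, and block-level
-- compression for sequencefiles.
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.compression.type=BLOCK;
```

With this separation, the same session options would apply to a flat-file 
table as well, which is the case the current 'compressed' keyword cannot 
express.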

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
