[ https://issues.apache.org/jira/browse/HIVE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744377#action_12744377 ]
Zheng Shao commented on HIVE-759:
---------------------------------

If the user specifies lzo and lzo cannot be loaded, we should output an error instead of switching to no compression. Falling back to no compression would silently hide the problem from the user.

We know lzo is better, but nowhere in the hadoop code do we set the default to lzo, right?

What about making the default the same as "mapred.output.compression.*"? That might be a better default, since it does not change the current behavior for users who do not know about this update.

> add hive.intermediate.compression.codec option
> ----------------------------------------------
>
>                 Key: HIVE-759
>                 URL: https://issues.apache.org/jira/browse/HIVE-759
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Zheng Shao
>            Assignee: He Yongqiang
>         Attachments: hive-759-2009-08-17.patch, hive-759-2009-08-18.patch
>
>
> Hive uses the jobconf compression codec for all map-reduce jobs. This
> includes both mapred.map.output.compression.codec and
> mapred.output.compression.codec.
> In some cases, we want to distinguish between the codec used for intermediate
> map-reduce jobs (that produce intermediate data between jobs) and the final
> map-reduce jobs (that produce data stored in tables).
> For intermediate data, lzo might be a better fit because it's much faster;
> for final data, gzip might be a better fit because it saves disk space.
> We should introduce two new options:
> {code}
> hive.intermediate.compression.codec=org.apache.hadoop.io.compress.LzoCodec
> hive.intermediate.compression.type=BLOCK
> {code}
> And use these 2 options to override the mapred.output.compression.* in the
> FileSinkOperator that produces intermediate data.
> Note that it's possible that a single map-reduce job may have 2
> FileSinkOperators: one produces intermediate data, and one produces final
> data. So we need to add a flag to fileSinkDesc for that.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
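
The behavior proposed in the comment above (default the intermediate codec to the value of mapred.output.compression.codec, and fail loudly when the configured codec class cannot be loaded) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: the class IntermediateCodecResolver and its methods are hypothetical, the configuration is modeled as a plain Map rather than a Hadoop JobConf, and the codec is loaded with Class.forName instead of Hadoop's own codec machinery.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Hive or of the HIVE-759 patches.
public class IntermediateCodecResolver {
    // Option names taken from the issue text.
    static final String INTERMEDIATE_CODEC = "hive.intermediate.compression.codec";
    static final String OUTPUT_CODEC = "mapred.output.compression.codec";

    // Returns the codec class name to use for intermediate data:
    // the Hive-specific option if set, otherwise the job-wide output codec,
    // otherwise null (no codec configured at all).
    static String resolveCodecName(Map<String, String> conf) {
        String name = conf.get(INTERMEDIATE_CODEC);
        if (name == null || name.isEmpty()) {
            name = conf.get(OUTPUT_CODEC);
        }
        return (name == null || name.isEmpty()) ? null : name;
    }

    // Loads the resolved codec class. If a codec is configured but cannot be
    // loaded, this throws instead of silently falling back to no compression.
    static Class<?> loadCodecClass(Map<String, String> conf) {
        String name = resolveCodecName(conf);
        if (name == null) {
            return null; // nothing configured, so nothing to load
        }
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException | LinkageError e) {
            throw new IllegalStateException(
                "Cannot load compression codec " + name
                    + "; refusing to fall back to no compression", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(INTERMEDIATE_CODEC, "org.apache.hadoop.io.compress.LzoCodec");
        try {
            loadCodecClass(conf);
            System.out.println("codec loaded");
        } catch (IllegalStateException e) {
            // Expected on a classpath without the lzo codec.
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, a user who never sets hive.intermediate.compression.codec keeps the current behavior, and a user who asks for lzo on a cluster without it sees an error rather than uncompressed intermediate data.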