[ https://issues.apache.org/jira/browse/HIVE-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889579#action_12889579 ]
Joydeep Sen Sarma commented on HIVE-1468:
-----------------------------------------
yes - it does make sense to differentiate result data from intermediate data. if
anything - there's probably a good argument to be made that we don't need a
separate option for intermediate compression. it should default to whatever
policy is being applied for map-reduce intermediate traffic. (that would be a
better default than either true or false - that way admins have one less option
to get right).
interestingly - result data also needs minimal replication. the client is
single threaded and cannot exploit multiple replicas for bandwidth purposes.
also - the data is temporary in nature and doesn't need reliability.
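
a rough sketch of the defaulting idea above, not Hive's actual code. it assumes the map-reduce intermediate policy is read from mapred.compress.map.output and that result files are governed by hive.exec.compress.output. the function names are hypothetical, and the replication factor of 1 is only one reading of "minimal replication".

```python
from typing import Mapping


def parse_bool(value: str) -> bool:
    """Interpret a config string as a boolean ("true"/"false", case-insensitive)."""
    return value.strip().lower() == "true"


def intermediate_compression(conf: Mapping[str, str]) -> bool:
    """Hypothetical helper: decide whether intermediate data is compressed.

    An explicit hive.exec.compress.intermediate wins. When it is unset,
    fall back to the map-reduce intermediate policy (assumed here to be
    mapred.compress.map.output) instead of a hard-coded true or false.
    """
    explicit = conf.get("hive.exec.compress.intermediate")
    if explicit is not None:
        return parse_bool(explicit)
    return parse_bool(conf.get("mapred.compress.map.output", "false"))


def result_file_settings(conf: Mapping[str, str]) -> dict:
    """Hypothetical helper: settings for the result data of a select query.

    Result data is kept separate from intermediate data: its compression
    follows hive.exec.compress.output, and it asks for minimal replication
    (1 is an assumption) because the data is temporary and read by a
    single-threaded client.
    """
    return {
        "compressed": parse_bool(conf.get("hive.exec.compress.output", "false")),
        "replication": 1,
    }
```

with this shape, an admin who only sets the map-reduce policy gets matching intermediate behavior for free, and setting hive.exec.compress.intermediate=false still turns it off explicitly.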
> intermediate data produced for select queries ignores
> hive.exec.compress.intermediate
> -------------------------------------------------------------------------------------
>
> Key: HIVE-1468
> URL: https://issues.apache.org/jira/browse/HIVE-1468
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Joydeep Sen Sarma
>
> > set hive.exec.compress.intermediate=false;
> > explain extended select xxx from yyy;
> ...
> File Output Operator
> compressed: true
> GlobalTableId: 0
> looks like only intermediate locations identified while splitting mr
> tasks follow this directive. this should be fixed because it forces clients
> to always decompress output data (even if the config setting is altered).