Jerry Chen created HIVE-3934:
--------------------------------

             Summary: Put tag in value for join with map reduce
                 Key: HIVE-3934
                 URL: https://issues.apache.org/jira/browse/HIVE-3934
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor, Serializers/Deserializers
    Affects Versions: 0.11.0
            Reporter: Jerry Chen


While trying to enable hash-based map reduce, I found that for joins run with 
map reduce in Hive, the tag is appended to the key writable. This is a real 
obstacle to supporting other runtime implementations of the map reduce 
computation model, such as hash-based map reduce. For example, when the tag is 
in the key, several things need special care:

1. HiveKey must handle the hash code specially so that keys are partitioned 
properly among the reducers.
2. The key in map reduce's view is actually key + tag, which makes the map 
reduce sort compulsory in order to satisfy Hive's need to group keys on the 
reduce side. This disables or hinders hash-based map reduce, because grouping 
by key + tag makes no sense to Hive.
3. ExecReducer must find the real key boundary by stripping out the tag before 
making the startGroup and endGroup calls to the operator. Without the tag, 
each reduce call is a natural key boundary.
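
To make points 1 and 3 concrete, here is a minimal, self-contained sketch in 
plain Java. It is not Hive's actual code: the class TagInKeySketch and all of 
its methods are hypothetical, and byte arrays stand in for the key writable. 
It only illustrates that once the tag is the last byte of the key, both 
partitioning and group-boundary detection have to strip that byte first.

```java
import java.util.Arrays;

// Hypothetical illustration only; these names do not exist in Hive.
final class TagInKeySketch {

    // Append a one-byte table tag to the serialized join key.
    static byte[] appendTag(byte[] joinKey, byte tag) {
        byte[] out = Arrays.copyOf(joinKey, joinKey.length + 1);
        out[joinKey.length] = tag;
        return out;
    }

    // Naive partitioning hashes every byte, tag included, so rows with the
    // same join key but different tags may go to different reducers.
    static int naivePartition(byte[] taggedKey, int numReducers) {
        return (Arrays.hashCode(taggedKey) & Integer.MAX_VALUE) % numReducers;
    }

    // Tag-aware partitioning hashes only the join-key bytes.
    static int tagAwarePartition(byte[] taggedKey, int numReducers) {
        byte[] joinKey = Arrays.copyOf(taggedKey, taggedKey.length - 1);
        return (Arrays.hashCode(joinKey) & Integer.MAX_VALUE) % numReducers;
    }

    // Two tagged keys belong to the same group if they match once the
    // trailing tag byte is stripped.
    static boolean sameGroup(byte[] a, byte[] b) {
        return Arrays.equals(Arrays.copyOf(a, a.length - 1),
                             Arrays.copyOf(b, b.length - 1));
    }
}
```

With the tag kept out of the key, none of the tag-stripping helpers above 
would be needed: plain hashing and plain key equality would be enough.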

Consider instead appending the tag as the last byte of the value writable, 
which avoids all of the above and fits naturally into the map reduce 
computation model.
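
As a rough sketch of what "tag as the last byte of the value" could look like, 
assuming the value is available as a plain byte array (the class 
TagInValueSketch and its method names are made up for illustration and are not 
part of Hive):

```java
import java.util.Arrays;

// Hypothetical illustration only; these names do not exist in Hive.
final class TagInValueSketch {

    // Map side: append the one-byte table tag to the serialized value.
    static byte[] withTag(byte[] value, byte tag) {
        byte[] out = Arrays.copyOf(value, value.length + 1);
        out[value.length] = tag;
        return out;
    }

    // Reduce side: read the tag back from the last byte.
    static byte tagOf(byte[] taggedValue) {
        return taggedValue[taggedValue.length - 1];
    }

    // Reduce side: recover the original value bytes without the tag.
    static byte[] payloadOf(byte[] taggedValue) {
        return Arrays.copyOf(taggedValue, taggedValue.length - 1);
    }
}
```

The key is left untouched in this sketch, so the framework can hash, compare 
and group it as is.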

I see code in JoinOperator that generates join results earlier by relying on 
the fact that the tags arrive sorted. This is only useful when there are very 
many rows with the same key in both join tables, which is not the situation 
in most cases.
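
To show the trade-off, here is a hedged sketch of joining one key group when 
the tags arrive in arbitrary order. It assumes a two-table inner join, values 
that carry the tag as their last byte, and UTF-8 string payloads; the class 
UnsortedTagJoinSketch and its methods are hypothetical and are not the 
JoinOperator code. Rows are buffered per tag and the results are produced only 
after the whole group has been read, which is the cost of not having sorted 
tags.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Hive's JoinOperator.
final class UnsortedTagJoinSketch {

    // Build a test value: UTF-8 payload followed by a one-byte tag.
    static byte[] tagged(String payload, byte tag) {
        byte[] p = payload.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[p.length + 1];
        System.arraycopy(p, 0, out, 0, p.length);
        out[p.length] = tag;
        return out;
    }

    // Join all values of one key. Tag 0 is the left table, any other tag is
    // treated as the right table. The order of the input does not matter.
    static List<String> joinGroup(List<byte[]> taggedValues) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (byte[] v : taggedValues) {
            byte tag = v[v.length - 1];
            String payload = new String(v, 0, v.length - 1, StandardCharsets.UTF_8);
            if (tag == 0) {
                left.add(payload);
            } else {
                right.add(payload);
            }
        }
        // Emit the cross product once both sides are fully buffered.
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(l + "," + r);
            }
        }
        return out;
    }
}
```

For a group that holds only a few rows per key, buffering both sides like this 
is cheap; it becomes expensive only for keys with very many rows on both sides.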

Let's discuss the possibility of the "tag in value" approach.
