[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]

Rui Li (JIRA) Tue, 09 Sep 2014 02:16:50 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rui Li updated HIVE-8017:
-------------------------
    Attachment: HIVE-8017.2-spark.patch

This patch fixes some of the qfile test failures caused by the previous patch.
Two qtests are still not fixed: {{optimize_nullscan.q}} and {{union_remove_25.q}}.
For {{optimize_nullscan.q}}, I checked the corresponding MR output and found 
that the operator tree in the new output file is more similar to the one in 
the MR version's output. Besides, this failure has an age of 2, so I guess 
it's not related to the patch here.
For {{union_remove_25.q}}, the only diff is the total size of {{outputTbl2}} 
(6812 -> 6826). I checked the MR version and its total size is also 6812. I'm 
not sure what causes this difference. We may need to run more tests on 
partitioned tables.
[~xuefuz] do you have any ideas on this?

> Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark 
> Branch]
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-8017
>                 URL: https://issues.apache.org/jira/browse/HIVE-8017
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch
>
>
> HiveKey should be used as the key type because it holds the hash code used 
> for partitioning. While BytesWritable serves partitioning well for simple 
> cases, we have to use {{HiveKey.hashCode}} for more complicated ones, such as 
> joins and bucketed tables.
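
To illustrate the description above, here is a minimal, self-contained Java 
sketch of why a hash code carried by the key can matter for partitioning. It 
does not use the real Hive or Hadoop classes: {{PartitionSketch}}, 
{{PrecomputedHashKey}} and {{partitionFor}} are hypothetical names, and the key 
layout (join-key bytes followed by one tag byte identifying the input) is an 
assumption made only for this example.

```java
import java.util.Arrays;

public class PartitionSketch {

    /** Hypothetical stand-in for a key that carries its own precomputed hash code. */
    public static final class PrecomputedHashKey {
        private final byte[] bytes;
        private final int hashCode;

        public PrecomputedHashKey(byte[] bytes, int hashCode) {
            this.bytes = bytes.clone();
            this.hashCode = hashCode;
        }

        public byte[] getBytes() {
            return bytes.clone();
        }

        /** Returns the hash supplied at construction, not one derived from the bytes. */
        @Override
        public int hashCode() {
            return hashCode;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof PrecomputedHashKey
                    && Arrays.equals(bytes, ((PrecomputedHashKey) o).bytes);
        }
    }

    /** Plain hash partitioning: non-negative hash modulo the partition count. */
    public static int partitionFor(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;

        // Assumed layout for illustration: join-key bytes plus a trailing tag byte.
        byte[] joinKey = {1, 2, 3};
        byte[] leftBytes = {1, 2, 3, 0};   // same join key, tag 0
        byte[] rightBytes = {1, 2, 3, 1};  // same join key, tag 1

        // Hash computed from the join-key part only and stored in the key.
        int joinHash = Arrays.hashCode(joinKey);
        PrecomputedHashKey left = new PrecomputedHashKey(leftBytes, joinHash);
        PrecomputedHashKey right = new PrecomputedHashKey(rightBytes, joinHash);

        // Hashing the full byte content depends on the tag byte.
        System.out.println("bytes-based:  "
                + partitionFor(Arrays.hashCode(leftBytes), numPartitions) + " vs "
                + partitionFor(Arrays.hashCode(rightBytes), numPartitions));

        // Using the carried hash sends both rows to the same partition.
        System.out.println("carried hash: "
                + partitionFor(left.hashCode(), numPartitions) + " vs "
                + partitionFor(right.hashCode(), numPartitions));
    }
}
```

In this sketch the two rows only land in the same partition when the 
partitioner uses the hash code stored in the key, which is the behaviour the 
description asks for in cases like joins and bucketed tables.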



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
