[ https://issues.apache.org/jira/browse/HIVE-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124302#comment-14124302 ]
Xuefu Zhang commented on HIVE-7956:
-----------------------------------

No. We can use the generic serialize/deserialize methods in KryoSerializer. Specifically, the following methods can be used:
{code}
public static byte[] serialize(Object object);
public static <T> T deserialize(byte[] buffer, Class<T> clazz);
{code}
These should take care of the whole object, including the hashcode field.

> When inserting into a bucketed table, all data goes to a single bucket [Spark Branch]
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-7956
>                 URL: https://issues.apache.org/jira/browse/HIVE-7956
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> I created a bucketed table:
> {code}
> create table testBucket(x int,y string) clustered by(x) into 10 buckets;
> {code}
> Then I run a query like:
> {code}
> set hive.enforce.bucketing = true;
> insert overwrite table testBucket select intCol,stringCol from src;
> {code}
> Here {{src}} is a simple textfile-based table containing 40000000 records
> (not bucketed). The query launches 10 reduce tasks but all the data goes to
> only one of them.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)