[jira] [Created] (HIVE-12680) Binary type partition column values are incorrectly serialized and deserialized

Venki Korukanti (JIRA) Tue, 15 Dec 2015 13:49:17 -0800

Venki Korukanti created HIVE-12680:
--------------------------------------

             Summary: Binary type partition column values are incorrectly serialized and deserialized
                 Key: HIVE-12680
                 URL: https://issues.apache.org/jira/browse/HIVE-12680
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
            Reporter: Venki Korukanti
            Priority: Minor



Here are the repro steps:

{code}
CREATE TABLE kv_binary(key INT, value STRING) PARTITIONED BY (binary_part BINARY);
INSERT INTO TABLE kv_binary PARTITION (binary_part='somevalue') SELECT * FROM kv LIMIT 1;
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-15 13:34:15,758 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1142919541_0001
Loading data to table default.kv_binary partition (binary_part=[B@15871)
Partition default.kv_binary{binary_part=[B@15871} stats: [numFiles=1, numRows=1, totalSize=13, rawDataSize=12]
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 8192 HDFS Write: 11733 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
{code}

The partition directory created on the file system has a Java object reference as its value:
{code}
hadoop fs -ls /user/hive/warehouse/kv_binary
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2015-12-15 13:34 /user/hive/warehouse/kv_binary/binary_part=%5BB@15871
{code}

Selecting from the same table:
{code}
hive> SELECT * FROM kv_binary;
OK
238     val/238=        [B@15871
{code}

This makes binary partitions unusable, although binary partition columns don't 
seem to be commonly used. Logging the bug for tracking purposes. It looks like 
toString() is being called on a byte[] somewhere.

BTW, this is working fine in Hive 1.0.0. 
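
For illustration, the [B@15871 value in the output above has the shape that Java's default Object.toString() produces for a byte[]: the type tag "[B", an '@', and the identity hash code in hex. The sketch below is not Hive code; the report does not identify the actual code path. The class and method names are hypothetical, and decoding the bytes as UTF-8 is an assumption made only for this demonstration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Hive.
public class BinaryPartitionDemo {

    // What the suspected buggy path does: byte[] inherits Object.toString(),
    // which yields "[B@" followed by the identity hash code in hex.
    static String viaToString(byte[] value) {
        return value.toString();
    }

    // Decoding the bytes recovers the text that was originally written.
    // UTF-8 is an assumption for this sketch.
    static String viaDecode(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] part = "somevalue".getBytes(StandardCharsets.UTF_8);
        // Prints something of the form [B@<hex>; the hex part varies per run.
        System.out.println(viaToString(part));
        // Prints the original partition value.
        System.out.println(viaDecode(part));
    }
}
```

Because the hex suffix comes from the object's identity hash rather than its contents, a string produced this way cannot be mapped back to the original bytes, which would be consistent with the partition value being lost on both write and read.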



