Venki Korukanti created HIVE-12680:
--------------------------------------
Summary: Binary type partition column values are incorrectly
serialized and deserialized
Key: HIVE-12680
URL: https://issues.apache.org/jira/browse/HIVE-12680
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.1
Reporter: Venki Korukanti
Priority: Minor
Here are the repro steps:
{code}
CREATE TABLE kv_binary(key INT, value STRING) PARTITIONED BY (binary_part BINARY);
INSERT INTO TABLE kv_binary PARTITION (binary_part='somevalue') SELECT * FROM kv LIMIT 1;
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-15 13:34:15,758 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1142919541_0001
Loading data to table default.kv_binary partition (binary_part=[B@15871)
Partition default.kv_binary{binary_part=[B@15871} stats: [numFiles=1,
numRows=1, totalSize=13, rawDataSize=12]
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 8192 HDFS Write: 11733 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
{code}
The created partition has a Java object reference as its value on the file system:
{code}
hadoop fs -ls /user/hive/warehouse/kv_binary
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-12-15 13:34 /user/hive/warehouse/kv_binary/binary_part=%5BB@15871
{code}
Selecting from the same table:
{code}
hive> SELECT * FROM kv_binary;
OK
238 val/238= [B@15871
{code}
This makes binary partitions unusable, although binary partitions don't seem
to be commonly used. Logging the bug for tracking purposes. It looks like
toString() is being called on a byte[] somewhere.
BTW, this is working fine in Hive 1.0.0.
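The suspected cause can be illustrated in isolation. The sketch below is not Hive code: the class and method names are hypothetical, and it assumes the partition value was meant to be read as UTF-8 text. It only shows that calling toString() on a byte[] yields the JVM's default "[B@<hash>" form, which matches the partition name seen above, whereas decoding the bytes recovers the original value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from the Hive code base.
class BinaryPartitionValueDemo {

    // What the bug appears to do: byte[] inherits Object.toString(),
    // which returns "[B@" followed by the hex identity hash code.
    static String viaToString(byte[] value) {
        return value.toString();
    }

    // One possible alternative (an assumption, not the confirmed fix):
    // decode the bytes explicitly with a known charset.
    static String viaDecode(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] partitionValue = "somevalue".getBytes(StandardCharsets.UTF_8);
        // Prints something of the form [B@<hex>; the hex part varies per run.
        System.out.println(viaToString(partitionValue));
        // Prints the original text of the partition value.
        System.out.println(viaDecode(partitionValue));
    }
}
```

Because the "[B@..." string depends on the object's identity hash, it cannot be mapped back to the original bytes, which is consistent with the partition being unusable after it is written.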