[
https://issues.apache.org/jira/browse/HIVE-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ping Lu updated HIVE-13265:
---------------------------
Attachment: explain2.txt
execution2.txt
execution1.txt
explain1.txt
> Query consists of union all and mapjoin, throw Exception “Unable to
> deserialize reduce input key”
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-13265
> URL: https://issues.apache.org/jira/browse/HIVE-13265
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.13.1
> Environment: Hadoop 2.4.0, Hive 0.13.1
> Reporter: Ping Lu
> Attachments: execution1.txt, execution2.txt, explain1.txt,
> explain2.txt
>
>
> Steps to reproduce:
> Prepare: create four test tables and load data.
> create table tmp_test1(col1 string);
> create table tmp_test2(col1 string);
> create table tmp_test3(col1 string,col2 string) row format delimited
> fields terminated by "\t";
> create table tmp_test4(col1 string);
> load data local inpath "test3" into table tmp_test1; -- 6 rows
> load data local inpath "test3" into table tmp_test2; -- 5 rows
> load data local inpath "test3" into table tmp_test3; -- 6 rows
> load data local inpath "test4" into table tmp_test4; -- 3000011 rows,
> -- 26670421 bytes (> 25M)
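> Before running the queries, the loaded row counts can be checked against
> the figures noted above. This is a minimal sketch that is not part of the
> original report; the counts in the comments are the ones the report states,
> not verified output.
>
> ```sql
> -- Row counts as stated in the report (not independently verified).
> select count(*) from tmp_test1; -- report states 6 rows
> select count(*) from tmp_test2; -- report states 5 rows
> select count(*) from tmp_test3; -- report states 6 rows
> select count(*) from tmp_test4; -- report states 3000011 rows
> ```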
> Query1: an error is encountered during execution
> set hive.auto.convert.join=true;
> select
> sq.col1,
> count(distinct sq.col2) num
> from(
> select
> col1,
> null col2
> from
> tmp_test1
> union all
> select
> col1,
> null col2
> from
> tmp_test2
> union all
> select
> col1,
> col2
> from
> tmp_test3
> )sq -- sq's size is far smaller than 25M
> join
> tmp_test4 ta
> ON sq.col1 = ta.col1
> group by sq.col1;
> When hive.auto.convert.join is set to true, the join is converted to a
> MapJoin and sq is chosen as the small table.
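> The 25M figure presumably refers to the small-table size threshold for
> MapJoin conversion. The sketch below shows how the relevant settings could
> be inspected in the Hive CLI; that hive.mapjoin.smalltable.filesize is the
> property behind the 25M figure is an assumption, since the report does not
> name it.
>
> ```sql
> -- Enable automatic conversion of common joins to map joins (as in Query1).
> set hive.auto.convert.join=true;
> -- Print the current small-table threshold in bytes.
> -- Assumption: this is the property behind the "25M" mentioned in the report.
> set hive.mapjoin.smalltable.filesize;
> ```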
> Query2: the same SELECT query returns the correct result
> set hive.auto.convert.join=false;
> select
> sq.col1,
> count(distinct sq.col2) num
> from(
> select
> col1,
> null col2
> from
> tmp_test1
> union all
> select
> col1,
> null col2
> from
> tmp_test2
> union all
> select
> col1,
> col2
> from
> tmp_test3
> )sq
> join
> tmp_test4 ta
> ON sq.col1 = ta.col1
> group by sq.col1;
> The execution plan for Query1 is attached as explain1.txt.
> The Hive execution log for Query1 is attached as execution1.txt.
> The execution plan for Query2 is attached as explain2.txt.
> The Hive execution log for Query2 is attached as execution2.txt.
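> A plan such as the attached one for Query1 can presumably be regenerated by
> prefixing the query with EXPLAIN. This is a sketch under that assumption;
> the report does not say how the attachments were produced. For the Query2
> plan, set hive.auto.convert.join=false instead.
>
> ```sql
> -- Same statement as Query1, prefixed with EXPLAIN to print the plan
> -- instead of running the job.
> set hive.auto.convert.join=true;
> explain
> select
>   sq.col1,
>   count(distinct sq.col2) num
> from (
>   select col1, null col2 from tmp_test1
>   union all
>   select col1, null col2 from tmp_test2
>   union all
>   select col1, col2 from tmp_test3
> ) sq
> join tmp_test4 ta
>   on sq.col1 = ta.col1
> group by sq.col1;
> ```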
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)