[ https://issues.apache.org/jira/browse/HIVE-16257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933437#comment-15933437 ]
Naveen Gangam commented on HIVE-16257:
--------------------------------------
Thanks [~appodictic]. I try to add as many details as possible. It helps others
understand the issue as well. Heck, even myself: a couple of months from now, I
would be wondering what this issue was. :)
> Intermittent issue with incorrect resultset with Spark
> ------------------------------------------------------
>
> Key: HIVE-16257
> URL: https://issues.apache.org/jira/browse/HIVE-16257
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.0
> Reporter: Naveen Gangam
>
> This issue is highly intermittent and only seems to occur with the Spark engine
> when the query has a GROUP BY clause. The following is the test case.
> {code}
> drop table if exists test_hos_sample;
> create table test_hos_sample (name string, val1 decimal(18,2), val2 decimal(20,3));
> insert into test_hos_sample values
> ('test1',101.12,102.123),('test1',101.12,102.123),('test2',102.12,103.234),('test1',101.12,102.123),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test4',104.52,104.456),('test4',104.52,104.456),('test5',105.52,105.567),('test3',103.52,102.345),('test5',105.52,105.567);
> set hive.execution.engine=spark;
> select name, val1,val2 from test_hos_sample group by name, val1, val2;
> {code}
> Expected Results:
> {code}
> name val1 val2
> test5 105.52 105.567
> test3 103.52 102.345
> test1 101.12 102.123
> test4 104.52 104.456
> test2 102.12 103.234
> {code}
> Incorrect results once in a while (note the test1 row: val1 is 104.52, the
> test4 value, instead of 101.12):
> {code}
> name val1 val2
> test5 105.52 105.567
> test3 103.52 102.345
> test1 104.52 102.123
> test4 104.52 104.456
> test2 102.12 103.234
> {code}
> 1) Not reproducible with HoMR.
> 2) Not an issue when running from spark-shell.
> 3) Not reproducible when the column data type is string or double; only
> reproducible with decimal data types. It also works fine for decimal columns if
> you cast the decimal to string on read and cast it back to decimal on select.
> 4) Also occurs with Parquet and text file formats (haven't tried other
> formats).
> 5) Occurs whether the table data is inside or outside an encryption zone.
> 6) Even on clusters where this is reproducible, it occurs only about once in
> 20 or more runs.
> 7) Occurs with both Beeline and Hive CLI.
> 8) Reproducible only when there is a GROUP BY clause.
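Because the bad result shows up only about once in 20 runs, a scripted check can help when rerunning the test case. Below is a minimal HiveQL sketch, assuming the test_hos_sample table from the test case above. It shows (a) an anti-join that flags GROUP BY output rows that do not exist in the source table, (b) one possible reading of the cast-to-string workaround from item 3, and (c) the same query on MapReduce for the comparison in item 1. The anti-join check and the exact cast layout are illustrative assumptions, not part of the original report; the check also runs on the same engine and will not catch a missing row, so treat it as a convenience, not proof.

{code:sql}
-- (a) Illustrative check (assumption): list GROUP BY output rows that are not
-- present in the source table. An empty result means no corrupted row was
-- produced on this run. In the incorrect output above, this would return
-- the row test1 / 104.52 / 102.123.
set hive.execution.engine=spark;
select g.name, g.val1, g.val2
from (
  select name, val1, val2
  from test_hos_sample
  group by name, val1, val2
) g
left outer join test_hos_sample s
  on g.name = s.name and g.val1 = s.val1 and g.val2 = s.val2
where s.name is null;

-- (b) Sketch of the item 3 workaround: group on string copies of the decimal
-- columns, then cast them back to their original decimal types on select.
select t.name,
       cast(t.val1 as decimal(18,2)) as val1,
       cast(t.val2 as decimal(20,3)) as val2
from (
  select name,
         cast(val1 as string) as val1,
         cast(val2 as string) as val2
  from test_hos_sample
) t
group by t.name, t.val1, t.val2;

-- (c) Item 1 comparison: the original query on MapReduce (HoMR).
set hive.execution.engine=mr;
select name, val1, val2 from test_hos_sample group by name, val1, val2;
{code}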