[
https://issues.apache.org/jira/browse/HIVE-16257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Naveen Gangam updated HIVE-16257:
---------------------------------
Attachment: success_yarnlogs.log
failed_yarnlogs.log
Managed to reproduce this in-house (on just one particular cluster, though).
I am attaching the YARN logs from one particular occasion where the first run
of the query (@17/03/29 09:21:36) reproduced the issue, but the next run
(@17/03/29 09:23:23), immediately after the first, worked fine.
Query plan looks identical in both cases and I do not spot anything that looks
suspicious. [~xuefuz] Could you take a quick look to see if you can spot
something? Any help much appreciated. Thanks a bunch.
> Intermittent issue with incorrect resultset with Spark
> ------------------------------------------------------
>
> Key: HIVE-16257
> URL: https://issues.apache.org/jira/browse/HIVE-16257
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.0
> Reporter: Naveen Gangam
> Attachments: failed_yarnlogs.log, success_yarnlogs.log
>
>
> This issue is highly intermittent and seems to occur only with the Spark engine
> when the query has a GROUP BY clause. The following is the test case.
> {code}
> drop table if exists test_hos_sample;
> create table test_hos_sample (name string, val1 decimal(18,2), val2
> decimal(20,3));
> insert into test_hos_sample values
> ('test1',101.12,102.123),('test1',101.12,102.123),('test2',102.12,103.234),('test1',101.12,102.123),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test4',104.52,104.456),('test4',104.52,104.456),('test5',105.52,105.567),('test3',103.52,102.345),('test5',105.52,105.567);
> set hive.execution.engine=spark;
> select name, val1,val2 from test_hos_sample group by name, val1, val2;
> {code}
> Expected Results:
> {code}
> name val1 val2
> test5 105.52 105.567
> test3 103.52 102.345
> test1 101.12 102.123
> test4 104.52 104.456
> test2 102.12 103.234
> {code}
> Incorrect results once in a while:
> {code}
> name val1 val2
> test5 105.52 105.567
> test3 103.52 102.345
> test1 104.52 102.123
> test4 104.52 104.456
> test2 102.12 103.234
> {code}
> 1) Not reproducible with HoMR.
> 2) Not an issue when running from spark-shell.
> 3) Not reproducible when the column data type is string or double. Only
> reproducible with decimal data types. Also works fine for decimal data types if
> you cast the decimal to string on read and cast it back to decimal on select.
> 4) Occurs with both Parquet and text file formats. (Haven't tried other
> formats.)
> 5) Occurs whether the table data is inside or outside an encryption zone.
> 6) Even in clusters where this is reproducible, it occurs only about once in
> 20 runs or more.
> 7) Occurs with both Beeline and Hive CLI.
> 8) Reproducible only when there is a GROUP BY clause.
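Point 3 above notes that the problem does not show up when the decimals are cast to string on read and back to decimal on select. The following is a minimal, untested sketch of one way that workaround might be written against the test table. It is an assumption about how the casts were applied, not the exact query from the report; the aliases t, v1 and v2 are hypothetical.

```sql
-- Hypothetical rewrite of the failing query: group on string copies of the
-- decimal columns, then cast the grouped values back to the original types.
set hive.execution.engine=spark;
select name,
       cast(v1 as decimal(18,2)) as val1,
       cast(v2 as decimal(20,3)) as val2
from (
  -- "cast on read": expose the decimal columns as strings
  select name,
         cast(val1 as string) as v1,
         cast(val2 as string) as v2
  from test_hos_sample
) t
group by name, v1, v2;
```

If this form never returns the incorrect row while the original query occasionally does on the same cluster, that would be consistent with the problem being specific to decimal values as GROUP BY keys.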