[jira] [Updated] (HIVE-16257) Intermittent issue with incorrect resultset with Spark

Naveen Gangam (JIRA) Wed, 05 Apr 2017 13:49:57 -0700

     [ https://issues.apache.org/jira/browse/HIVE-16257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Naveen Gangam updated HIVE-16257:
---------------------------------
    Attachment: success_yarnlogs.log
                failed_yarnlogs.log

Managed to reproduce this in-house (though only on one particular cluster). 
I am attaching the YARN logs from two consecutive runs. The first run of the 
query (@17/03/29 09:21:36) reproduced the issue; the second run 
(@17/03/29 09:23:23), started immediately afterwards, returned correct results.

The query plans look identical in both cases, and I do not spot anything 
suspicious. [~xuefuz] Could you take a quick look to see if you can spot 
something? Any help is much appreciated. Thanks a bunch.

> Intermittent issue with incorrect resultset with Spark
> ------------------------------------------------------
>
>                 Key: HIVE-16257
>                 URL: https://issues.apache.org/jira/browse/HIVE-16257
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.0
>            Reporter: Naveen Gangam
>         Attachments: failed_yarnlogs.log, success_yarnlogs.log
>
>
> This issue is highly intermittent and only seems to occur with the Spark engine 
> when the query has a GROUP BY clause. The following is the test case.
> {code}
> drop table if exists test_hos_sample;
> create table test_hos_sample (name string, val1 decimal(18,2), val2 decimal(20,3));
> insert into test_hos_sample values 
> ('test1',101.12,102.123),('test1',101.12,102.123),('test2',102.12,103.234),('test1',101.12,102.123),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test3',103.52,102.345),('test4',104.52,104.456),('test4',104.52,104.456),('test5',105.52,105.567),('test3',103.52,102.345),('test5',105.52,105.567);
> set hive.execution.engine=spark;
> select name, val1, val2 from test_hos_sample group by name, val1, val2;
> {code}
> Expected Results:
> {code}
> name    val1    val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   101.12  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> Incorrect results returned once in a while (val1 for test1 is 104.52 instead of 101.12):
> {code}
> name    val1    val2
> test5   105.52  105.567
> test3   103.52  102.345
> test1   104.52  102.123
> test4   104.52  104.456
> test2   102.12  103.234
> {code}
> 1) Not reproducible with HoMR.
> 2) Not an issue when running from spark-shell.
> 3) Not reproducible when the column data type is string or double; only 
> reproducible with decimal data types. It also works fine with decimal data types 
> if you cast the decimal to string on read and cast it back to decimal on select.
> 4) Occurs with both Parquet and text file formats (haven't tried other 
> formats).
> 5) Occurs both when the table data is inside an encryption zone and when it is 
> outside.
> 6) Even on clusters where this is reproducible, it occurs only about once in 20 
> runs, or even less often.
> 7) Occurs with both Beeline and Hive CLI.
> 8) Reproducible only when there is a GROUP BY clause.
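A minimal sketch for catching the intermittent wrong answer when re-running the test case many times (item 6). The query has no aggregates, so a correct result is exactly the set of distinct (name, val1, val2) rows inserted by the test case, and each run's output can be checked against that set. This is a hypothetical helper, not part of Hive. It assumes the query output has already been captured as whitespace-separated lines with a header row, as in the result listings above. `check_result`, `expected_groups` and `SOURCE_ROWS` are illustrative names.

```python
from decimal import Decimal

# Rows inserted by the HIVE-16257 test case, as (name, val1, val2) strings.
SOURCE_ROWS = [
    ("test1", "101.12", "102.123"), ("test1", "101.12", "102.123"),
    ("test2", "102.12", "103.234"), ("test1", "101.12", "102.123"),
    ("test3", "103.52", "102.345"), ("test3", "103.52", "102.345"),
    ("test3", "103.52", "102.345"), ("test3", "103.52", "102.345"),
    ("test3", "103.52", "102.345"), ("test4", "104.52", "104.456"),
    ("test4", "104.52", "104.456"), ("test5", "105.52", "105.567"),
    ("test3", "103.52", "102.345"), ("test5", "105.52", "105.567"),
]


def to_key(name, val1, val2):
    """Normalise one row; decimals are compared by value, not by text."""
    return (name, Decimal(val1), Decimal(val2))


def expected_groups(rows):
    """GROUP BY on all selected columns with no aggregates = distinct rows."""
    return {to_key(*row) for row in rows}


def check_result(output_lines, rows=SOURCE_ROWS):
    """Compare captured query output (first line is the header) with the
    expected groups.

    Returns (unexpected, missing): rows that appear only in the output, and
    expected groups that are absent from the output.
    """
    observed = {to_key(*line.split()) for line in output_lines[1:] if line.strip()}
    expected = expected_groups(rows)
    return observed - expected, expected - observed
```

Against the incorrect output above, `check_result` reports (test1, 104.52, 102.123) as unexpected and (test1, 101.12, 102.123) as missing. Values are compared as `Decimal`, so a trailing-zero difference such as 105.52 vs 105.520 is not flagged as a mismatch.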



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
