[
https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073795#comment-15073795
]
Tim Preece commented on SPARK-12555:
------------------------------------
Analysis shows that this test fails because of data corruption.
The UnsafeRow is laid out as (string, int) but the schema describes it as
(int, string), presumably because the test involves reordering of columns.
Subsequently, when joining (string, int) + (string), the code patches the wrong
slot: it applies the first string's offset change to the int value.
This data corruption occurs on ALL platforms, and the offset part of the first
string is always incorrect. On Big Endian platforms the value of the integer
is also corrupted. This is simply due to the location of the 4-byte integer
within the 8-byte UnsafeRow slot.
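
The endianness effect described above can be sketched in a few lines. This is
an illustration only, not Spark code. It assumes that a 4-byte int is written
at the start of its 8-byte slot, and that the join patches a variable-length
slot by adding the offset change to the upper 32 bits of the slot read as a
64-bit word. The helper names and the offset change of 8 are hypothetical.

```python
import struct


def _fmt(kind, byteorder):
    # "<" selects little-endian, ">" selects big-endian in struct formats.
    return ("<" if byteorder == "little" else ">") + kind


def write_int_slot(value, byteorder):
    """Write a 4-byte int at the start of an 8-byte slot; the rest is zero."""
    return struct.pack(_fmt("i", byteorder), value) + b"\x00" * 4


def patch_offset(slot, delta, byteorder):
    """Treat the slot as a 64-bit word and add delta to its upper 32 bits.

    This is the patch that is only valid for a variable-length (string) slot.
    """
    (word,) = struct.unpack(_fmt("q", byteorder), slot)
    return struct.pack(_fmt("q", byteorder), word + (delta << 32))


def read_int_slot(slot, byteorder):
    """Read the 4-byte int back from the start of the slot."""
    return struct.unpack(_fmt("i", byteorder), slot[:4])[0]


def int_after_wrong_patch(value, delta, byteorder):
    """Apply the string-offset patch to an int slot and read the int back."""
    slot = write_int_slot(value, byteorder)
    return read_int_slot(patch_offset(slot, delta, byteorder), byteorder)


if __name__ == "__main__":
    for order in ("little", "big"):
        print(order, int_after_wrong_patch(1, 8, order))
```

Under these assumptions the little-endian int survives, because the patch
lands in the unused half of the slot, while the big-endian int has the offset
change added to it. With a hypothetical offset change of 8, the value 1
becomes 9, which would be consistent with the [one,9] result quoted below.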
> Build Failure on 1.6
> --------------------
>
> Key: SPARK-12555
> URL: https://issues.apache.org/jira/browse/SPARK-12555
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Environment: ALL platforms (although the test only explicitly fails on
> Big Endian platforms).
> Reporter: Tim Preece
> Priority: Blocker
>
> org.apache.spark.sql.DatasetAggregatorSuite
> - typed aggregation: class input with reordering *** FAILED ***
> Results do not match for query:
> == Parsed Logical Plan ==
> Aggregate [value#748],
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS
> ClassInputAgg$(b,a)#762]
> +- AppendColumns <function1>, class[a[0]: int, b[0]: string],
> class[value[0]: string], [value#748]
> +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>
> == Analyzed Logical Plan ==
> value: string, ClassInputAgg$(b,a): int
> Aggregate [value#748],
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS
> ClassInputAgg$(b,a)#762]
> +- AppendColumns <function1>, class[a[0]: int, b[0]: string],
> class[value[0]: string], [value#748]
> +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>
> == Optimized Logical Plan ==
> Aggregate [value#748],
> [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS
> ClassInputAgg$(b,a)#762]
> +- AppendColumns <function1>, class[a[0]: int, b[0]: string],
> class[value[0]: string], [value#748]
> +- Project [one AS b#650,1 AS a#651]
> +- OneRowRelation$
>
> == Physical Plan ==
> TungstenAggregate(key=[value#748],
> functions=[(ClassInputAgg$(b#650,a#651),mode=Final,isDistinct=false)],
> output=[value#748,ClassInputAgg$(b,a)#762])
> +- TungstenExchange hashpartitioning(value#748,5), None
> +- TungstenAggregate(key=[value#748],
> functions=[(ClassInputAgg$(b#650,a#651),mode=Partial,isDistinct=false)],
> output=[value#748,value#758])
> +- !AppendColumns <function1>, class[a[0]: int, b[0]: string],
> class[value[0]: string], [value#748]
> +- Project [one AS b#650,1 AS a#651]
> +- Scan OneRowRelation[]
> == Results ==
> !== Correct Answer - 1 == == Spark Answer - 1 ==
> ![one,1] [one,9] (QueryTest.scala:127)