[
https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056040#comment-15056040
]
Adam Roberts commented on SPARK-12319:
--------------------------------------
Hi Sean, here are the failures:
ExchangeCoordinatorSuite:
- test estimatePartitionStartIndices - 1 Exchange
- test estimatePartitionStartIndices - 2 Exchanges
- test estimatePartitionStartIndices and enforce minimal number of reducers
- determining the number of reducers: aggregate operator(minNumPostShufflePartitions: 3)
- determining the number of reducers: join operator(minNumPostShufflePartitions: 3)
- determining the number of reducers: complex query 1(minNumPostShufflePartitions: 3)
- determining the number of reducers: complex query 2(minNumPostShufflePartitions: 3)
- determining the number of reducers: aggregate operator *** FAILED ***
3 did not equal 2 (ExchangeCoordinatorSuite.scala:315)
- determining the number of reducers: join operator *** FAILED ***
1 did not equal 2 (ExchangeCoordinatorSuite.scala:366)
- determining the number of reducers: complex query 1
- determining the number of reducers: complex query 2 *** FAILED ***
Set(2) did not equal Set(2, 3) (ExchangeCoordinatorSuite.scala:472)
The fix is to replace the use of DataInputStream/DataOutputStream with
LittleEndianDataInputStream/LittleEndianDataOutputStream objects so that these
tests pass on big-endian platforms.
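As background on why the choice of stream class matters: java.io.DataOutputStream always writes multi-byte values in big-endian order, so any code that interprets those bytes in little-endian order sees a different value. The sketch below shows this using only JDK classes. The class and method names are made up for illustration, and it does not reproduce what UnsafeRowSerializer actually does.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of Spark.
class EndianDemo {
    // Serialize an int with DataOutputStream, which always writes big endian.
    static byte[] writeWithDataOutputStream(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reinterpret the first four bytes as an int in the given byte order.
    static int readAs(byte[] data, ByteOrder order) {
        return ByteBuffer.wrap(data).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] data = writeWithDataOutputStream(1);            // bytes: 00 00 00 01
        System.out.println(readAs(data, ByteOrder.BIG_ENDIAN));    // prints 1
        System.out.println(readAs(data, ByteOrder.LITTLE_ENDIAN)); // prints 16777216
    }
}
```

Any place where one side writes through a fixed-order stream and the other side reads the same bytes in a different order is a candidate for this kind of mismatch.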
With regard to the Dataset failure (which uses DataFrames behind the scenes and
also the Tungsten-optimised aggregate function), here's a snippet of the
failing test output:
== Physical Plan ==
TungstenAggregate(key=[value#1148],
functions=[(ClassInputAgg$(b#1050,a#1051),mode=Final,isDistinct=false)],
output=[value#1148,ClassInputAgg$(b,a)#1162])
TungstenExchange (HashPartitioning 5), None
TungstenAggregate(key=[value#1148],
functions=[(ClassInputAgg$(b#1050,a#1051),mode=Partial,isDistinct=false)],
output=[value#1148,value#1158])
!AppendColumns <function1>, class[a[0]: int, b[0]: string],
class[value[0]: string], [value#1148]
Project [one AS b#1050,1 AS a#1051]
Scan OneRowRelation[]
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
![one,1] [one,9] (QueryTest.scala:127)
This is for the third checkAnswer call in the reordering test:
checkAnswer(
ds.groupBy(_.b).agg(ClassInputAgg.toColumn),
("one", 1))
If we change the SQL statement
val ds = sql("SELECT 'one' AS b, 1 as a").as[AggData]
so that a is, say, 2, we get 10. With 3, we get 11, etc. In each of these
cases the result is 8 higher than the expected value.
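For reference, the expression from BitSetMethods.java quoted in the issue description below is the usual "find the next set bit" index computation. The sketch below shows what that expression computes, using a plain long[] in place of the memory region the real code reads. The class name and the long[] signature are assumptions made for illustration, not Spark's implementation.

```java
// Hypothetical simplified version of a next-set-bit search, not Spark's code.
class NextSetBitSketch {
    // Returns the index of the first set bit at or after fromIndex, or -1 if none.
    static int nextSetBit(long[] words, int fromIndex) {
        int wi = fromIndex >> 6;              // word index: fromIndex / 64
        if (wi >= words.length) {
            return -1;
        }
        int subIndex = fromIndex & 0x3f;      // bit offset inside that word
        long word = words[wi] >>> subIndex;   // discard bits below fromIndex
        if (word != 0) {
            return (wi << 6) + subIndex + Long.numberOfTrailingZeros(word);
        }
        // Nothing left in the first word; scan the remaining words.
        for (wi += 1; wi < words.length; wi++) {
            if (words[wi] != 0) {
                return (wi << 6) + Long.numberOfTrailingZeros(words[wi]);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] words = {1L << 3, 1L << 5};         // bits 3 and 69 are set
        System.out.println(nextSetBit(words, 0));  // prints 3
        System.out.println(nextSetBit(words, 4));  // prints 69
        System.out.println(nextSetBit(words, 70)); // prints -1
    }
}
```

The result depends on the numeric value of each 64-bit word, so it is sensitive to how that word is loaded from memory.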
> Address endian specific problems surfaced in 1.6
> ------------------------------------------------
>
> Key: SPARK-12319
> URL: https://issues.apache.org/jira/browse/SPARK-12319
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Environment: BE platforms
> Reporter: Adam Roberts
> Priority: Critical
>
> JIRA to cover endian specific problems - since testing 1.6 I've noticed
> problems with DataFrames on BE platforms, e.g.
> https://issues.apache.org/jira/browse/SPARK-9858
> [~joshrosen] [~yhuai]
> Current progress: using com.google.common.io.LittleEndianDataInputStream and
> com.google.common.io.LittleEndianDataOutputStream within UnsafeRowSerializer
> fixes three test failures in ExchangeCoordinatorSuite but I'm concerned
> around performance/wider functional implications
> "org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input
> with reordering" fails as we expect "one, 1" but instead get "one, 9" - we
> believe the issue lies within BitSetMethods.java, specifically around: return
> (wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word);