[ 
https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056040#comment-15056040
 ] 

Adam Roberts commented on SPARK-12319:
--------------------------------------

Hi Sean, here are the failures:

ExchangeCoordinatorSuite:
- test estimatePartitionStartIndices - 1 Exchange
- test estimatePartitionStartIndices - 2 Exchanges
- test estimatePartitionStartIndices and enforce minimal number of reducers
- determining the number of reducers: aggregate operator(minNumPostShufflePartitions: 3)
- determining the number of reducers: join operator(minNumPostShufflePartitions: 3)
- determining the number of reducers: complex query 1(minNumPostShufflePartitions: 3)
- determining the number of reducers: complex query 2(minNumPostShufflePartitions: 3)
- determining the number of reducers: aggregate operator *** FAILED ***
  3 did not equal 2 (ExchangeCoordinatorSuite.scala:315)
- determining the number of reducers: join operator *** FAILED ***
  1 did not equal 2 (ExchangeCoordinatorSuite.scala:366)
- determining the number of reducers: complex query 1
- determining the number of reducers: complex query 2 *** FAILED ***
  Set(2) did not equal Set(2, 3) (ExchangeCoordinatorSuite.scala:472)

The fix is to replace the use of DataInputStream/DataOutputStream with 
LittleEndianDataInputStream/LittleEndianDataOutputStream objects so that these 
tests pass on big-endian platforms.
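
As a rough illustration of why the stream class matters (this is not Spark 
code; the class and method names below are made up for the example): 
java.io.DataOutputStream always writes multi-byte values in big-endian order, 
so a reader that interprets the same bytes as little-endian sees a different 
number. A minimal sketch using only the JDK:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of Spark.
final class EndianDemo {
    // DataOutputStream.writeInt always emits the high byte first (big-endian),
    // regardless of the platform's native byte order.
    static byte[] writeIntBigEndian(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Interpret the first four bytes of data using the given byte order.
    static int readInt(byte[] data, ByteOrder order) {
        return ByteBuffer.wrap(data).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] data = writeIntBigEndian(3);
        // Same bytes, two interpretations.
        System.out.println(readInt(data, ByteOrder.BIG_ENDIAN));    // 3
        System.out.println(readInt(data, ByteOrder.LITTLE_ENDIAN)); // 50331648
    }
}
```

Whichever byte order is chosen, the writer and every reader of the bytes have 
to agree on it.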

With regard to the Dataset failure (which uses DataFrames behind the scenes and 
the Tungsten-optimised aggregate function), here's a snippet of the failing 
test output:

  == Physical Plan ==
  TungstenAggregate(key=[value#1148], functions=[(ClassInputAgg$(b#1050,a#1051),mode=Final,isDistinct=false)], output=[value#1148,ClassInputAgg$(b,a)#1162])
   TungstenExchange (HashPartitioning 5), None
    TungstenAggregate(key=[value#1148], functions=[(ClassInputAgg$(b#1050,a#1051),mode=Partial,isDistinct=false)], output=[value#1148,value#1158])
     !AppendColumns <function1>, class[a[0]: int, b[0]: string], class[value[0]: string], [value#1148]
      Project [one AS b#1050,1 AS a#1051]
       Scan OneRowRelation[]
  == Results ==
  !== Correct Answer - 1 ==   == Spark Answer - 1 ==
  ![one,1]                    [one,9] (QueryTest.scala:127)

This is for the third checkAnswer call in the reordering test:

checkAnswer(
      ds.groupBy(_.b).agg(ClassInputAgg.toColumn),
      ("one", 1))

If we change the value of a in our SQL statement

val ds = sql("SELECT 'one' AS b, 1 as a").as[AggData]

to, say, 2, we get 10. With 3, we get 11, and so on. In each case tried, the 
result appears to be the value of a plus 8.
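
The issue description quoted below suspects the next-set-bit computation in 
BitSetMethods.java. As a hedged illustration of why that kind of word-based 
bit arithmetic is sensitive to byte order, here is a simplified sketch over a 
plain long[] (an assumption made for the example; it is not the actual Spark 
method, and the class name is hypothetical). It uses the same index formula 
and shows that byte-swapping a word changes the index that is returned:

```java
// Hypothetical simplified bit set over long[] words, not Spark's implementation.
final class BitSetSketch {
    // Returns the index of the first set bit at or after fromIndex, or -1.
    static int nextSetBit(long[] words, int fromIndex) {
        int wi = fromIndex >> 6;              // index of the 64-bit word
        if (wi >= words.length) {
            return -1;
        }
        int subIndex = fromIndex & 0x3f;      // bit offset within that word
        long word = words[wi] >>> subIndex;   // drop bits below fromIndex
        if (word != 0) {
            return (wi << 6) + subIndex + Long.numberOfTrailingZeros(word);
        }
        // Nothing left in the current word: scan the remaining words.
        for (wi += 1; wi < words.length; wi++) {
            word = words[wi];
            if (word != 0) {
                return (wi << 6) + Long.numberOfTrailingZeros(word);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] asWritten = {1L};                    // bit 0 set
        long[] swapped = {Long.reverseBytes(1L)};   // same bytes, opposite order
        System.out.println(nextSetBit(asWritten, 0)); // 0
        System.out.println(nextSetBit(swapped, 0));   // 56
    }
}
```

This only shows that the formula depends on how the bytes of each word are 
ordered when the word is loaded. It does not by itself explain the offset 
seen in the failing test.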

> Address endian specific problems surfaced in 1.6
> ------------------------------------------------
>
>                 Key: SPARK-12319
>                 URL: https://issues.apache.org/jira/browse/SPARK-12319
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>         Environment: BE platforms
>            Reporter: Adam Roberts
>            Priority: Critical
>
> JIRA to cover endian specific problems - since testing 1.6 I've noticed 
> problems with DataFrames on BE platforms, e.g. 
> https://issues.apache.org/jira/browse/SPARK-9858
> [~joshrosen] [~yhuai]
> Current progress: using com.google.common.io.LittleEndianDataInputStream and 
> com.google.common.io.LittleEndianDataOutputStream within UnsafeRowSerializer 
> fixes three test failures in ExchangeCoordinatorSuite but I'm concerned 
> around performance/wider functional implications
> "org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input 
> with reordering" fails as we expect "one, 1" but instead get "one, 9" - we 
> believe the issue lies within BitSetMethods.java, specifically around: return 
> (wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word); 



