[
https://issues.apache.org/jira/browse/SPARK-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049284#comment-15049284
]
Adam Roberts commented on SPARK-9858:
-------------------------------------
Yep, I added System.identityHashCode(serializer) prints at both the creation
site and the use site (both in the Exchange class):
Creating new unsafe row serializer
ADAMTEST. myUnsafeRowSerializer identity hash: -555078685
Creating new unsafe row serializer
ADAMTEST. myUnsafeRowSerializer identity hash: 1088823803
preparing shuffle dependency
ADAMTEST. In needToCopy function and serializer hash is: 1088823803
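The value of this kind of print is that an identity hash is stable for the lifetime of an object, so a matching hash at the use site confirms the same serializer instance arrived there, while a differing hash (as in the output above, where needToCopy sees the second instance) shows which of the two created serializers is actually in play. A minimal sketch of the technique with plain objects (IdentityHashDemo is just an illustrative name, not Spark code):

```scala
// Sketch: System.identityHashCode is stable per object, so printing it
// at creation and again at the point of use identifies which instance
// reached that code path.
object IdentityHashDemo {
  def main(args: Array[String]): Unit = {
    val a = new Object
    val b = new Object
    // The same object always reports the same identity hash...
    assert(System.identityHashCode(a) == System.identityHashCode(a))
    // ...so a hash printed at a use site can be matched against the
    // hashes printed at each creation site.
    println(s"created a: ${System.identityHashCode(a)}")
    println(s"created b: ${System.identityHashCode(b)}")
  }
}
```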
A new development: on Intel (an LE platform), if we take the first 200 elements
and print them, we get 20 consecutive rows containing
(3,[0,13,5,ff00000000000000]). On our BE platforms this isn't the case:
every row is (3,[0,13,5,0]), the same as the rest of the Intel output. The
print is in DAGScheduler's submitMapStage method:
val rdd = dependency.rdd
rdd.take(200).foreach(println)
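One plausible explanation for the ff00000000000000 column (offered only as an illustration of the symptom, not a confirmed trace of Spark's code path) is a long whose bytes are written in one byte order and read back in the other: 0xff in the lowest byte under LE lands in the highest byte when the same eight bytes are reinterpreted as BE. A self-contained sketch with java.nio.ByteBuffer:

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Sketch: writing a long little-endian and reading it back big-endian
// reproduces the 0xff -> ff00000000000000 pattern seen in the output.
object EndianDemo {
  def main(args: Array[String]): Unit = {
    val buf = ByteBuffer.allocate(8)
    // Write the long 0xff using little-endian byte order (as on Intel):
    // the byte 0xff goes into position 0.
    buf.order(ByteOrder.LITTLE_ENDIAN).putLong(0, 0xffL)
    // Reinterpreting the same 8 bytes as big-endian makes that first
    // byte the most significant one.
    val asBE = buf.order(ByteOrder.BIG_ENDIAN).getLong(0)
    println(f"$asBE%016x") // ff00000000000000
  }
}
```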
> Introduce an ExchangeCoordinator to estimate the number of post-shuffle
> partitions.
> -----------------------------------------------------------------------------------
>
> Key: SPARK-9858
> URL: https://issues.apache.org/jira/browse/SPARK-9858
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
> Assignee: Yin Huai
> Fix For: 1.6.0
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)