[
https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reynold Xin resolved SPARK-10914.
---------------------------------
Resolution: Fixed
Assignee: Reynold Xin
Fix Version/s: 1.6.0, 1.5.2
> UnsafeRow serialization breaks when two machines have different Oops size
> -------------------------------------------------------------------------
>
> Key: SPARK-10914
> URL: https://issues.apache.org/jira/browse/SPARK-10914
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0, 1.5.1
> Environment: Ubuntu 14.04 (spark-slave), 12.04 (master)
> Reporter: Ben Moran
> Assignee: Reynold Xin
> Fix For: 1.5.2, 1.6.0
>
>
> *Updated description (by rxin on Oct 8, 2015)*
> To reproduce, launch Spark using
> {code}
> MASTER=local-cluster[2,1,1024] bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"
> {code}
> And then run the following
> {code}
> scala> sql("select 1 xx").collect()
> {code}
> The problem is that UnsafeRow contains three pieces of information when
> pointing to some data in memory (an object, a base offset, and a length).
> When the row is serialized with Java/Kryo serialization, the object layout in
> memory can change if the two machines have different pointer widths (Oops in
> the JVM), so a base offset recorded on one machine may not be valid on the
> other.
> *Original bug report description*:
> Using an inner join to match two integer columns, I generally get no results
> when there should be matches. But the results vary and depend on whether the
> dataframes come from SQL, come from JSON, or are cached, as well as on the
> order in which I cache things and query them.
> This minimal example reproduces it consistently for me in the spark-shell, on
> new installs of both 1.5.0 and 1.5.1 (pre-built against Hadoop 2.6 from
> http://spark.apache.org/downloads.html).
> {code}
> /* x is {"xx":1}{"xx":2} and y is just {"yy":1}{"yy":2} */
> val x = sql("select 1 xx union all select 2")
> val y = sql("select 1 yy union all select 2")
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 0 */
> /* If I cache both tables it works: */
> x.cache()
> y.cache()
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 2 */
> /* but this still doesn't work: */
> x.join(y, $"xx" === $"yy").filter("yy=1").count() /* expect 1, get 0 */
> {code}
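
The updated description above attributes the failure to a base offset that is tied to one JVM's object layout. The following is a minimal sketch of that idea, not Spark's code and not necessarily the fix that was committed. `OffsetSketch`, `RowPointer`, and the header sizes 16 and 24 are hypothetical stand-ins chosen for illustration; on a real JVM the offset comes from the runtime's array layout. The sketch models the offset as "header size plus index", shows a reader with a different header size failing to resolve the payload, and shows an encoding that writes only the length and the bytes so the reader can recompute the offset locally.

```scala
import java.nio.ByteBuffer
import java.util.Arrays

object OffsetSketch {
  // Illustrative header sizes (assumptions, not measured values).
  val SenderHeader = 16L
  val ReceiverHeader = 24L

  // Hypothetical stand-in for the (object, base offset, length) triple.
  final case class RowPointer(bytes: Array[Byte], absoluteOffset: Long, length: Int)

  // Resolve the payload as a reader whose JVM uses `localHeader` would.
  // Returns None when the offset does not land inside the buffer.
  def payload(row: RowPointer, localHeader: Long): Option[Array[Byte]] = {
    val start = row.absoluteOffset - localHeader
    if (start < 0 || start + row.length > row.bytes.length) None
    else Some(Arrays.copyOfRange(row.bytes, start.toInt, start.toInt + row.length))
  }

  // Layout-independent encoding: the length followed by the payload bytes only.
  def encode(row: RowPointer, localHeader: Long): Array[Byte] = {
    val data = payload(row, localHeader).getOrElse(
      throw new IllegalArgumentException("row does not point inside its buffer"))
    ByteBuffer.allocate(4 + data.length).putInt(data.length).put(data).array()
  }

  // Decoding recomputes the offset from the reader's own header size.
  def decode(encoded: Array[Byte], localHeader: Long): RowPointer = {
    val buf = ByteBuffer.wrap(encoded)
    val length = buf.getInt()
    val data = new Array[Byte](length)
    buf.get(data)
    RowPointer(data, localHeader, length)
  }
}
```

In this model, a `RowPointer` built with `SenderHeader` cannot be resolved with `ReceiverHeader`, while a row passed through `encode` on the sender and `decode` on the receiver resolves to the same bytes on both sides.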