[ https://issues.apache.org/jira/browse/SPARK-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-2610:
----------------------------

    Description: 
To reproduce, set
{code}
spark.serializer        org.apache.spark.serializer.KryoSerializer
{code}
in conf/spark-defaults.conf and launch a spark shell.
Then, execute
{code}
class X() { println("What!"); def y = 3 }
val x = new X
import x.y

case class Person(name: String, age: Int)

val serializer = org.apache.spark.serializer.Serializer.getSerializer(null)
val kryoSerializer = serializer.newInstance

val value = kryoSerializer.serialize(Person("abc", 1))
kryoSerializer.deserialize(value): Person
// Once you execute this line, you will see ...
// What!
// What!
// res1: Person = Person(abc,1)
{code}

Basically, importing a method of a class causes the constructor of that class 
to be called twice.

This affects both branch 1.0 and master.
On master, you can use 
{code}
val serializer = org.apache.spark.serializer.Serializer.getSerializer(None)
{code}
to get the serializer.

  was:
To reproduce, set
{code}
spark.serializer        org.apache.spark.serializer.KryoSerializer
{code}
in conf/spark-defaults.conf and launch a spark shell.
Then, execute
{code}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD
case class Person(name: String, age: Int)
val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
people.collect
{code}

No extra Spark applications are created if you remove
{code}
import sqlContext.createSchemaRDD
{code}

Our current branch 1.0 also has this issue. 


> When spark.serializer is set as org.apache.spark.serializer.KryoSerializer, 
> importing a method causes multiple spark applications creations  
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2610
>                 URL: https://issues.apache.org/jira/browse/SPARK-2610
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.0.1
>            Reporter: Yin Huai
>
> To reproduce, set
> {code}
> spark.serializer        org.apache.spark.serializer.KryoSerializer
> {code}
> in conf/spark-defaults.conf and launch a spark shell.
> Then, execute
> {code}
> class X() { println("What!"); def y = 3 }
> val x = new X
> import x.y
> case class Person(name: String, age: Int)
> val serializer = org.apache.spark.serializer.Serializer.getSerializer(null)
> val kryoSerializer = serializer.newInstance
> val value = kryoSerializer.serialize(Person("abc", 1))
> kryoSerializer.deserialize(value): Person
> // Once you execute this line, you will see ...
> // What!
> // What!
> // res1: Person = Person(abc,1)
> {code}
> Basically, importing a method of a class causes the constructor of that class 
> to be called twice.
> This affects both branch 1.0 and master.
> On master, you can use 
> {code}
> val serializer = org.apache.spark.serializer.Serializer.getSerializer(None)
> {code}
> to get the serializer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
