[ https://issues.apache.org/jira/browse/SPARK-23978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Wilkinson updated SPARK-23978:
--------------------------------------
       Priority: Minor  (was: Major)
    Description: 
Spark 2.3 added a number of org.apache.spark.ml and org.apache.spark.mllib 
classes to the Kryo registration, but it registers them via Class.forName.

If the mllib jar is not on the classpath, this can be very slow.

My app, which uses the GraphX connected components function, is 2x slower in 
2.3 than in 2.2.1.

I have attached jVisualVM stats for both cases; you can see a vast amount of 
time is spent in Utils.classForName. While debugging, I traced this to the 
Kryo initialization.

  was:
Spark 2.3 added a bunch of org.apache.spark.ml and org.apache.spark.mllib 
classes to the kryo registration, but it does this via class.forName.

If the mllib jar is not on the classpath, this can be very slow.

My app, which is using GraphX connected components function is 2x slower in 2.3 
than 2.2.1

 


> Kryo much slower when mllib jar not on classpath
> ------------------------------------------------
>
>                 Key: SPARK-23978
>                 URL: https://issues.apache.org/jira/browse/SPARK-23978
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>         Environment: Windows 10, Java 8
>            Reporter: Richard Wilkinson
>            Priority: Minor
>         Attachments: kryo_stats.png
>
>
> Spark 2.3 added a number of org.apache.spark.ml and org.apache.spark.mllib 
> classes to the Kryo registration, but it registers them via Class.forName.
> If the mllib jar is not on the classpath, this can be very slow.
> My app, which uses the GraphX connected components function, is 2x slower in 
> 2.3 than in 2.2.1.
> I have attached jVisualVM stats for both cases; you can see a vast amount of 
> time is spent in Utils.classForName. While debugging, I traced this to the 
> Kryo initialization.
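
To illustrate the kind of cost described above, and one possible way to bound
it, here is a minimal Java sketch that memoizes Class.forName results,
including failed lookups, so that each class name is resolved at most once per
JVM. The names ClassLookupCache and lookup are hypothetical and are not Spark
APIs; this is an assumption-laden illustration, not necessarily how Spark
handles the registration or how the issue is resolved.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper (not part of Spark): caches the outcome of
// Class.forName so a class that is missing from the classpath is only
// searched for once, instead of on every serializer initialization.
final class ClassLookupCache {
    private static final ConcurrentMap<String, Optional<Class<?>>> CACHE =
            new ConcurrentHashMap<>();

    private ClassLookupCache() {}

    // Returns the class if it can be loaded, or Optional.empty() if not.
    // Both successful and failed lookups are remembered.
    static Optional<Class<?>> lookup(String className) {
        return CACHE.computeIfAbsent(className, name -> {
            try {
                return Optional.<Class<?>>of(Class.forName(name));
            } catch (ClassNotFoundException | LinkageError e) {
                // Negative result: remember that the class is unavailable.
                return Optional.empty();
            }
        });
    }

    public static void main(String[] args) {
        // A class that is always present in the JDK.
        System.out.println(lookup("java.lang.String").isPresent());
        // A made-up name standing in for a class from a jar that is absent.
        System.out.println(lookup("com.example.missing.NoSuchClass").isPresent());
    }
}
```

With a cache like this, only the first lookup of an absent class pays for the
classpath search; later calls return the stored result. Whether that trade-off
is appropriate depends on whether the classpath can change during the lifetime
of the JVM.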



