[ https://issues.apache.org/jira/browse/SPARK-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-5224.
-------------------------------
    Resolution: Fixed
    Fix Version/s: 1.2.1
                   1.3.0

Issue resolved by pull request 4024
[https://github.com/apache/spark/pull/4024]

> parallelize list/ndarray is really slow
> ---------------------------------------
>
>                 Key: SPARK-5224
>                 URL: https://issues.apache.org/jira/browse/SPARK-5224
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.2.0
>            Reporter: Davies Liu
>            Priority: Blocker
>             Fix For: 1.3.0, 1.2.1
>
>
> The default batchSize was changed to 0 (batching based on the size of the
> objects), but parallelize() still uses BatchedSerializer with batchSize=1.
> Also, BatchedSerializer did not work well with list and numpy.ndarray.
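
To illustrate why a batch size of 1 is costly, here is a minimal standalone sketch that uses only Python's standard pickle module. It is not the PySpark BatchedSerializer code and does not reproduce the fix in the pull request; the helper names dump_batched and load_batched and the batch size of 1024 are assumptions chosen for illustration. It pickles the same list once with one element per frame and once in larger chunks, then compares the number of frames and the total bytes produced.

```python
import pickle


def dump_batched(items, batch_size):
    """Pickle `items` in chunks of `batch_size` and return the byte frames.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    items = list(items)
    frames = []
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        frames.append(pickle.dumps(chunk, protocol=2))
    return frames


def load_batched(frames):
    """Unpickle every frame and concatenate the chunks back into one list."""
    result = []
    for frame in frames:
        result.extend(pickle.loads(frame))
    return result


data = list(range(10000))

# One element per frame: every element pays the per-frame pickle overhead.
one_by_one = dump_batched(data, 1)

# Larger chunks: the per-frame overhead is shared by many elements.
in_batches = dump_batched(data, 1024)

print("batch_size=1:    frames=%d bytes=%d"
      % (len(one_by_one), sum(len(f) for f in one_by_one)))
print("batch_size=1024: frames=%d bytes=%d"
      % (len(in_batches), sum(len(f) for f in in_batches)))
```

Both variants round-trip to the same list; the difference is only in how many frames are written and how much per-frame overhead is paid, which is the kind of cost the issue describes for parallelize() with batchSize=1.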