xie shuiahu created SPARK-49950:
-----------------------------------

             Summary: `spark.createDataset(0 until 10000000)` is too slow
                 Key: SPARK-49950
                 URL: https://issues.apache.org/jira/browse/SPARK-49950
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.5.1
            Reporter: xie shuiahu


```scala
import spark.implicits._

// Materialize 10 million Ints in a local array before timing.
val data = (0 until 10000000).toArray

val start = System.currentTimeMillis()
spark.createDataset(data)  // takes more than 10 s on my laptop
println(System.currentTimeMillis() - start)
```

 

This is caused by `LocalRelation`: `mapExpressions` descends into the 
data and spends a lot of time traversing it. Any ideas on how to fix this?
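
In the meantime, a possible workaround is to keep the data out of a `LocalRelation` altogether. This is a rough sketch, not a fix: it assumes the cost really does come from traversing the local rows, and I have not measured either variant on 3.5.1. The `timeMs` helper and the application name are hypothetical, added only for illustration. As in the snippet above, only Dataset construction is timed and no action is run.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Assumption: local session for illustration; in spark-shell this
// returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SPARK-49950-sketch") // hypothetical name
  .getOrCreate()
import spark.implicits._

val n = 10000000

// Hypothetical helper: run a block and return its result plus elapsed milliseconds.
def timeMs[A](body: => A): (A, Long) = {
  val start = System.nanoTime()
  val result = body
  (result, (System.nanoTime() - start) / 1000000L)
}

// Option 1: let Spark generate the range instead of passing in a local array.
val (viaRange, rangeMs) = timeMs {
  spark.range(0L, n.toLong).select($"id".cast("int")).as[Int]
}

// Option 2: go through an RDD, so the plan is backed by the RDD
// rather than by rows embedded in a LocalRelation.
val data = (0 until n).toArray
val (viaRdd, rddMs) = timeMs {
  spark.sparkContext.parallelize(data).toDS()
}

println(s"range: $rangeMs ms, rdd: $rddMs ms")
```

Option 1 only applies when the data is a plain numeric range. Option 2 works for any local collection, at the cost of an extra RDD-to-Dataset conversion when the data is actually read.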



