xie shuiahu created SPARK-49950:
-----------------------------------
Summary: `spark.createDataset(0 until 10000000)` is too slow
Key: SPARK-49950
URL: https://issues.apache.org/jira/browse/SPARK-49950
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.5.1
Reporter: xie shuiahu
```scala
import spark.implicits._
val data = (0 until 10000000).toArray
val start = System.currentTimeMillis()
spark.createDataset(data) // takes more than 10 s on my laptop
println(System.currentTimeMillis() - start) // elapsed milliseconds
```
This is caused by `LocalRelation`: `mapExpressions` descends into the
data and spends a lot of time traversing it. Any ideas on how to fix this?
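
To help narrow this down, here is a minimal timing sketch that compares the reported call with two other ways of building the same Dataset that do not embed the rows in the plan. It is not a fix. The object name `CreateDatasetTiming`, the `timed` helper, and the app name are made up for illustration, and whether the alternatives are actually faster on a given Spark version is an assumption to verify by running it.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical harness; object and helper names are illustrative only.
object CreateDatasetTiming {

  // Evaluates `body` once and returns its result with the elapsed wall-clock milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-49950-timing") // hypothetical app name
      .getOrCreate()
    import spark.implicits._

    val n = 10000000
    val data = (0 until n).toArray

    // Call from the report: the local collection becomes part of the logical plan.
    val (_, localMs) = timed(spark.createDataset(data))

    // Alternative 1: distribute the array as an RDD first, then convert to a Dataset.
    val (_, rddMs) = timed(spark.sparkContext.parallelize(data).toDS())

    // Alternative 2: generate the values with spark.range and cast the Long ids to Int.
    val (_, rangeMs) = timed(
      spark.range(0L, n.toLong).select($"id".cast("int").as("value")).as[Int]
    )

    // Only Dataset construction is timed here; no action such as count() is run.
    println(s"createDataset: $localMs ms")
    println(s"parallelize + toDS: $rddMs ms")
    println(s"range + cast: $rangeMs ms")

    spark.stop()
  }
}
```

The sketch times only the construction of each Dataset, which is the step the snippet above measures. The alternatives defer work until an action runs, so a fair end-to-end comparison would also time an action on each result.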
--
This message was sent by Atlassian Jira
(v8.20.10#820010)