[
https://issues.apache.org/jira/browse/SPARK-39494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xinrong Meng updated SPARK-39494:
---------------------------------
Description:
Currently, DataFrame creation from a list of scalars is unsupported in
PySpark when no schema is provided, for example:
{{>>> spark.createDataFrame([1, 2]).collect()}}
{{Traceback (most recent call last):}}
{{...}}
{{TypeError: Can not infer schema for type: <class 'int'>}}
However, the Spark DataFrame Scala API supports that:
{{scala> Seq(1, 2).toDF().collect()}}
{{res6: Array[org.apache.spark.sql.Row] = Array([1], [2])}}
To maintain API consistency, we propose to support DataFrame creation from a
list of scalars.
See more
[here](https://docs.google.com/document/d/1Rd20PVbVxNrLfOmDtetVRxkgJQhgAAtJp6XAAZfGQgc/edit?usp=sharing).
was:
Currently, DataFrame creation from a list of scalars is unsupported in PySpark,
for example,
```py
>>> spark.createDataFrame([1, 2]).collect()
Traceback (most recent call last):
...
TypeError: Can not infer schema for type: <class 'int'>
```
However, Spark DataFrame Scala API supports that:
```
scala> Seq(1, 2).toDF().collect()
res6: Array[org.apache.spark.sql.Row] = Array([1], [2])
```
To maintain API consistency, we propose to support DataFrame creation from a
list of scalars.
See more
[here](https://docs.google.com/document/d/1Rd20PVbVxNrLfOmDtetVRxkgJQhgAAtJp6XAAZfGQgc/edit?usp=sharing).
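Until something like this proposal lands, one possible workaround is to wrap each scalar in a one-element tuple before calling `createDataFrame`, so that schema inference sees rows rather than bare values. The sketch below is only an illustration: the helper names `scalars_to_rows` and `create_df_from_scalars` are hypothetical, the column name `value` is chosen to mirror what Scala's `toDF()` appears to use by default, and `spark` is assumed to be an existing `SparkSession`.

```py
from typing import Any, Iterable, List, Tuple


def scalars_to_rows(values: Iterable[Any]) -> List[Tuple[Any]]:
    """Wrap each scalar in a one-element tuple (hypothetical helper)."""
    return [(v,) for v in values]


def create_df_from_scalars(spark, values: Iterable[Any], column: str = "value"):
    """Build a single-column DataFrame from scalars (hypothetical helper).

    `spark` is assumed to be an active SparkSession; the column name
    defaults to "value" purely as an illustrative choice.
    """
    return spark.createDataFrame(scalars_to_rows(values), [column])


# The wrapping step needs no Spark session and can be checked on its own.
rows = scalars_to_rows([1, 2])
```

With a session available, `create_df_from_scalars(spark, [1, 2]).collect()` should return one single-field row per input scalar, comparable to the Scala result shown above.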
> Support `createDataFrame` from a list of scalars when schema is not provided
> ----------------------------------------------------------------------------
>
> Key: SPARK-39494
> URL: https://issues.apache.org/jira/browse/SPARK-39494
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Priority: Major