[ 
https://issues.apache.org/jira/browse/SPARK-53959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18033871#comment-18033871
 ] 

Dongjoon Hyun commented on SPARK-53959:
---------------------------------------

Hi, [~khakhlyuk]. 

The Apache Spark community has a policy on the `Fix Version` and `Target 
Version` fields, quoted below. Please don't set them when you file a JIRA issue.
https://spark.apache.org/contributing.html

{quote}
Do not set the following fields:
- Fix Version. This is assigned by committers only when resolved.
- Target Version. This is assigned by committers to indicate a PR has been 
accepted for possible fix by the target version.
{quote}

> Spark Connect Python client does not throw a proper error when creating a 
> dataframe from an empty pandas dataframe
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-53959
>                 URL: https://issues.apache.org/jira/browse/SPARK-53959
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect, PySpark
>    Affects Versions: 4.1.0
>            Reporter: Alex Khakhlyuk
>            Priority: Major
>              Labels: pull-request-available
>
> Spark Connect Python client does not throw a proper error when creating a 
> dataframe from a pandas dataframe with an index and empty data.
> Generally, the Spark Connect client throws a client-side error 
> `[CANNOT_INFER_EMPTY_SCHEMA] Can not infer schema from an empty dataset` 
> when creating a dataframe without data, for example via
> {quote}spark.createDataFrame([]).show()
> {quote}
> or
> {quote}df = pd.DataFrame()
> spark.createDataFrame(df).show(){quote}
> or
> {quote}df = pd.DataFrame({"a": []})
> spark.createDataFrame(df).show(){quote}
> This does not happen when the pandas dataframe has an index but no data, e.g.
> {quote}df = pd.DataFrame(index=range(5))
> spark.createDataFrame(df).show(){quote}
> What happens instead is that the dataframe is successfully converted to a 
> LocalRelation on the client and sent to the server, but the server then 
> throws the following exception: `INTERNAL_ERROR: Input data for LocalRelation 
> does not produce a schema. SQLSTATE: XX000`. XX000 is an internal-error SQL 
> state, and the error is not actionable enough for the user.
> This should be fixed.
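
As a rough illustration of the index-only case described in the issue, a client-side guard could reject any input that has no columns before a LocalRelation is built. This is a minimal sketch, not the PySpark implementation: `EmptySchemaError` and `ensure_inferable_schema` are hypothetical names, and the stand-in object only mimics the `.index` and `.columns` attributes of `pd.DataFrame(index=range(5))` so the snippet runs without pandas or Spark.

```python
from types import SimpleNamespace


class EmptySchemaError(ValueError):
    """Hypothetical client-side error type; not a PySpark class."""


def ensure_inferable_schema(pdf):
    """Raise early if `pdf` has no columns, regardless of its index length.

    `pdf` is any object exposing a `.columns` sequence, such as a pandas
    DataFrame. This helper is an assumption made for illustration only.
    """
    if len(pdf.columns) == 0:
        raise EmptySchemaError(
            "[CANNOT_INFER_EMPTY_SCHEMA] Can not infer schema from an empty dataset"
        )
    return pdf


# Stand-in for pd.DataFrame(index=range(5)): five index entries, zero columns.
index_only = SimpleNamespace(index=range(5), columns=[])

try:
    ensure_inferable_schema(index_only)
    rejected = False
except EmptySchemaError:
    rejected = True
```

The check deliberately ignores the index length, because an index-only dataframe has rows but still no columns to infer a schema from. Inputs such as `pd.DataFrame({"a": []})`, which have a column but no rows, would pass this particular guard and rely on whatever empty-data check the client already performs.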



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
