Alex Khakhlyuk created SPARK-53959:
--------------------------------------

             Summary: Spark Connect Python client does not throw a proper error when creating a dataframe from an empty pandas dataframe
                 Key: SPARK-53959
                 URL: https://issues.apache.org/jira/browse/SPARK-53959
             Project: Spark
          Issue Type: Bug
          Components: Connect, PySpark
    Affects Versions: 4.1.0
            Reporter: Alex Khakhlyuk
             Fix For: 4.1.0


The Spark Connect Python client does not throw a proper error when creating a
dataframe from a pandas dataframe that has an index but no columns (and
therefore no data).

Generally, the Spark Connect client throws a client-side error,
`[CANNOT_INFER_EMPTY_SCHEMA] Can not infer schema from an empty dataset`, when
creating a dataframe without data, for example via
{quote}spark.createDataFrame([]).show()
{quote}
or
{quote}df = pd.DataFrame()
spark.createDataFrame(df).show(){quote}
or
{quote}df = pd.DataFrame({"a": []})
spark.createDataFrame(df).show(){quote}
This does not happen when the pandas dataframe has an index but no columns, e.g.
{quote}df = pd.DataFrame(index=range(5))
spark.createDataFrame(df).show(){quote}
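The pandas shapes of these inputs show what sets the index-only case apart: the two pandas inputs that are rejected on the client have zero rows, while the index-only frame has rows but zero columns. A minimal sketch that needs only pandas (no Spark session); the dictionary keys are just illustrative labels:

```python
import pandas as pd

# The pandas inputs from the report, keyed by a short illustrative label.
cases = {
    "no index, no columns": pd.DataFrame(),
    "one empty column": pd.DataFrame({"a": []}),
    "index only": pd.DataFrame(index=range(5)),
}

# shape is (number of rows, number of columns).
for name, pdf in cases.items():
    print(name, pdf.shape)
# no index, no columns (0, 0)
# one empty column (0, 1)
# index only (5, 0)
```

Only the index-only frame has a non-zero row count together with zero columns.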
What happens instead is that the dataframe is successfully converted to a
LocalRelation on the client and sent to the server, and the server then throws
the following exception: `INTERNAL_ERROR: Input data for LocalRelation does not
produce a schema. SQLSTATE: XX000`. XX000 is the SQLSTATE for internal errors,
and the message gives the user nothing to act on.
This should be fixed: the client should reject this input with a proper,
user-facing error, as it already does for the other empty inputs above.
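One way to make this failure actionable is to check for a pandas dataframe without columns on the client, before the LocalRelation is built, and fail with the same kind of error as the other empty inputs. A minimal sketch, assuming a hypothetical helper `check_pandas_has_columns` and a hypothetical `EmptySchemaError`; neither is part of PySpark, and an actual fix would raise through PySpark's own error classes instead:

```python
import pandas as pd


class EmptySchemaError(ValueError):
    """Hypothetical stand-in for PySpark's CANNOT_INFER_EMPTY_SCHEMA error."""


def check_pandas_has_columns(pdf: pd.DataFrame) -> None:
    """Hypothetical client-side guard: fail fast if there are no columns.

    A frame such as pd.DataFrame(index=range(5)) has rows but no columns,
    so there is nothing to build a schema from.
    """
    if len(pdf.columns) == 0:
        raise EmptySchemaError(
            "[CANNOT_INFER_EMPTY_SCHEMA] Can not infer schema from an empty dataset"
        )


# The index-only frame from the report is rejected before anything is sent.
try:
    check_pandas_has_columns(pd.DataFrame(index=range(5)))
except EmptySchemaError as e:
    print(e)

# A frame with at least one column passes this particular check.
check_pandas_has_columns(pd.DataFrame({"a": [1, 2]}))
```

The sketch checks only the column count; inputs with zero rows are already rejected by the existing client-side error described above.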



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
