Olexiy Oryeshko created SPARK-31600:
---------------------------------------

             Summary: Error message from DataFrame creation is misleading.
                 Key: SPARK-31600
                 URL: https://issues.apache.org/jira/browse/SPARK-31600
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.5
         Environment: DataBricks 6.4, Spark 2.4.5, Scala 2.11
            Reporter: Olexiy Oryeshko


*Description:*

Creating a Spark DataFrame from a pandas.DataFrame fails when one of the 
columns contains only NaN values. The failure itself is acceptable.

However, the error message names the wrong column as the culprit, which makes 
the root cause hard to find.

*How to reproduce:*

 
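{code:python}
import numpy as np
import pandas as pd

# `spark` and `display` are provided by the Databricks notebook.
# Column 'a' holds only NaN (dtype=object); column 'b' mixes NaN and a string.
df2 = pd.DataFrame({
    'a': np.array([np.nan, np.nan], dtype=np.object_),
    'b': [np.nan, 'aaa'],
})
display(spark.createDataFrame(df2[['b']]))  # Works fine
spark.createDataFrame(df2)                  # Raises TypeError
{code}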
In the code above, column 'a' is the problematic one. However, the `TypeError` 
raised by the last command names column 'b' as the culprit:

TypeError: field b: Can not merge type <class 'pyspark.sql.types.DoubleType'> 
and <class 'pyspark.sql.types.StringType'>
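
Until the message names the right column, one possible way to narrow down the 
cause is to list the columns that contain only missing values before calling 
spark.createDataFrame. The following is a minimal pandas-only sketch and not 
part of the original report: `all_null_columns` is a hypothetical helper name, 
and it assumes that an all-NaN column is what triggers the failure, as 
described above.

```python
import numpy as np
import pandas as pd


def all_null_columns(pdf):
    """Return the names of columns whose values are all NaN/None.

    Hypothetical diagnostic helper; it only inspects the pandas side and
    does not call Spark.
    """
    return [name for name in pdf.columns if pdf[name].isna().all()]


# Same frame as in the reproduction above.
df2 = pd.DataFrame({
    'a': np.array([np.nan, np.nan], dtype=np.object_),
    'b': [np.nan, 'aaa'],
})

suspects = all_null_columns(df2)
print(suspects)  # ['a']
```

For the DataFrame from the reproduction this reports column 'a', which is the 
column the error message fails to mention.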

 



