[jira] [Resolved] (SPARK-11758) Missing Index column while creating a DataFrame from Pandas

Hyukjin Kwon (JIRA) Mon, 20 May 2019 21:46:37 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-11758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hyukjin Kwon resolved SPARK-11758.
----------------------------------
    Resolution: Incomplete

> Missing Index column while creating a DataFrame from Pandas 
> ------------------------------------------------------------
>
>                 Key: SPARK-11758
>                 URL: https://issues.apache.org/jira/browse/SPARK-11758
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.5.1
>         Environment: Linux Debian, PySpark, in local testing.
>            Reporter: Leandro Ferrado
>            Priority: Minor
>              Labels: bulk-closed
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> In PySpark's SQLContext, when createDataFrame() is invoked on a 
> pandas.DataFrame with a 'schema' made of StructFields, the function 
> _createFromLocal() converts the pandas.DataFrame but ignores two points:
> - The index column, because to_records() is called with index=False.
> - Timestamp records, because a date column can't be used as the index and 
> Pandas doesn't convert its records to a Timestamp type.
> So, converting a DataFrame from Pandas to Spark SQL works poorly in 
> scenarios with temporal records.
> Doc: 
> http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.to_records.html
> Affected code:
> def _createFromLocal(self, data, schema):
>     """
>     Create an RDD for DataFrame from a list or pandas.DataFrame, returns
>     the RDD and schema.
>     """
>     if has_pandas and isinstance(data, pandas.DataFrame):
>         if schema is None:
>             schema = [str(x) for x in data.columns]
>         # HERE: index=False drops the pandas index from every record
>         data = [r.tolist() for r in data.to_records(index=False)]
>     # ...
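
The behaviour described above can be reproduced with pandas alone. The following is a minimal sketch, assuming pandas is installed. The sample frame and the names pdf, ts and value are hypothetical and do not come from the report. It shows that to_records(index=False) drops the index, and one possible user-side workaround: promoting the index to a regular column with reset_index() before conversion. This is not a fix taken from Spark itself.

```python
import pandas as pd

# Hypothetical sample data: a frame indexed by timestamps.
pdf = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=pd.to_datetime(["2015-11-01", "2015-11-02"]),
)
pdf.index.name = "ts"

# Same expression as the quoted code: the index is absent from every record.
without_index = [r.tolist() for r in pdf.to_records(index=False)]

# Possible workaround: turn the index into an ordinary column first,
# so that it survives any conversion that ignores the index.
flat = pdf.reset_index()
columns = [str(c) for c in flat.columns]
rows = list(flat.itertuples(index=False, name=None))

print(without_index)  # only the 'value' field is present
print(columns)        # 'ts' is now a regular column
print(rows)
```

With the index stored as a regular column, a frame such as flat should keep its temporal values when it is handed to createDataFrame(), though how those values map to Spark SQL types depends on the Spark and pandas versions in use.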



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

