[jira] [Updated] (SPARK-6189) Pandas to DataFrame conversion should check field names for periods

Michael Armbrust (JIRA) Mon, 03 Aug 2015 12:27:23 -0700

     [ https://issues.apache.org/jira/browse/SPARK-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael Armbrust updated SPARK-6189:
------------------------------------
    Target Version/s: 1.6.0  (was: 1.5.0)

> Pandas to DataFrame conversion should check field names for periods
> -------------------------------------------------------------------
>
>                 Key: SPARK-6189
>                 URL: https://issues.apache.org/jira/browse/SPARK-6189
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> Issue I ran into:  I imported an R dataset in CSV format into a Pandas 
> DataFrame and then used toDF() to convert that into a Spark DataFrame.  The R 
> dataset had a column with a period in its name (column "GNP.deflator" in the 
> "longley" dataset).  When I tried to select it using the Spark DataFrame DSL, 
> I could not, because the DSL interpreted the period as selecting a field 
> within GNP.
> Also, since "GNP" is another field's name, it gives an error which could be 
> obscure to users:
> {code}
> org.apache.spark.sql.AnalysisException: GetField is not valid on fields of 
> type DoubleType;
> {code}
> We should either handle periods in column names or check during loading and 
> warn/fail gracefully.
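
The "check during loading" option can be sketched in plain Python. This is only an illustration under stated assumptions, not what Spark does: the helper names find_dotted_columns and rename_dotted_columns are hypothetical, the underscore replacement is an arbitrary choice, and the column list is an illustrative subset built around the two names mentioned in the report.

```python
def find_dotted_columns(columns):
    """Return the column names that contain a period (hypothetical helper)."""
    return [name for name in columns if "." in str(name)]


def rename_dotted_columns(columns, replacement="_"):
    """Map each original name to a period-free name (hypothetical helper).

    Raises ValueError if two columns would end up with the same name,
    so the caller fails early instead of silently merging columns.
    """
    renamed = {}
    seen = set()
    for name in columns:
        new_name = str(name).replace(".", replacement)
        if new_name in seen:
            raise ValueError(
                "renaming %r to %r collides with another column" % (name, new_name)
            )
        seen.add(new_name)
        renamed[name] = new_name
    return renamed


# Illustrative subset of column names; "GNP.deflator" and "GNP" come from the report.
columns = ["GNP.deflator", "GNP", "Year"]

dotted = find_dotted_columns(columns)      # names that would confuse the DSL
mapping = rename_dotted_columns(columns)   # old name -> period-free name
```

With a Pandas DataFrame, the names could be taken from list(pdf.columns) and the mapping applied with pdf.rename(columns=mapping) before the conversion to a Spark DataFrame; whether to rename, warn, or fail is a design choice the issue leaves open.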



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)