[ 
https://issues.apache.org/jira/browse/SPARK-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554609#comment-14554609
 ] 

natalya commented on SPARK-6189:
--------------------------------

Figuring out what is wrong is not the difficulty.  The current error message, 
while confusing and humorous, provides sufficient information to track down the 
issue.  

However, if Spark simply returns an error, it will remain incompatible with 
certain data sets - for example, URLs, server names, IP addresses, and e-mail 
addresses.  All of these necessarily contain a period.  Some small subset will 
also contain underscores.  Both solutions prohibit direct handling of this type 
of data in field names, which seems like a significant restriction, and even 
more so when you factor in the additional restriction on compatibility with R 
and SQL.  
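
To illustrate the underscore concern, here is a minimal sketch in plain Python 
(no Spark involved).  It assumes the workaround in question is rewriting 
periods as underscores; the function names and the sample field names are 
hypothetical and only serve to show how two distinct names can collapse into 
one.

```python
from collections import defaultdict


def replace_periods(names):
    """Hypothetical workaround: rewrite '.' as '_' in each field name."""
    return [name.replace(".", "_") for name in names]


def find_collisions(names):
    """Group original names that map to the same rewritten name."""
    groups = defaultdict(list)
    for original, rewritten in zip(names, replace_periods(names)):
        groups[rewritten].append(original)
    # Keep only rewritten names that more than one original maps to.
    return {new: olds for new, olds in groups.items() if len(olds) > 1}


# Illustrative server-name and IP-address style field names (made up).
fields = ["build_01.example.org", "build.01.example.org", "10.0.0.1"]
collisions = find_collisions(fields)
```

With these sample names, the first two fields become indistinguishable after 
the rewrite, which is the information loss described above.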

Wouldn't it be better to fix the problem and allow periods?

> Pandas to DataFrame conversion should check field names for periods
> -------------------------------------------------------------------
>
>                 Key: SPARK-6189
>                 URL: https://issues.apache.org/jira/browse/SPARK-6189
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> Issue I ran into:  I imported an R dataset in CSV format into a Pandas 
> DataFrame and then used toDF() to convert that into a Spark DataFrame.  The R 
> dataset had a column with a period in it (column "GNP.deflator" in the 
> "longley" dataset).  When I tried to select it using the Spark DataFrame DSL, 
> I could not because the DSL thought the period was selecting a field within 
> GNP.
> Also, since "GNP" is another field's name, it gives an error which could be 
> obscure to users, complaining:
> {code}
> org.apache.spark.sql.AnalysisException: GetField is not valid on fields of 
> type DoubleType;
> {code}
> We should either handle periods in column names or check during loading and 
> warn/fail gracefully.
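
For reference, the "check during loading" option from the description could 
look roughly like the sketch below.  This is plain Python, not Spark's actual 
behavior; the helper name check_field_names and the message wording are 
assumptions made for illustration.

```python
def check_field_names(names):
    """Raise ValueError naming every field that contains a period."""
    bad = [name for name in names if "." in name]
    if bad:
        listed = ", ".join(repr(name) for name in bad)
        raise ValueError("Field names must not contain '.': " + listed)
    return list(names)


# The two column names mentioned in the issue description.
columns = ["GNP.deflator", "GNP"]

try:
    check_field_names(columns)
    message = None
except ValueError as err:
    # A clear message at load time, instead of a GetField error at query time.
    message = str(err)
```

A check like this fails early and names the offending column, but it still 
rejects the data rather than supporting periods in field names.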



