[
https://issues.apache.org/jira/browse/SPARK-6189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554609#comment-14554609
]
natalya commented on SPARK-6189:
--------------------------------
Figuring out what is wrong is not the difficulty. The current error message,
while confusing and humorous, provides sufficient information to track down the
issue.
However, if Spark simply returns an error, it will remain incompatible with
certain data sets - for example, URLs, server names, IP addresses, and e-mail
addresses, all of which necessarily contain a period (and some small subset of
which also contain underscores). Both proposed solutions (warning or failing at
load time) would prohibit handling this kind of data directly in field names,
which seems like a significant restriction - even more so when you factor in
the additional restriction of compatibility with R and SQL.
Wouldn't it be better to fix the problem and allow periods?
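For reference, later Spark releases did take that route: periods are allowed in
column names, and backticks disambiguate them from the DSL's nested-field
syntax. A minimal PySpark sketch, assuming a Spark 2.x+ session (the two-column
frame is a hypothetical stand-in for the full "longley" data):
{code}
# Minimal sketch, assuming Spark 2.x+ where backtick quoting is supported.
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the longley data; only the column names matter.
pdf = pd.DataFrame({"GNP": [234.289, 259.426],
                    "GNP.deflator": [83.0, 88.5]})
df = spark.createDataFrame(pdf)

# Unquoted, the period is parsed as nested-field access on the GNP column;
# backticks make the whole string resolve as a single top-level column.
df.select("`GNP.deflator`").show()
df.selectExpr("`GNP.deflator` * 2").show()
{code}
With backtick quoting, nested-field access and literal periods coexist, which
is the "fix the problem" option this comment asks about.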
> Pandas to DataFrame conversion should check field names for periods
> -------------------------------------------------------------------
>
> Key: SPARK-6189
> URL: https://issues.apache.org/jira/browse/SPARK-6189
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> Issue I ran into: I imported an R dataset in CSV format into a Pandas
> DataFrame and then used toDF() to convert that into a Spark DataFrame. The R
> dataset had a column with a period in its name (column "GNP.deflator" in the
> "longley" dataset). When I tried to select it using the Spark DataFrame DSL,
> I could not, because the DSL interpreted the period as selecting a field
> within GNP.
> Also, since "GNP" is another field's name, this gives an error that could be
> obscure to users:
> {code}
> org.apache.spark.sql.AnalysisException: GetField is not valid on fields of type DoubleType;
> {code}
> We should either handle periods in column names or check during loading and
> warn/fail gracefully.
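For context, a minimal reproduction sketch of the quoted report, assuming a
Spark 1.3-era pyspark shell where sqlContext is predefined (the two-column
frame is again a hypothetical stand-in for the full "longley" dataset):
{code}
# Reproduction sketch, assuming a Spark 1.3-era pyspark shell (sqlContext
# predefined); the two columns are a stand-in for the longley data.
import pandas as pd

pdf = pd.DataFrame({"GNP": [234.289], "GNP.deflator": [83.0]})
df = sqlContext.createDataFrame(pdf)

# The DSL parses "GNP.deflator" as field access on the GNP column, and since
# GNP is a plain DoubleType column, analysis fails with:
#   org.apache.spark.sql.AnalysisException:
#     GetField is not valid on fields of type DoubleType;
df.select("GNP.deflator")
{code}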