[jira] [Commented] (SPARK-13455) Periods in dataframe column names breaks df.drop()

Jason Piper (JIRA) Tue, 23 Feb 2016 07:10:50 -0800

    [ https://issues.apache.org/jira/browse/SPARK-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159013#comment-15159013 ]


Jason Piper commented on SPARK-13455:
-------------------------------------

Ah, thanks! It appears to have been a period/dot nomenclature issue when I was 
searching (which is ironic, since I'm a Brit and call them dots!)

> Periods in dataframe column names breaks df.drop(<string>)
> ----------------------------------------------------------
>
>                 Key: SPARK-13455
>                 URL: https://issues.apache.org/jira/browse/SPARK-13455
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.6.0
>         Environment: Spark 1.6.0 installed via homebrew
>            Reporter: Jason Piper
>            Priority: Minor
>
> Calling the .drop method with a string on a dataframe that contains a 
> column name with a period in it raises an AnalysisException, even when the 
> column being dropped has no period in its name. This doesn't happen when 
> dropping using the column object itself.
> {code}
> >>> import json
> >>> ds = {'a': "test", "b.no": "testagain"}
> >>> df = sqlContext.jsonRDD(sc.parallelize([json.dumps(ds)]))
> >>> df.drop('a')
> {code}
> yields
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/Cellar/apache-spark/1.6.0/libexec/python/pyspark/sql/dataframe.py", line 1347, in drop
>     jdf = self._jdf.drop(col)
>   File "/usr/local/Cellar/apache-spark/1.6.0/libexec/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
>   File "/usr/local/Cellar/apache-spark/1.6.0/libexec/python/pyspark/sql/utils.py", line 51, in deco
>     raise AnalysisException(s.split(': ', 1)[1], stackTrace)
> pyspark.sql.utils.AnalysisException: u"cannot resolve 'b.no' given input columns a, b.no;"
> {code}
> whereas this works,
> {code}
> >>> df.drop(df.a)
> DataFrame[b.no: string]
> {code}
> The current workaround, if you want to drop a column using a string, is to use
> {code}
> >>> df.drop(df.select("a")[0])
> DataFrame[b.no: string]
> {code}
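
Another possible workaround, sketched here rather than taken from the report: 
select every column except the unwanted one, quoting each remaining name in 
backticks so that the period is read as part of the name. This assumes Spark 
SQL resolves backtick-quoted names literally, and it has not been checked 
against 1.6.0. The helper names quote_column and columns_without are 
hypothetical and are not part of PySpark.

```python
def quote_column(name):
    """Wrap a column name in backticks so a period is read literally."""
    # Assumption: a backtick inside the name is escaped by doubling it.
    return "`" + name.replace("`", "``") + "`"


def columns_without(columns, to_drop):
    """Return the backtick-quoted names of every column except to_drop."""
    return [quote_column(c) for c in columns if c != to_drop]


# With the dataframe from the report, the intended call would be:
#     df.select(*columns_without(df.columns, "a"))
print(columns_without(["a", "b.no"], "a"))  # prints ['`b.no`']
```

The helpers are plain string manipulation, so they can be tried without a 
running Spark context; only the commented df.select call depends on Spark.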



