[
https://issues.apache.org/jira/browse/SPARK-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506555#comment-14506555
]
Reynold Xin commented on SPARK-7035:
------------------------------------
You brought up 3 points. I just ran some examples in Pandas to see the behavior.
1. Un-pythonic: I am not sure it is very un-pythonic, given that Pandas supports it.
2. Confusing exceptions that say "No such column ...". I think we should make
it less confusing by following Pandas' exception message, which says
"'DataFrame' object has no attribute 'asdfasdf'"
3. In Pandas, a column is exposed as an attribute only if no existing attribute
already has that name, e.g. it is ok to have a column named count.
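
To make points 2 and 3 concrete, here is a minimal sketch in plain Python. The
Frame class and its column data are hypothetical stand-ins, not the PySpark or
Pandas implementation; the sketch only assumes the standard rule that Python
calls __getattr__ after normal attribute lookup has failed.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame; not the PySpark or Pandas class."""

    def __init__(self, columns):
        # Map of column name -> list of values.
        self._columns = dict(columns)

    def count(self):
        # A regular method that happens to share its name with a column.
        return len(next(iter(self._columns.values()), []))

    def __getitem__(self, name):
        # Bracket access always addresses columns, whatever their name.
        return self._columns[name]

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup fails,
        # so existing methods such as count() are never shadowed by a column.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        # Mirror the wording of the Pandas message quoted in point 2.
        raise AttributeError(
            "'%s' object has no attribute '%s'" % (type(self).__name__, name)
        )


df = Frame({"age": [2, 5], "count": [10, 20]})

age_values = df.age            # attribute access falls through to the column
row_count = df.count()         # still the method, not the column
count_column = df["count"]     # the column stays reachable via brackets

try:
    df.asdfasdf
except AttributeError as exc:
    message = str(exc)
```

With this layout a column named count does not break df.count(), and a
mistyped name is reported as a missing attribute rather than a missing column.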
> Drop __getattr__ on pyspark.sql.DataFrame
> -----------------------------------------
>
> Key: SPARK-7035
> URL: https://issues.apache.org/jira/browse/SPARK-7035
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 1.4.0
> Reporter: Kalle Jepsen
>
> I think the {{\_\_getattr\_\_}} method on the DataFrame should be removed.
> There is no point in being able to address the DataFrame's columns
> as {{df.column}}, other than the questionable goal of pleasing R developers.
> And it seems R people will be able to use Spark from their native API in the future.
> I see the following problems with {{\_\_getattr\_\_}} for column selection:
> * It's un-pythonic: There should only be one obvious way to solve a problem,
> and we can already address columns on a DataFrame via the {{\_\_getitem\_\_}}
> method, which in my opinion is by far superior and a lot more intuitive.
> * It leads to confusing exceptions. When we mistype a method name, the
> {{AttributeError}} will say 'No such column ... '.
> * And most importantly: we cannot load DataFrames that have columns with the
> same name as any attribute on the DataFrame object. Imagine having a
> DataFrame with a column named {{cache}} or {{filter}}. Calling {{df.cache()}}
> will be ambiguous and lead to broken code.