[
https://issues.apache.org/jira/browse/SPARK-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505362#comment-14505362
]
Reynold Xin edited comment on SPARK-7035 at 4/21/15 6:03 PM:
-------------------------------------------------------------
It'd be great to understand the delta between this and Pandas, and do something
accordingly.
Is the problem that Pandas simply patched the current object, while we are
using getattr? If yes, maybe the Pandas way is better and we should switch to
it. However, my Python knowledge is limited, and it would be great for others to
chime in.
> Drop __getattr__ on pyspark.sql.DataFrame
> -----------------------------------------
>
> Key: SPARK-7035
> URL: https://issues.apache.org/jira/browse/SPARK-7035
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 1.4.0
> Reporter: Kalle Jepsen
>
> I think the {{\_\_getattr\_\_}} method on the DataFrame should be removed.
> There is no point in being able to address the DataFrame's columns
> as {{df.column}}, other than the questionable goal of pleasing R developers.
> And it seems R people will be able to use Spark from their native API in the future.
> I see the following problems with {{\_\_getattr\_\_}} for column selection:
> * It's un-pythonic: There should only be one obvious way to solve a problem,
> and we can already address columns on a DataFrame via the {{\_\_getitem\_\_}}
> method, which in my opinion is by far superior and a lot more intuitive.
> * It leads to confusing exceptions. When we mistype a method name, the
> {{AttributeError}} will say 'No such column ... '.
> * And most importantly: we cannot load DataFrames that have columns with the
> same name as any attribute on the DataFrame object. Imagine having a
> DataFrame with a column named {{cache}} or {{filter}}. Calling {{df.cache()}}
> will be ambiguous and lead to broken code.
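
The problems listed above can be reproduced without Spark. The following is a minimal sketch, assuming a hypothetical ToyFrame class that stands in for the DataFrame; it is not pyspark's implementation, and the error text is only modeled on the 'No such column ... ' message quoted in the report. It relies on one fact about Python: __getattr__ is called only after normal attribute lookup has failed.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not pyspark's actual code."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def cache(self):
        # An ordinary method, playing the role of DataFrame.cache().
        return self

    def __getitem__(self, name):
        # Explicit column access: df["name"].
        return self._columns[name]

    def __getattr__(self, name):
        # Reached only when normal attribute lookup finds nothing.
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError("No such column: %s" % name)


df = ToyFrame({"age": [1, 2], "cache": [True, False]})

by_item = df["cache"]  # the column named "cache"
by_attr = df.cache     # the bound method; the column is shadowed

try:
    df.chace()         # mistyped method name
except AttributeError as exc:
    typo_message = str(exc)  # reports a missing column, not a missing method
```

In this sketch df["cache"] always reaches the column, while df.cache silently resolves to the method, so the two access styles disagree as soon as a column name collides with an attribute.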