I think it is enough to just match the Column's expr as an UnresolvedAttribute and use the UnresolvedAttribute's name to match the schema's field name. There seems to be no need to treat expr as anything more general. :)
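For example, something along these lines inside DataFrame (a rough, untested sketch; Column's expr is package-private, so this assumes code living in org.apache.spark.sql, and it matches only the simple, unqualified name):

    import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute

    // Sketch: if the Column wraps an UnresolvedAttribute whose name matches a
    // field in this DataFrame's schema, fall back to the existing name-based
    // drop; otherwise return the DataFrame unchanged.
    def drop(col: Column): DataFrame = col.expr match {
      case u: UnresolvedAttribute if schema.fieldNames.contains(u.name) =>
        drop(u.name)
      case _ => this
    }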
On May 30, 2015 at 11:14:05 PM, Girardot Olivier (o.girar...@lateral-thoughts.com) wrote:

Jira done: https://issues.apache.org/jira/browse/SPARK-7969

I've already started working on it, but it's less trivial than it seems, because I don't exactly know the inner workings of the catalog, or how to get the qualified name of a column to match it against the schema/catalog.

Regards,

Olivier.

On Sat, May 30, 2015 at 09:54, Reynold Xin <r...@databricks.com> wrote:

Yea, it would be great to support a Column. Can you create a JIRA, and possibly a pull request?

On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:

Actually, the Scala API too is only based on column names.

On Fri, May 29, 2015 at 11:23, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:

Hi,

Testing 1.4 a bit more, it seems that the .drop() method in PySpark doesn't accept a Column as an input type:

    .join(only_the_best, only_the_best.pol_no == df.pol_no, "inner").drop(only_the_best.pol_no)\

    File "/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 1225, in drop
      jdf = self._jdf.drop(colName)
    File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 523, in __call__
      (new_args, temp_args) = self._get_args(args)
    File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 510, in _get_args
      temp_arg = converter.convert(arg, self.gateway_client)
    File "/usr/local/lib/python2.7/site-packages/py4j/java_collections.py", line 490, in convert
      for key in object.keys():
    TypeError: 'Column' object is not callable

This doesn't seem very consistent with the rest of the API, and it is especially annoying when executing joins, because drop("my_key") is not a qualified reference to the column. What do you think about changing that? Or what is the best practice as a workaround?

Regards,

Olivier.
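In the meantime, one possible workaround is to rename the join key on one side before joining, so that the surviving name-based drop is unambiguous. A sketch using the Scala API (untested; df and onlyTheBest are stand-ins mirroring the PySpark snippet above):

    // Rename the right-hand join key so it no longer collides with df's
    // pol_no, then drop the renamed column by name after the join.
    val best = onlyTheBest.withColumnRenamed("pol_no", "pol_no_right")
    val joined = df
      .join(best, df("pol_no") === best("pol_no_right"), "inner")
      .drop("pol_no_right")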