Jira done: https://issues.apache.org/jira/browse/SPARK-7969

I've already started working on it, but it's less trivial than it seems because I don't know exactly how the catalog works internally, or how to get the qualified name of a column to match it against the schema/catalog.
Regards,

Olivier.

On Sat, May 30, 2015 at 09:54, Reynold Xin <r...@databricks.com> wrote:

> Yea, it would be great to support a Column. Can you create a JIRA, and
> possibly a pull request?
>
> On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Actually, the Scala API too is only based on column name.
>>
>> On Fri, May 29, 2015 at 11:23, Olivier Girardot <
>> o.girar...@lateral-thoughts.com> wrote:
>>
>>> Hi,
>>> Testing 1.4 a bit more, it seems that the .drop() method in PySpark
>>> doesn't accept a Column as input:
>>>
>>>     .join(only_the_best, only_the_best.pol_no == df.pol_no, "inner").drop(only_the_best.pol_no)
>>>
>>>   File "/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 1225, in drop
>>>     jdf = self._jdf.drop(colName)
>>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 523, in __call__
>>>     (new_args, temp_args) = self._get_args(args)
>>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 510, in _get_args
>>>     temp_arg = converter.convert(arg, self.gateway_client)
>>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_collections.py", line 490, in convert
>>>     for key in object.keys():
>>> TypeError: 'Column' object is not callable
>>>
>>> It doesn't seem very consistent with the rest of the API, and it is
>>> especially annoying when executing joins, because drop("my_key") is not a
>>> qualified reference to the column.
>>>
>>> What do you think about changing that? Or what is the best practice as
>>> a workaround?
>>>
>>> Regards,
>>>
>>> Olivier.