Yeah, it would be great to support a Column there. Can you create a JIRA, and
possibly a pull request?
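
For reference, the Python side of such a change might look roughly like the
sketch below. This is only an illustration, not the actual patch: it assumes
a matching drop(Column) overload is also added to the Scala DataFrame, which
(as noted below) currently only accepts a column name.

    # Rough sketch of a DataFrame.drop accepting a name or a Column; the
    # real change would live in pyspark/sql/dataframe.py as a method of
    # DataFrame, and it assumes the Scala side gains a drop(Column) overload.
    from pyspark.sql.column import Column

    def drop(self, col):
        """Drop a column given either its name or a Column reference."""
        if isinstance(col, basestring):      # Python 2, as in the traceback
            jdf = self._jdf.drop(col)        # existing drop-by-name path
        elif isinstance(col, Column):
            jdf = self._jdf.drop(col._jc)    # pass the underlying Java Column
        else:
            raise TypeError("col should be a string or a Column")
        return DataFrame(jdf, self.sql_ctx)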


On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot
<o.girar...@lateral-thoughts.com> wrote:

> Actually, the Scala API too only accepts a column name.
>
> On Fri, May 29, 2015 at 11:23 AM, Olivier Girardot
> <o.girar...@lateral-thoughts.com> wrote:
>
>> Hi,
>> Testing 1.4 a bit more, it seems that the .drop() method in PySpark
>> doesn't accept a Column as its input type:
>>
>>
>>     .join(only_the_best, only_the_best.pol_no == df.pol_no,
>>           "inner").drop(only_the_best.pol_no)\
>>
>>   File "/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 1225, in drop
>>     jdf = self._jdf.drop(colName)
>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 523, in __call__
>>     (new_args, temp_args) = self._get_args(args)
>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 510, in _get_args
>>     temp_arg = converter.convert(arg, self.gateway_client)
>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_collections.py", line 490, in convert
>>     for key in object.keys():
>> TypeError: 'Column' object is not callable
>>
>> It doesn't seem very consistent with the rest of the API, and it is
>> especially annoying when executing joins, because drop("my_key") is not a
>> qualified reference to the column.
>>
>> What do you think about changing that? And what is the best practice as a
>> workaround in the meantime? (One possible workaround is sketched below,
>> after the quoted thread.)
>>
>> Regards,
>>
>> Olivier.
>>
>
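
As a workaround until drop() accepts a Column, one option is to rename the
join key on one side before the join, so the joined result has no duplicate
column name and the string-based drop() is unambiguous. A minimal sketch,
assuming the df and only_the_best DataFrames from the quoted snippet:

    # Rename the duplicate join key before joining, so that drop() by a
    # plain string name is unambiguous afterwards. Assumes df and
    # only_the_best both carry a "pol_no" column, as in the quoted snippet.
    best = only_the_best.withColumnRenamed("pol_no", "best_pol_no")
    joined = df.join(best, best.best_pol_no == df.pol_no, "inner") \
               .drop("best_pol_no")

Renaming avoids the name collision entirely, so no qualified reference is
needed when dropping the column.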
