Re: Dataframe's .drop in PySpark doesn't accept Column

Reynold Xin Sat, 30 May 2015 11:42:51 -0700

I don't think name resolution is that easy. Wenchen can maybe give you some
advice on how to resolve this one.
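A toy Python model may help show why resolving a Column is harder than matching its name against the schema. This is a sketch under stated assumptions, not Spark's Catalyst API. The `Field` class and both match functions are hypothetical. `df`, `only_the_best` and `pol_no` come from the join quoted in this thread, and `premium` is an invented extra column.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """One output column of a toy joined plan (hypothetical, not Catalyst's)."""
    qualifier: str  # the DataFrame/relation the column came from
    name: str


def match_by_name(name: str, schema: List[Field]) -> List[Field]:
    # Name-only matching: compare the bare name with each field name.
    return [f for f in schema if f.name == name]


def match_qualified(qualifier: str, name: str, schema: List[Field]) -> List[Field]:
    # Qualified matching: the qualifier must agree as well.
    return [f for f in schema if f.qualifier == qualifier and f.name == name]


# Output of joining df with only_the_best on pol_no: both sides keep a pol_no.
joined = [
    Field("df", "pol_no"),
    Field("df", "premium"),  # hypothetical extra column
    Field("only_the_best", "pol_no"),
]

print(len(match_by_name("pol_no", joined)))                     # 2: ambiguous
print(len(match_qualified("only_the_best", "pol_no", joined)))  # 1: unique

# In a self-join both sides share one qualifier, so even this is ambiguous.
self_joined = [Field("df", "pol_no"), Field("df", "pol_no")]
print(len(match_qualified("df", "pol_no", self_joined)))        # 2
```

In this model, name-only matching cannot tell the two `pol_no` columns apart. A qualifier helps, but it still fails when both sides carry the same qualifier.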



On Sat, May 30, 2015 at 9:37 AM, Yijie Shen <[email protected]>
wrote:

> I think it's enough to just match the Column's expr as an UnresolvedAttribute
> and use the UnresolvedAttribute's name to match the schema's field name.
>
> There seems to be no need to treat expr as something more general. :)
>
> On May 30, 2015 at 11:14:05 PM, Girardot Olivier (
> [email protected]) wrote:
>
> JIRA done: https://issues.apache.org/jira/browse/SPARK-7969
> I've already started working on it, but it's less trivial than it seems
> because I don't know the inner workings of the catalog well, nor how to
> get the qualified name of a column to match it against the
> schema/catalog.
>
> Regards,
>
> Olivier.
>
>  On Sat, May 30, 2015 at 09:54, Reynold Xin <[email protected]> wrote:
>
>> Yeah, it would be great to support a Column. Can you create a JIRA, and
>> possibly a pull request?
>>
>>
>> On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot <
>> [email protected]> wrote:
>>
>>> Actually, the Scala API is also based only on the column name.
>>>
>>>  On Fri, May 29, 2015 at 11:23, Olivier Girardot <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>> Testing 1.4 a bit more, it seems that the .drop() method in PySpark
>>>> doesn't accept a Column as input type:
>>>>
>>>>
>>>>     .join(only_the_best, only_the_best.pol_no == df.pol_no, "inner").drop(only_the_best.pol_no)\
>>>>   File "/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 1225, in drop
>>>>     jdf = self._jdf.drop(colName)
>>>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 523, in __call__
>>>>     (new_args, temp_args) = self._get_args(args)
>>>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 510, in _get_args
>>>>     temp_arg = converter.convert(arg, self.gateway_client)
>>>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_collections.py", line 490, in convert
>>>>     for key in object.keys():
>>>> TypeError: 'Column' object is not callable
>>>>
>>>> This doesn't seem very consistent with the rest of the API, and it is
>>>> especially annoying when executing joins, because drop("my_key") is not a
>>>> qualified reference to the column.
>>>>
>>>> What do you think about changing that? Or what is the best practice as
>>>> a workaround?
>>>>
>>>> Regards,
>>>>
>>>> Olivier.
>>>>
>>>
>>
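The final error in the traceback says nothing about drop() expecting a string. Our reading of the traceback, an interpretation that is not confirmed in this thread, is that py4j chooses an argument converter by duck typing: an object with `keys` and `__getitem__` is treated as a map. PySpark's Column, as we understand it, answers any unknown attribute (including `keys`) with a new Column, so `object.keys()` ends up calling a Column. The simplified model below uses a hypothetical `FakeColumn` class and hypothetical `looks_like_map`/`convert_arg` functions. It is not py4j's or PySpark's actual code.

```python
class FakeColumn:
    """Toy stand-in for a PySpark Column (hypothetical, not the real class)."""

    def __init__(self, name):
        self._name = name

    def __getattr__(self, item):
        # Called only for attributes that normal lookup cannot find. Assumed to
        # mirror PySpark's Column: any unknown name is treated as a nested-field
        # access and returns a new column. Private names are refused to avoid recursion.
        if item.startswith("_"):
            raise AttributeError(item)
        return FakeColumn(self._name + "." + item)

    def __getitem__(self, key):
        return FakeColumn("%s[%r]" % (self._name, key))


def looks_like_map(obj):
    # Duck-typing test in the spirit of a map converter: has keys and __getitem__.
    return hasattr(obj, "keys") and hasattr(obj, "__getitem__")


def convert_arg(obj):
    """Hypothetical argument converter: maps become dicts, everything else passes through."""
    if looks_like_map(obj):
        # For a FakeColumn, obj.keys is itself a FakeColumn, so calling it raises TypeError.
        return {key: obj[key] for key in obj.keys()}
    return obj


print(convert_arg("pol_no"))  # a plain string passes through unchanged
try:
    convert_arg(FakeColumn("pol_no"))
except TypeError as exc:
    print(exc)  # 'FakeColumn' object is not callable
```

A plain string has no `keys` attribute, so it is passed through untouched. The Column-like object is mistaken for a map, which is why the failure surfaces deep inside the converter rather than as a clear type error in drop().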

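One possible shape for a drop() that accepts either a column name or a column reference is to dispatch on the argument type and reject anything else with a clear message. This is a toy sketch, not the PySpark implementation: `MiniFrame` and `Col` are hypothetical stand-ins, and a column reference is modeled simply as a (source, name) pair. `premium` is an invented extra column.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Col:
    """Hypothetical qualified column reference: which DataFrame, which name."""
    source: str
    name: str


class MiniFrame:
    """Toy DataFrame that only tracks its output columns as (source, name) pairs."""

    def __init__(self, columns: List[Tuple[str, str]]):
        self.columns = list(columns)

    def drop(self, col: Union[str, Col]) -> "MiniFrame":
        if isinstance(col, str):
            # By name: in this model, every column with that name is removed.
            kept = [c for c in self.columns if c[1] != col]
        elif isinstance(col, Col):
            # By reference: only the column from that source is removed.
            kept = [c for c in self.columns if c != (col.source, col.name)]
        else:
            # Fail early with a clear message instead of a confusing downstream error.
            raise TypeError("col should be a str or Col, got %s" % type(col).__name__)
        return MiniFrame(kept)


joined = MiniFrame([("df", "pol_no"), ("df", "premium"), ("only_the_best", "pol_no")])
print(joined.drop(Col("only_the_best", "pol_no")).columns)
# [('df', 'pol_no'), ('df', 'premium')]
print(joined.drop("pol_no").columns)
# [('df', 'premium')]
```

In this model, dropping by name removes both `pol_no` columns, while dropping by reference removes only the one from `only_the_best`, which is the behavior the original question asks for.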