Jira done: https://issues.apache.org/jira/browse/SPARK-7969

I've already started working on it, but it's less trivial than it seems because I don't know exactly how the catalog works internally, or how to get the qualified name of a column to match it against the schema/catalog.
Regards,

Olivier.

On Sat, May 30, 2015 at 09:54, Reynold Xin <r...@databricks.com> wrote:

> Yea, it would be great to support a Column. Can you create a JIRA, and
> possibly a pull request?
>
> On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Actually, the Scala API too is only based on column name.
>>
>> On Fri, May 29, 2015 at 11:23, Olivier Girardot <
>> o.girar...@lateral-thoughts.com> wrote:
>>
>>> Hi,
>>> Testing 1.4 a bit more, it seems that the .drop() method in PySpark
>>> doesn't accept a Column as input:
>>>
>>>     .join(only_the_best, only_the_best.pol_no == df.pol_no, "inner").drop(only_the_best.pol_no)
>>>
>>>   File "/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 1225, in drop
>>>     jdf = self._jdf.drop(colName)
>>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 523, in __call__
>>>     (new_args, temp_args) = self._get_args(args)
>>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_gateway.py", line 510, in _get_args
>>>     temp_arg = converter.convert(arg, self.gateway_client)
>>>   File "/usr/local/lib/python2.7/site-packages/py4j/java_collections.py", line 490, in convert
>>>     for key in object.keys():
>>> TypeError: 'Column' object is not callable
>>>
>>> It doesn't seem very consistent with the rest of the API, and it is
>>> especially annoying when executing joins, because drop("my_key") is not a
>>> qualified reference to the column.
>>>
>>> What do you think about changing that? Or what is the best practice as
>>> a workaround?
>>>
>>> Regards,
>>>
>>> Olivier.