Done: https://github.com/apache/spark/pull/5683 and https://issues.apache.org/jira/browse/SPARK-7118. Thanks!
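
For readers finding this thread later: here is a minimal sketch of what a varargs-aware PySpark wrapper for coalesce can look like, following the Array[Column] approach discussed below. This is an illustration, not the code merged in the PR above; it assumes the Scala side (org.apache.spark.sql.functions.coalesce) is @varargs-annotated so that a Column[] overload exists on the JVM, and that callers pass Column objects rather than column-name strings.

from pyspark import SparkContext
from pyspark.sql import Column


def coalesce(*cols):
    """Return the first column that is not null, like SQL's COALESCE."""
    sc = SparkContext._active_spark_context
    # py4j cannot guess that a Python tuple should become varargs, so we
    # build an explicit Array[Column] with the gateway's new_array helper.
    jcols = sc._gateway.new_array(sc._jvm.org.apache.spark.sql.Column, len(cols))
    for i, c in enumerate(cols):
        jcols[i] = c._jc  # unwrap the py4j JavaObject behind each Column
    return Column(sc._jvm.org.apache.spark.sql.functions.coalesce(jcols))

Another route, and the one the countDistinct wrapper linked below appears to take, is to convert a Python list of columns into a Scala Seq on the JVM side instead of building a Java array.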
On Fri, Apr 24, 2015 at 07:34, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:

> I'll try, thanks.
>
> On Fri, Apr 24, 2015 at 00:09, Reynold Xin <r...@databricks.com> wrote:
>
>> You can do it similar to the way countDistinct is done, can't you?
>>
>> https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L78
>>
>> On Thu, Apr 23, 2015 at 1:59 PM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>
>>> I found another way: setting SPARK_HOME on a released version and
>>> launching an ipython shell to load the contexts.
>>> I may need your insight, however. I found out why it wasn't done at the
>>> same time: this method (like some others) takes varargs in Scala, and the
>>> way functions are currently called, only one parameter is supported.
>>>
>>> So at first I tried to generalise the helper function "_" in the
>>> functions.py file to multiple arguments, but py4j's handling of varargs
>>> forces me to create an Array[Column] when the target method expects
>>> varargs.
>>>
>>> From Python's perspective, though, we have no idea whether the target
>>> method expects varargs or just multiple arguments (to un-tuple). I can
>>> special-case "coalesce", or any method that takes a list of columns as
>>> arguments, on the assumption that it is varargs-based (and therefore needs
>>> an Array[Column] instead of just a list of arguments).
>>>
>>> But this seems very specific and very prone to future mistakes.
>>> Is there any way in Py4j to know a method's signature before calling it?
>>>
>>> On Thu, Apr 23, 2015 at 22:17, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>>
>>>> What is the way of testing/building the pyspark part of Spark?
>>>>
>>>> On Thu, Apr 23, 2015 at 22:06, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>>>
>>>>> Yep :) I'll open the JIRA when I've got the time.
>>>>> Thanks
>>>>>
>>>>> On Thu, Apr 23, 2015 at 19:31, Reynold Xin <r...@databricks.com> wrote:
>>>>>
>>>>>> Ah damn. We need to add it to the Python list. Would you like to give
>>>>>> it a shot?
>>>>>>
>>>>>> On Thu, Apr 23, 2015 at 4:31 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>>>>>
>>>>>>> Yep, no problem, but I can't seem to find the coalesce function in
>>>>>>> pyspark.sql.{*, functions, types, or whatever :) }
>>>>>>>
>>>>>>> Olivier.
>>>>>>>
>>>>>>> On Mon, Apr 20, 2015 at 11:48, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>>>>>>
>>>>>>> > A UDF might be a good idea, no?
>>>>>>> >
>>>>>>> > On Mon, Apr 20, 2015 at 11:17, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>>>>>> >
>>>>>>> >> Hi everyone,
>>>>>>> >> Let's assume I'm stuck on 1.3.0: how can I benefit from the *fillna*
>>>>>>> >> API in PySpark? Is there any efficient alternative to mapping the
>>>>>>> >> records myself?
>>>>>>> >>
>>>>>>> >> Regards,
>>>>>>> >>
>>>>>>> >> Olivier.
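
And for the original fillna question that started the thread: here is a minimal sketch of the UDF workaround suggested above, for a PySpark 1.3.0 where DataFrame.fillna is not available. The column names, sample data, and default value are illustrative, not from the thread.

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

sc = SparkContext(appName="fillna-workaround")
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([("a", 1.0), ("b", None)], ["name", "price"])

# Replace nulls with 0.0 through a Python UDF, since fillna is missing.
fill_zero = udf(lambda v: 0.0 if v is None else v, DoubleType())
filled = df.select(df["name"], fill_zero(df["price"]).alias("price"))

Note that a Python UDF round-trips every value through the Python workers, so this is noticeably slower than a native fillna that runs entirely on the JVM.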