I've looked at several approaches, but I can't find an efficient way to compute many window functions, whether through optimized computation or efficient parallelism. Am I the only one interested in this?
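To make the question concrete, here is a minimal sketch of one variant that could be compared against the chained `withColumn` calls quoted below: every (aggregate, column) pair over the same window is built as a list of columns and passed to a single `select`. This is a sketch under assumptions, not a confirmed answer. The object name `WindowAggSketch`, the helper `windowAggs`, the column names `col1` and `col2`, and the tiny dataset are hypothetical, and `date` is assumed to be a numeric (Long) column. Whether Spark actually evaluates everything in one pass depends on the physical plan, which `explain()` prints.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, col, max, min}

object WindowAggSketch {
  // Same window as in the quoted message: per id, ordered by date,
  // covering the preceding 8035200000 units of `date` up to the current row.
  val tw = Window
    .partitionBy("id")
    .orderBy("date")
    .rangeBetween(-8035200000L, 0L)

  // Hypothetical helper: one window expression per (column, aggregate) pair.
  def windowAggs(cols: Seq[String]): Seq[Column] = {
    val aggs: Seq[(String, Column => Column)] = Seq(
      ("max", c => max(c)),
      ("min", c => min(c)),
      ("avg", c => avg(c))
    )
    for {
      c <- cols
      (name, f) <- aggs
    } yield f(col(c)).over(tw).as(s"${c}_$name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("window-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up data for illustration only; `date` is assumed to be a Long.
    val x = Seq(
      (1, 0L, 1.0, 10.0),
      (1, 1000L, 3.0, 30.0),
      (2, 0L, 5.0, 50.0)
    ).toDF("id", "date", "col1", "col2")

    // A single projection carrying all the window expressions.
    val result = x.select(col("*") +: windowAggs(Seq("col1", "col2")): _*)

    result.explain() // check how many Window operators the plan contains
    result.show()
    spark.stop()
  }
}
```

Comparing the `explain()` output of this version with the chained `withColumn` version on the real data would show whether the two plans actually differ.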
Regards,
Julien

On Fri, 15 Dec 2017 at 21:34, Julien CHAMP <jch...@tellmeplus.com> wrote:

> Maybe I should consider something like Impala?
>
> On Fri, 15 Dec 2017 at 11:32, Julien CHAMP <jch...@tellmeplus.com> wrote:
>
>> Hi Spark community members!
>>
>> I want to compute several (from 1 to 10) aggregate functions using window
>> functions on something like 100 columns.
>>
>> Instead of making several passes over the data to compute each aggregate
>> function, is there a way to do this efficiently?
>>
>> Currently it seems that doing
>>
>>     import org.apache.spark.sql.expressions.Window
>>     import org.apache.spark.sql.functions.{avg, max, min}
>>
>>     val tw =
>>       Window
>>         .orderBy("date")
>>         .partitionBy("id")
>>         .rangeBetween(-8035200000L, 0)
>>
>> and then
>>
>>     x
>>       .withColumn("agg1", max("col").over(tw))
>>       .withColumn("agg2", min("col").over(tw))
>>       .withColumn("aggX", avg("col").over(tw))
>>
>> is not really efficient :/
>> It seems that it iterates over the whole column for each aggregation. Am I
>> right?
>>
>> Is there a way to compute all the required operations on a column in a
>> single pass?
>> Even better, to compute all the required operations on ALL columns in
>> a single pass?
>>
>> Thanks for your Future[Answers]
>>
>> Julien
>>
>> --
>> Julien CHAMP, Data Scientist
>> Web: www.tellmeplus.com <http://tellmeplus.com/>
>> Email: jch...@tellmeplus.com
>> Phone: 06 89 35 01 89
>> LinkedIn: <https://www.linkedin.com/in/julienchamp>
>>
>> TellMePlus S.A - Predictive Objects
>> Paris: 7 rue des Pommerots, 78400 Chatou
>> Montpellier: 51 impasse des églantiers, 34980 St Clément de Rivière