This is super awesome!
________________________________
From: Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
Sent: Saturday, February 9, 2019 8:33 AM
To: Hyukjin Kwon
Cc: dev; Felix Cheung; Bryan Cutler; Liang-Chi Hsieh; Shivaram Venkataraman
Subject: Re: Vectorized R gapply[Collect]() implementation

Those speedups look awesome! Great work Hyukjin!

Thanks
Shivaram

On Sat, Feb 9, 2019 at 7:41 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>
> Guys, as continuation of Arrow optimization for R DataFrame to Spark
> DataFrame,
>
> I am trying to make a vectorized gapply[Collect] implementation as an
> experiment like vectorized Pandas UDFs
>
> It brought 820%+ performance improvement. See
> https://github.com/apache/spark/pull/23746
>
> Please come and take a look if you're interested in R APIs :D. I have already
> cc'ed some people I know but please come, review and discuss for both Spark
> side and Arrow side.
>
> This Arrow optimization job is being done under
> https://issues.apache.org/jira/browse/SPARK-26759 . Please feel free to take
> one if anyone of you is interested in it.
>
> Thanks.