Hi all, I am trying to introduce an R Arrow optimization to Spark by reusing the existing PySpark Arrow optimization.
It makes R DataFrame to Spark DataFrame conversion roughly 900% to 1200% faster. It seems to work fine so far; however, I would appreciate it if you could find some time to take a look (https://github.com/apache/spark/pull/22954) so that we can go ahead as soon as the Arrow R API is released. More importantly, I would like more people who are familiar with the Arrow R API, and who are also interested in Spark, to get involved. I have already cc'ed some people I know, but please come, review, and discuss both the Spark side and the Arrow side. Thanks.