https://issues.apache.org/jira/browse/SPARK-3489

Folks,
I am Mohit Jaggi and I work for Ayasdi Inc. After experimenting with Spark
for a while and discovering its awesomeness(!), I made an attempt to
provide a wrapper API that looks like an R and/or pandas dataframe.

https://github.com/AyasdiOpenSource/df

"df" uses a collection of RDDs, each element in the collection being a
column in a dataframe. To make rows from the columns I used zip() in a loop,
but that is not very efficient. I created JIRA SPARK-3489 requesting a zip()
variant that zips a sequence of RDDs. The code turned out to be easy to
write, so I wrote it and it seems to work. I attached the diff to
the JIRA. I believe that this API would be useful in general and is not
specific to "df". Please take a look at the request and the proposed
solution and let me know what you think.
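
To illustrate the difference between the two approaches, here is a rough
sketch in plain Python rather than Spark. It is not the diff attached to the
JIRA. Each "column" is modelled as a list of partitions (a hypothetical
stand-in for an RDD), and the names zip_pairwise and zip_all are made up for
this example. Like RDD zip(), it assumes every column has the same number of
partitions and the same number of elements in each partition.

```python
from functools import reduce

# Hypothetical stand-in for an RDD: a list of partitions,
# where each partition is a list of values.

def zip_pairwise(columns):
    """Build rows by zipping in one column at a time (zip() in a loop)."""
    def step(rows, col):
        # Every step produces a new intermediate dataset of widened rows.
        return [
            [row + (value,) for row, value in zip(row_part, col_part)]
            for row_part, col_part in zip(rows, col)
        ]
    # Start with one-element rows built from the first column.
    first = [[(value,) for value in part] for part in columns[0]]
    return reduce(step, columns[1:], first)

def zip_all(columns):
    """Build rows in a single pass by zipping all columns together."""
    # Python's zip() silently truncates on length mismatch; this sketch
    # assumes all columns are aligned, so no truncation occurs.
    return [
        [tuple(values) for values in zip(*parts)]
        for parts in zip(*columns)
    ]

# Three columns, each split into two partitions.
ids = [[1, 2], [3]]
names = [["a", "b"], ["c"]]
scores = [[1.0, 2.0], [3.0]]

rows_loop = zip_pairwise([ids, names, scores])
rows_once = zip_all([ids, names, scores])
```

Both functions produce the same rows, but the pairwise version creates one
intermediate dataset per extra column, while the single-pass version walks
all columns of a partition together.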

Cheers,
Mohit
