https://issues.apache.org/jira/browse/SPARK-3489
Folks,

I am Mohit Jaggi and I work for Ayasdi Inc. After experimenting with Spark for a while and discovering its awesomeness(!), I made an attempt to provide a wrapper API that looks like an R and/or pandas dataframe:

https://github.com/AyasdiOpenSource/df

"df" uses a collection of RDDs, with each element in the collection being one column of the dataframe. To make rows from the columns I call zip() in a loop, adding one column at a time, but that is not very efficient.

I created SPARK-3489 requesting a zip() variant that zips a sequence of RDDs in one step. The code turned out to be easy to write, so I wrote it, and it seems to work. I attached the diff to the JIRA.

I believe this API would be useful in general and is not specific to "df". Please take a look at the request and the proposed solution and let me know what you think.

Cheers,
Mohit
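
To illustrate the difference between zipping in a loop and zipping a whole sequence at once, here is a minimal sketch in plain Python. It uses ordinary lists as stand-ins for columns and does not touch Spark or RDDs; it is not the diff attached to the JIRA. The function names rows_by_pairwise_zip and rows_by_sequence_zip are hypothetical and exist only for this illustration.

```python
def rows_by_pairwise_zip(columns):
    """Build rows by zipping in one column at a time (the loop approach)."""
    if not columns:
        return []
    # Start with one-element rows taken from the first column.
    rows = [(value,) for value in columns[0]]
    for column in columns[1:]:
        # Each pass walks every partial row again and allocates a new tuple.
        rows = [row + (value,) for row, value in zip(rows, column)]
    return rows


def rows_by_sequence_zip(columns):
    """Build rows in a single pass by zipping the whole sequence of columns."""
    return list(zip(*columns))


# Three equal-length columns standing in for the columns of a dataframe.
columns = [[1, 2, 3], ["a", "b", "c"], [1.5, 2.5, 3.5]]

# Both approaches produce the same rows; only the number of passes differs.
assert rows_by_pairwise_zip(columns) == rows_by_sequence_zip(columns)
```

With k columns, the loop version makes k - 1 passes and rebuilds every row on each pass, while the sequence version assembles each row once. The sketch only shows the shape of the two approaches; how the cost plays out on distributed RDDs is a separate question.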