Okay, I may have found an alternative/workaround to using .collect() for what I am trying to achieve...
Initially, in the Spark application I am working on, I would call .collect() on two separate RDDs into a couple of ArrayLists (which is why I was asking what the size limit on the driver is). I need to map the first RDD to the second RDD according to a computation/function, producing key-value pairs. It turns out I don't need to call .collect() at all if I instead use .zipPartitions(), which I can pass the function to directly, so nothing gets pulled back to the driver. I am currently testing it out... thanks all for your responses.
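For anyone following along, the idea behind zipPartitions is that it pairs up the corresponding partitions of two RDDs (which must have the same number of partitions) and applies your function to the two partition iterators, so the data never has to be collected to the driver. Below is a minimal pure-Python sketch of those semantics, not actual Spark code: plain lists stand in for partitioned RDDs, and `zip_partitions` and `pair_up` are hypothetical names invented for illustration.

```python
from typing import Callable, Iterator, List, Tuple

def zip_partitions(rdd1: List[list], rdd2: List[list],
                   f: Callable[[Iterator, Iterator], Iterator]) -> List[list]:
    """Mimic RDD.zipPartitions: each 'RDD' is a list of partitions.
    f consumes two iterators (one per RDD, same partition index) and
    yields results, partition by partition -- no global collect."""
    assert len(rdd1) == len(rdd2), "zipPartitions requires equal partition counts"
    return [list(f(iter(p1), iter(p2))) for p1, p2 in zip(rdd1, rdd2)]

def pair_up(it1: Iterator[int], it2: Iterator[int]) -> Iterator[Tuple[int, int]]:
    # Stand-in for the poster's computation: emit key-value pairs
    # derived from corresponding elements of the two partitions.
    for a, b in zip(it1, it2):
        yield (a, a * b)

rdd1 = [[1, 2], [3, 4]]        # two partitions
rdd2 = [[10, 20], [30, 40]]    # same partitioning
result = zip_partitions(rdd1, rdd2, pair_up)
# result: [[(1, 10), (2, 40)], [(3, 90), (4, 160)]]
```

In real Spark the equivalent call would be `rdd1.zipPartitions(rdd2)(yourFunction)` (Scala) or `JavaRDD.zipPartitions` with a `FlatMapFunction2`, and the per-partition iterators should be consumed lazily rather than materialized.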