Hello, I'm facing a problem with custom RDD transformations.
I would like to transform an RDD[(K, V)] into a Map[K, RDD[V]], that is, a map holding one RDD per key. This would be useful, for example, to run MLlib clustering on the V values grouped by K.

I know I could do it by calling filter() on my RDD once per key, but I'm afraid this would not be efficient (the entire RDD would be read each time, right?). I could also use mapPartitions on my RDD before filtering, but the code ends up being huge...

So I tried to create a CustomRDD to implement a splitByKey(rdd: RDD[(K, V)]): Map[K, RDD[V]] method that would iterate over the RDD only once, but I have not managed to get it working.

Could you please tell me first whether this is really feasible and, if so, give me some pointers?

Thank you,
Regards,
Sebastien
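
P.S. For reference, here is a minimal sketch of the filter()-based version I mentioned, so it is clear what I am trying to improve on. The splitByKey helper and the SplitByKeySketch object are names of my own, not part of the Spark API, and the sketch assumes the number of distinct keys is small enough to collect on the driver. Caching the input should avoid recomputing the source for every key, but each filter still scans the whole cached data set, which is the cost I am worried about.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object SplitByKeySketch {

  // Hypothetical helper (not a Spark API): builds one filtered RDD per distinct key.
  def splitByKey[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]): Map[K, RDD[V]] = {
    // Keep the input in memory so the per-key filters do not recompute the source.
    rdd.cache()

    // Assumption: few distinct keys, since they are all collected on the driver.
    val keys = rdd.keys.distinct().collect()

    // Each entry is a lazy RDD; every filter still scans the full cached input.
    keys.map { k =>
      k -> rdd.filter { case (key, _) => key == k }.values
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for a quick test run.
    val conf = new SparkConf().setAppName("split-by-key-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
      val byKey = splitByKey(pairs)
      // Print the values found under each key.
      byKey.foreach { case (k, vs) =>
        println(s"$k: ${vs.collect().sorted.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

With N distinct keys this defines N RDDs over the same cached input, so the total work grows with N times the data size.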