AFAIK the physical distribution is not exposed in the public API. The closest I can think of is `rdd.coalesce(numPhysicalNodes).mapPartitions(...)`, but this assumes that exactly one partition ends up on each node.
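
If an approximation is good enough, one workaround is to tag the first element of each partition with the host name of the machine that processed it, then keep a single element per host. The sketch below is untested and rests on several assumptions: `onePerHost` is a hypothetical helper name (not part of Spark), each physical node reports a distinct host name via `InetAddress`, and the result only reflects where tasks happened to run for that job, since Spark does not promise a fixed placement.

```scala
import java.net.InetAddress
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical helper (not part of Spark): keep one element for every host
// that processed at least one non-empty partition of `rdd`.
def onePerHost[T: ClassTag](rdd: RDD[T]): RDD[T] =
  rdd
    .mapPartitions { iter =>
      // Evaluated inside the task, so this is the host name of the
      // machine running the task (assumed unique per physical node).
      val host = InetAddress.getLocalHost.getHostName
      // take(1) yields an empty iterator for an empty partition,
      // whereas calling next() on it directly would throw.
      iter.take(1).map(elem => (host, elem))
    }
    // Keep an arbitrary survivor per host; this step shuffles data.
    .reduceByKey((kept, _) => kept)
    .values
```

Note that `reduceByKey` introduces a shuffle, so the surviving elements are not guaranteed to stay on the nodes they came from, and which partition's first element survives for a given host is arbitrary.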
On Fri, Sep 18, 2015 at 4:09 PM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

> Thank you! How can I guarantee that I have only one element per executor
> (per worker, or per physical node)?
>
> *From:* Feynman Liang [mailto:fli...@databricks.com]
> *Sent:* Friday, September 18, 2015 4:06 PM
> *To:* Ulanov, Alexander
> *Cc:* dev@spark.apache.org
> *Subject:* Re: One element per node
>
> rdd.mapPartitions(x => new Iterator(x.head))
>
> On Fri, Sep 18, 2015 at 3:57 PM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:
>
> Dear Spark developers,
>
> Is it possible (and how to do it if possible) to pick one element per
> physical node from an RDD? Let’s say the first element of any partition on
> that node. The result would be an RDD[element], the count of elements is
> equal to the N of nodes that has partitions of the initial RDD.
>
> Best regards, Alexander