AFAIK the physical distribution of partitions across nodes is not exposed
in the public API. The closest I can think of is
`rdd.coalesce(numPhysicalNodes).mapPartitions(...)`, but this assumes that
exactly one partition ends up on each node, which `coalesce` does not
guarantee.
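
One possible workaround, as a rough sketch rather than a recommended method: tag the first element of each partition with the hostname of the machine running the task, then keep one element per hostname. The helper name `onePerHost` is made up for illustration. The sketch assumes the hostname reported by `InetAddress.getLocalHost` uniquely identifies a physical node, and it incurs a shuffle in `reduceByKey`.

```scala
import java.net.InetAddress

import org.apache.spark.rdd.RDD

import scala.reflect.ClassTag

// Hypothetical helper (not part of Spark): keeps one element per host.
// Assumption: the local hostname uniquely identifies a physical node.
def onePerHost[T: ClassTag](rdd: RDD[T]): RDD[T] =
  rdd
    .mapPartitions { it =>
      // Runs on the executor, so the hostname is that of the machine
      // processing this partition. Empty partitions contribute nothing.
      if (it.hasNext) Iterator((InetAddress.getLocalHost.getHostName, it.next()))
      else Iterator.empty
    }
    // Keep an arbitrary one of the per-partition first elements for each host.
    .reduceByKey((a, _) => a)
    .values
```

Note that the result only covers hosts that actually ran a task for some non-empty partition, and which partition's first element survives on a given host is arbitrary.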

On Fri, Sep 18, 2015 at 4:09 PM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

> Thank you! How can I guarantee that I have only one element per executor
> (per worker, or per physical node)?
>
>
>
> *From:* Feynman Liang [mailto:fli...@databricks.com]
> *Sent:* Friday, September 18, 2015 4:06 PM
> *To:* Ulanov, Alexander
> *Cc:* dev@spark.apache.org
> *Subject:* Re: One element per node
>
>
>
> rdd.mapPartitions(x => x.take(1))
>
>
>
> On Fri, Sep 18, 2015 at 3:57 PM, Ulanov, Alexander <
> alexander.ula...@hpe.com> wrote:
>
> Dear Spark developers,
>
>
>
> Is it possible (and if so, how) to pick one element per
> physical node from an RDD? Let’s say the first element of any partition on
> that node. The result would be an RDD[element] whose element count
> equals the number of nodes that hold partitions of the initial RDD.
>
>
>
> Best regards, Alexander
>
>
>
