Hi Shahab - if your data structures are small enough, a broadcast Map is
going to provide faster lookups. A lookup within an RDD is an O(m) operation,
where m is the size of the partition. For RDDs with multiple partitions,
executors can scan the partitions in parallel, so you get some improvement
for larger RDDs.
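
Here is a minimal sketch of the two approaches side by side, in case it helps. The object name, the sample data, and the local[2] master are made up for illustration and are not from your job; it also assumes the whole data set fits comfortably in memory on the driver and on each executor, which should hold for 3-30 MB.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Only needed on older Spark releases (before 1.3, as far as I recall) to get
// the pair-RDD implicits; it is a harmless import on newer ones.
import org.apache.spark.SparkContext._

object BroadcastLookupSketch {

  // Returns (result of RDD lookup, result of driver-side Map lookup,
  // results of executor-side lookups through the broadcast).
  def run(): (Seq[String], Option[String], Seq[String]) = {
    // Hypothetical local setup, for illustration only.
    val conf = new SparkConf()
      .setAppName("broadcast-lookup-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for the small cached RDD[(Int, String)].
      val pairs = sc.parallelize(Seq(1 -> "one", 2 -> "two", 3 -> "three")).cache()

      // RDD lookup: every call runs a Spark job that scans partition data.
      val viaRdd: Seq[String] = pairs.lookup(2)

      // Collect once to the driver and build an immutable Map.
      val table: Map[Int, String] = pairs.collectAsMap().toMap

      // Driver-side lookup: a plain hash lookup, no Spark job involved.
      val viaMap: Option[String] = table.get(2)

      // Ship one read-only copy of the Map to each executor.
      val tableBc = sc.broadcast(table)

      // Executor-side lookups inside a task, via the broadcast value.
      val resolved: Seq[String] = sc
        .parallelize(Seq(3, 1, 4))
        .map(k => tableBc.value.getOrElse(k, "?"))
        .collect()
        .toSeq

      (viaRdd, viaMap, resolved)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (viaRdd, viaMap, resolved) = run()
    println(s"RDD lookup: $viaRdd, Map lookup: $viaMap, via broadcast: $resolved")
  }
}
```

If you only ever look keys up on the driver, the plain Map is enough and the broadcast step can be dropped; the broadcast matters when the lookups happen inside tasks on the executors.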
On Thu, Feb 19, 2015 at 7:31 AM shahab <shahab.mok...@gmail.com> wrote:

> Hi,
> I am doing lookups on cached RDDs [(Int,String)], and I noticed that the
> lookup is relatively slow: 30-100 ms. I even tried this on one machine
> with a single partition, but it made no difference!
> The RDDs are not large at all, 3-30 MB.
> Is this expected behaviour? Should I use another data structure, like a
> HashMap, to keep the data and look it up there, and use Broadcast to send
> a copy to all machines?
> best,
> /Shahab
