Re: IndexedRDD

Andrew Ash Tue, 13 Jan 2015 08:09:33 -0800

Hi Jem,

Join time scaling linearly with the size of the big table doesn't seem that
surprising to me. What were you expecting?

I assume you're doing normalRDD.join(indexedRDD).  If you were to replace
the indexedRDD with a normal RDD, what times do you get?

On Tue, Jan 13, 2015 at 5:35 AM, Jem Tucker <jem.tuc...@gmail.com> wrote:

> Hi,
>
> I have been playing around with IndexedRDD (
> https://issues.apache.org/jira/browse/SPARK-2365,
> https://github.com/amplab/spark-indexedrdd) and have been very impressed
> with its performance. Some performance testing has revealed
> worse-than-expected scaling of join performance*, and I was wondering
> whether anyone else has experience using it, and what they have found.
>
> Thanks,
>
> Jem
>
> *The table below shows some of my results when joining a small RDD to a
> large IndexedRDD. Each table consisted of a Long key and a 15-character
> String value. The results show an almost linear increase in time with the
> number of rows in the bigger table.
>
> Small table rows | Big table rows | Time (s)
> -----------------+----------------+---------
>           50,000 |     10,000,000 |      0.6
>           50,000 |     50,000,000 |      0.8
>           50,000 |    100,000,000 |      1.5
>           50,000 |    150,000,000 |      2.1
>           50,000 |    200,000,000 |      2.8
>           50,000 |    500,000,000 |      7.2
>           50,000 |  1,000,000,000 |     12.2
>
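To put a number on "almost linear", the timings from the table can be fed into an ordinary least-squares fit of time against big-table rows. This is a reader's sketch, not something from the thread: the choice of a plain linear model and the helper name `linear_fit` are assumptions made for illustration, and only the seven data points come from Jem's message.

```python
# Timings copied from the table above: (big-table rows, join time in seconds).
# The small table was 50,000 rows in every run.
MEASUREMENTS = [
    (10_000_000, 0.6),
    (50_000_000, 0.8),
    (100_000_000, 1.5),
    (150_000_000, 2.1),
    (200_000_000, 2.8),
    (500_000_000, 7.2),
    (1_000_000_000, 12.2),
]


def linear_fit(points):
    """Hypothetical helper: OLS fit of y = slope * x + intercept, plus R^2."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    slope, intercept, r2 = linear_fit(MEASUREMENTS)
    print(f"{slope * 1e8:.2f} s per 100M big-table rows, "
          f"intercept {intercept:.2f} s, R^2 = {r2:.3f}")
```

On these numbers the fit comes out at roughly 1.2 s per additional 100 million big-table rows, an intercept of about 0.4 s, and an R² above 0.99, which is consistent with the "almost linear" description. A strong linear fit only describes the trend; it says nothing about where the time goes inside the join.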
