indexedrdd and radix tree: how to search indexedRDD using all prefixes?

2015-11-24 Thread Mina
Hello, I have a question about the radix tree (PART) implementation used by IndexedRDD in Spark. I explored the source code and found that the radix tree used in IndexedRDD returns only exact matches. That seems to restrict its use. For example, I want to find children nodes using prefix
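To make the distinction in the question concrete, here is a minimal sketch of exact-match lookup versus prefix search. It uses a plain character trie in Python, not the PART adaptive radix tree and not the IndexedRDD API. The names `Trie`, `put`, `get` and `with_prefix` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterator, Optional, Tuple


class TrieNode:
    """One node of a plain (uncompressed) character trie."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.value: Optional[int] = None
        self.has_value = False  # True only if a key ends exactly here


class Trie:
    """Hypothetical illustration; not the PART structure used by IndexedRDD."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def put(self, key: str, value: int) -> None:
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value
        node.has_value = True

    def _find(self, prefix: str) -> Optional[TrieNode]:
        # Walk down one character at a time; None if the path does not exist.
        node: Optional[TrieNode] = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def get(self, key: str) -> Optional[int]:
        # Exact match: the node must exist AND a key must end at it.
        node = self._find(key)
        return node.value if node is not None and node.has_value else None

    def with_prefix(self, prefix: str) -> Iterator[Tuple[str, int]]:
        # Prefix search: locate the prefix node, then visit its whole subtree.
        start = self._find(prefix)
        if start is None:
            return
        stack = [(prefix, start)]
        while stack:
            key, node = stack.pop()
            if node.has_value:
                yield key, node.value
            for ch, child in node.children.items():
                stack.append((key + ch, child))


trie = Trie()
for i, word in enumerate(["spark", "sparse", "spa", "rdd"]):
    trie.put(word, i)

exact = trie.get("spar")                    # no key ends at "spar"
matches = sorted(trie.with_prefix("spa"))   # every key under "spa"
```

Here `get("spar")` finds nothing because no stored key is exactly "spar", while `with_prefix("spa")` walks the subtree below the prefix node and returns all three keys that start with "spa". Whether a similar traversal can be exposed on top of IndexedRDD is the open question of the thread.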

Re: indexedrdd and radix tree: how to search indexedRDD using all prefixes?

2015-11-24 Thread Mina
This is what a Radix tree returns -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/indexedrdd-and-radix-tree-how-to-search-indexedRDD-using-all-prefixes-tp25459p25460.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark IndexedRDD dependency in Maven

2015-11-09 Thread Ted Yu
I would suggest asking this question on SPARK-2365 since IndexedRDD has not been released (upstream). Cheers On Mon, Nov 9, 2015 at 1:34 PM, swetha <swethakasire...@gmail.com> wrote: > > Hi, > > What is the appropriate dependency to include for Spark Indexed RDD? I get > c

Spark IndexedRDD dependency in Maven

2015-11-09 Thread swetha
Hi, What is the appropriate dependency to include for Spark Indexed RDD? I get a compilation error if I include 0.3 as the version as shown below: <dependency> <groupId>amplab</groupId> <artifactId>spark-indexedrdd</artifactId> <version>0.3</version> </dependency> Thanks, Swetha

INDEXEDRDD in PYSPARK

2015-09-03 Thread shahid ashraf
Hi folks, is there any resource for getting started with https://github.com/amplab/spark-indexedrdd in PySpark? With regards, Shahid Ashraf

Re: Is IndexedRDD available in Spark 1.4.0?

2015-07-23 Thread Ruslan Dautkhanov
. On Tue, Jul 14, 2015 at 5:44 PM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at SPARK-2365 which is in progress. On Tue, Jul 14, 2015 at 5:18 PM, swetha swethakasire...@gmail.com wrote: Hi, Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark Streaming

Re: Is IndexedRDD available in Spark 1.4.0?

2015-07-14 Thread Ted Yu
Please take a look at SPARK-2365 which is in progress. On Tue, Jul 14, 2015 at 5:18 PM, swetha swethakasire...@gmail.com wrote: Hi, Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark Streaming to do lookups/updates/deletes in RDDs using keys by storing them as key

Re: Is IndexedRDD available in Spark 1.4.0?

2015-07-14 Thread Ted Yu
...@gmail.com wrote: Hi, Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark Streaming to do lookups/updates/deletes in RDDs using keys by storing them as key/value pairs. Thanks, Swetha

Is IndexedRDD available in Spark 1.4.0?

2015-07-14 Thread swetha
Hi, Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark Streaming to do lookups/updates/deletes in RDDs using keys by storing them as key/value pairs. Thanks, Swetha

Re: Is IndexedRDD available in Spark 1.4.0?

2015-07-14 Thread Tathagata Das
...@gmail.com wrote: Please take a look at SPARK-2365 which is in progress. On Tue, Jul 14, 2015 at 5:18 PM, swetha swethakasire...@gmail.com wrote: Hi, Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark Streaming to do lookups/updates/deletes in RDDs using keys

Re: IndexedRDD

2015-01-13 Thread Jerry Lam
Hi guys, I'm interested in the IndexedRDD too. How many rows in the big table match the small table in every run? If the number of rows stays constant, then I think Jem wants the runtime to stay about constant (i.e. ~0.6 seconds for all cases). However, I agree with Andrew. The performance

Re: IndexedRDD

2015-01-13 Thread Jem Tucker
time increase at all until the small table was within 1 order of magnitude of the larger. I agree though, the performance is not bad at all! The same join with normal RDDs takes an order of magnitude longer, I found; I can share the results tomorrow. I am unsure exactly how the IndexedRDD are indexed

IndexedRDD

2015-01-13 Thread Jem Tucker
Hi, I have been playing around with the IndexedRDD (https://issues.apache.org/jira/browse/SPARK-2365, https://github.com/amplab/spark-indexedrdd) and have been very impressed with its performance. Some performance testing has revealed worse than expected scaling of the join performance*, and I

Re: IndexedRDD

2015-01-13 Thread Andrew Ash
Hi Jem, Linear time in scaling on the big table doesn't seem that surprising to me. What were you expecting? I assume you're doing normalRDD.join(indexedRDD). If you were to replace the indexedRDD with a normal RDD, what times do you get? On Tue, Jan 13, 2015 at 5:35 AM, Jem Tucker jem.tuc
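One way to read Andrew's remark, sketched in single-machine Python rather than Spark: if every row of the scanned table costs one probe into the index, total work grows linearly with the scanned table regardless of how small the indexed side is. This assumes the big table is the side being scanned, which the thread does not state. It does not show how Spark or IndexedRDD executes a join, and the helper `index_join` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def index_join(
    big: Iterable[Tuple[int, str]], indexed: Dict[int, int]
) -> Tuple[List[Tuple[int, Tuple[str, int]]], int]:
    """Join each row of `big` against a keyed index, counting index probes."""
    probes = 0
    out: List[Tuple[int, Tuple[str, int]]] = []
    for key, left in big:
        probes += 1                 # one index lookup per scanned row
        right = indexed.get(key)    # cheap lookup, but still done for every row
        if right is not None:
            out.append((key, (left, right)))
    return out, probes


# Hypothetical data: a 10-row indexed table and scanned tables of two sizes.
small_index = {k: k * k for k in range(10)}

rows_1k, probes_1k = index_join(((k, "x") for k in range(1_000)), small_index)
rows_10k, probes_10k = index_join(((k, "x") for k in range(10_000)), small_index)
```

Both runs return the same 10 matching rows, but the second performs ten times as many probes. Under this assumption the index makes each probe cheap without reducing the number of probes, so a runtime that is constant in the size of the big table would require avoiding the scan altogether.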