So I tried the above (why don't union or ++ have the same behavior, btw?) and it works, but it is slow because the original RDDs are not cached, so the files must be re-read from disk.
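For reference, here is a minimal sketch of the approach, against the Spark 1.0-era SchemaRDD API (the `Record` case class, table names, and sample data are just illustrative):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Hypothetical record type just for this sketch.
case class Record(key: String, value: Int)

val sc = new SparkContext("local", "union-cached-tables")
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD -> SchemaRDD conversion

val srdd1 = sc.parallelize(Seq(Record("a", 1))).toSchemaRDD
val srdd2 = sc.parallelize(Seq(Record("b", 2))).toSchemaRDD

srdd1.registerAsTable("table1")  // renamed registerTempTable in 1.1
srdd2.registerAsTable("table2")

// Cache both tables in the in-memory columnar store.
sqlContext.cacheTable("table1")
sqlContext.cacheTable("table2")

// Recover the cached versions via sqlContext.table() and union them;
// both sides must have the same schema.
val combined = sqlContext.table("table1")
  .unionAll(sqlContext.table("table2"))
combined.registerAsTable("combined")

sqlContext.sql("SELECT key, value FROM combined").collect()
```

This runs, but as noted above the union over the recovered tables is much slower than querying a single cached table directly.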
I also discovered you can recover the in-memory cached versions of the RDDs using sqlContext.table("table1"). Thus you can do sqlContext.table("table1").unionAll(sqlContext.table("table2")), but this is roughly 10x slower than running the query on table1 alone, which is cached using sqlContext.cacheTable(). (At least on Spark 1.0.2; I haven't tried the 1.1.0 snapshot yet.)

On Thu, Aug 21, 2014 at 12:17 AM, Michael Armbrust <mich...@databricks.com> wrote:
> I believe this should work if you run srdd1.unionAll(srdd2). Both RDDs must
> have the same schema.
>
>
> On Wed, Aug 20, 2014 at 11:30 PM, Evan Chan <velvia.git...@gmail.com> wrote:
>>
>> Is it possible to merge two cached Spark SQL tables into a single
>> table so it can be queried with one SQL statement?
>>
>> ie, can you do schemaRdd1.union(schemaRdd2), then register the new
>> schemaRdd and run a query over it?
>>
>> Ideally, both schemaRdd1 and schemaRdd2 would be cached, so the union
>> should run cached too.
>>
>> thanks,
>> Evan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>