So I tried the above (why don't union or ++ have the same behavior, btw?) and it works, but it is slow because the original RDDs are not cached, so the files must be re-read from disk.
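For reference, here is a minimal sketch of the approach, against the Spark 1.0-era SchemaRDD API (the `Record` case class, table names, and sample data are just illustrative):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Hypothetical record type just for this sketch.
case class Record(key: String, value: Int)

val sc = new SparkContext("local", "union-cached-tables")
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD -> SchemaRDD conversion

val srdd1 = sc.parallelize(Seq(Record("a", 1))).toSchemaRDD
val srdd2 = sc.parallelize(Seq(Record("b", 2))).toSchemaRDD

srdd1.registerAsTable("table1")  // renamed registerTempTable in 1.1
srdd2.registerAsTable("table2")

// Cache both tables in the in-memory columnar store.
sqlContext.cacheTable("table1")
sqlContext.cacheTable("table2")

// Recover the cached versions via sqlContext.table() and union them;
// both sides must have the same schema.
val combined = sqlContext.table("table1")
  .unionAll(sqlContext.table("table2"))
combined.registerAsTable("combined")

sqlContext.sql("SELECT key, value FROM combined").collect()
```

This runs, but as noted above the union over the recovered tables is much slower than querying a single cached table directly.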
I also discovered you can recover the in-memory cached versions of the RDDs using sqlContext.table("table1"). Thus you can do sqlContext.table("table1").unionAll(sqlContext.table("table2")), but this is roughly 10x slower than running the query on table1 alone, which is cached using sqlContext.cacheTable(). (At least on Spark 1.0.2; I haven't tried the 1.1.0 snapshot yet.)

On Thu, Aug 21, 2014 at 12:17 AM, Michael Armbrust <mich...@databricks.com> wrote:
> I believe this should work if you run srdd1.unionAll(srdd2). Both RDDs must
> have the same schema.
>
>
> On Wed, Aug 20, 2014 at 11:30 PM, Evan Chan <velvia.git...@gmail.com> wrote:
>>
>> Is it possible to merge two cached Spark SQL tables into a single
>> table so it can be queried with one SQL statement?
>>
>> ie, can you do schemaRdd1.union(schemaRdd2), then register the new
>> schemaRdd and run a query over it?
>>
>> Ideally, both schemaRdd1 and schemaRdd2 would be cached, so the union
>> should run cached too.
>>
>> thanks,
>> Evan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>