Actually, |SchemaRDD.cache()| has behaved exactly the same as |cacheTable|
since Spark 1.2.0. Your web UI didn't show the cached table because both
|cacheTable| and |sql("SELECT ...")| are lazy :-) Nothing is executed, and
so nothing is cached, until an action runs. Simply add a |.collect()| after
the |sql(...)| call.
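
As a minimal sketch of that fix, assuming a spark-shell session on Spark 1.2
where |sc| is predefined, and assuming |KV| is a two-field case class (its
definition isn't shown in your message, so the field names here are a guess):

```scala
import org.apache.spark.sql.SQLContext

// Assumed shape of KV; the original definition was not posted.
case class KV(key: Int, value: String)

val sqc = new SQLContext(sc)
// Brings the implicit RDD-of-case-class -> SchemaRDD conversion into scope (Spark 1.x API).
import sqc.createSchemaRDD

sc.parallelize(1 to 1024)
  .map(i => KV(i, i.toString))
  .toSchemaRDD
  .registerTempTable("test")

// Lazy: only marks the table to be cached; nothing is materialized yet.
sqc.cacheTable("test")

// collect() is an action, so the query actually runs and the table gets cached.
sqc.sql("SELECT COUNT(*) FROM test").collect()
```

Once the |collect()| has finished, the cached table should be listed in the
web UI while the shell is still running.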
Cheng
On 2/2/15 12:23 PM, ankits wrote:
Thanks for your response. So AFAICT, calling
parallelize(1 to 1024).map(i => KV(i, i.toString)).toSchemaRDD.cache().count()
will allow me to see the size of the SchemaRDD in memory, and
parallelize(1 to 1024).map(i => KV(i, i.toString)).cache().count()
will show me the size of a regular RDD.
But this will not show us the size when using cacheTable(), right? For
example, if I call
parallelize(1 to 1024).map(i => KV(i, i.toString)).toSchemaRDD.registerTempTable("test")
sqc.cacheTable("test")
sqc.sql("SELECT COUNT(*) FROM test")
the web UI does not show us the size of the cached table.