Run ad-hoc queries at runtime against cached RDDs

Krishna Rao Mon, 14 Dec 2015 08:22:07 -0800

Hi all,

What's the best way to run ad-hoc queries against a cached RDD?


For example, say I have an RDD that has been processed and persisted
memory-only. I want to be able to run a count (actually
"countApproxDistinct") after filtering by a value that is unknown at
compile time and is instead specified by the query.
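
To make the question concrete, here is a minimal sketch of the kind of query
meant, assuming a hypothetical record type Event with fields key and userId
(the real element type is not important here) and a local master for
illustration. The filter value arrives only at runtime, here via args:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object AdHocQuerySketch {
  // Hypothetical record type, for illustration only.
  case class Event(key: String, userId: Long)

  // Filter the cached RDD by a value known only at runtime,
  // then estimate the number of distinct users among the matches.
  def approxDistinctUsers(cached: RDD[Event], key: String): Long =
    cached.filter(_.key == key).map(_.userId).countApproxDistinct(0.05)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("adhoc-query-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for the already processed RDD, persisted memory-only.
      val cached = sc
        .parallelize(Seq(Event("a", 1L), Event("a", 2L), Event("b", 1L), Event("a", 1L)))
        .persist(StorageLevel.MEMORY_ONLY)
      cached.count() // run one action so the cache is populated

      // Each argument is treated as one ad-hoc query value.
      args.foreach { key =>
        println(s"$key -> ${approxDistinctUsers(cached, key)}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

The open question is how best to feed such query values to a long-running
application that keeps the RDD cached, rather than how to write the filter
itself.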

I've experimented with using (abusing) Spark Streaming, by streaming the
queries in and running them against the cached RDD. However, I don't think
this is an intended use case for Streaming.

Cheers,

Krishna
