Can you elaborate a bit more on the use case? It looks a bit like an abuse of Spark in general. Interactive queries that are not suited to in-memory batch processing might be better served by Apache Ignite, which has in-memory indexes, a concept of hot, warm, and cold data, etc., or by Hive on Tez with LLAP.

> On 14 Dec 2015, at 17:19, Krishna Rao <krishnanj...@gmail.com> wrote:
>
> Hi all,
>
> What's the best way to run ad-hoc queries against a cached RDD?
>
> For example, say I have an RDD that has been processed, and persisted to
> memory-only. I want to be able to run a count (actually
> "countApproxDistinct") after filtering by an, at compile time, unknown
> (specified by query) value.
>
> I've experimented with using (abusing) Spark Streaming, by streaming queries
> and running these against the cached RDD. However, as I say I don't think
> that this is an intended use-case of Streaming.
>
> Cheers,
>
> Krishna