Can you elaborate a little more on the use case? It looks a bit like an
abuse of Spark in general. Interactive queries that are not well suited to
in-memory batch processing might be better served by Apache Ignite, which has
in-memory indexes and a concept of hot, warm and cold data, or by Hive on Tez with LLAP.

> On 14 Dec 2015, at 17:19, Krishna Rao <krishnanj...@gmail.com> wrote:
> 
> Hi all,
> 
> What's the best way to run ad-hoc queries against cached RDDs?
> 
> For example, say I have an RDD that has been processed and persisted to
> memory only. I want to be able to run a count (actually
> "countApproxDistinct") after filtering by a value that is unknown at
> compile time (it is specified by the query).
> 
> I've experimented with using (abusing) Spark Streaming, by streaming queries
> and running them against the cached RDD. However, as I said, I don't think
> this is an intended use case of Streaming.
> 
> Cheers,
> 
> Krishna

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
