Re: Spark Caching Kafka Metadata

2016-02-01 Thread Benjamin Han
Is there another way to create topics from Spark? Is there any reason the above code snippet would still produce this error? I've dumbly inserted waits and retries for testing, but that still doesn't work consistently, even after waiting several minutes.

Re: Spark Caching Kafka Metadata

2016-01-29 Thread Cody Koeninger
The Kafka direct stream doesn't do any explicit caching. I haven't looked through the underlying simple consumer code in the Kafka project in detail, but I doubt it does either. Honestly, I'd recommend not using auto-created topics (it makes it too easy to pollute your topics if someone ...
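For reference, the topic can be created explicitly before the stream starts instead of relying on broker-side auto-creation. A minimal sketch, assuming the kafka.admin.AdminUtils API from the Kafka 0.9.x line that was current at the time; the ZooKeeper address, topic name, and sizing are placeholders:

    import kafka.admin.AdminUtils
    import kafka.utils.ZkUtils

    // Connect to the ZooKeeper ensemble backing the Kafka cluster
    // (placeholder address; 10s session and connection timeouts).
    val zkUtils = ZkUtils("zkhost:2181", 10000, 10000, false)
    try {
      // Create the topic up front so the direct stream never races
      // broker-side auto-creation (4 partitions, replication factor 1).
      if (!AdminUtils.topicExists(zkUtils, "events")) {
        AdminUtils.createTopic(zkUtils, "events", 4, 1)
      }
    } finally {
      zkUtils.close()
    }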

Spark Caching Kafka Metadata

2016-01-28 Thread asdf zxcv
Does Spark cache which Kafka topics exist? A service incorrectly assumes all the relevant topics exist, even if they are empty, causing it to fail. Fortunately the service is automatically restarted, and by default Kafka creates a topic after it is requested. I'm trying to create the topic if ...

Spark caching

2015-03-30 Thread Renato Marroquín Mogrovejo
Hi all, I am trying to understand how Spark's lazy evaluation works, and I need some help. I have noticed that creating an RDD once and using it many times won't trigger recomputation every time it gets used, whereas creating a new RDD every time an operation is performed will trigger ...
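For illustration, a minimal sketch of the behaviour being described, assuming a local SparkContext (names are made up):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("lazy-eval").setMaster("local[*]"))

    // One RDD, reused: cache() marks it for reuse, and the lineage is
    // only evaluated on the first action.
    val squares = sc.parallelize(1 to 1000000).map(x => x.toLong * x)
    squares.cache()
    val total = squares.sum()   // computes and caches the partitions
    val top   = squares.max()   // served from the cache, no recomputation

    // A fresh RDD per operation re-runs the whole lineage each time.
    val total2 = sc.parallelize(1 to 1000000).map(x => x.toLong * x).sum()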

Re: Spark caching

2015-03-30 Thread Sean Owen
I think that you get a sort of silent caching after shuffles, in some cases, since the shuffle files are not immediately removed and can be reused. (This is the flip side of the frequent question/complaint that the shuffle files aren't removed straight away.)

Re: Spark caching

2015-03-30 Thread Renato Marroquín Mogrovejo
Thanks Sean! Do you know if there is a way (even manually) to delete these intermediate shuffle results? I just want to test the expected behaviour. I know that re-caching might be a positive thing most of the time, but I want to try it without it. Renato M.
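One way to try this, sketched under the assumption that Spark's ContextCleaner removes shuffle files once the RDDs referencing them are garbage collected (so the cleanup is non-deterministic and this is only a hint, not a guaranteed delete):

    // Assumes an existing SparkContext sc.
    var grouped = sc.parallelize(1 to 100000)
                    .map(x => (x % 100, x))
                    .groupByKey()
    grouped.count()   // runs the shuffle; shuffle files are now on disk

    grouped = null    // drop the only reference to the shuffled RDD
    System.gc()       // hint a GC so the ContextCleaner can clean up the files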

Re: Spark caching questions

2014-09-10 Thread Mayur Rustagi
Cached RDDs do not survive SparkContext deletion (they are scoped on a per-SparkContext basis). I am not sure what you mean by disk-based cache eviction; if you cache more RDDs than you have disk space for, the result will not be very pretty :) Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com

Spark caching questions

2014-09-09 Thread Vladimir Rodionov
Hi, users

1. Disk-based cache eviction policy? The same LRU?
2. What is the scope of a cached RDD? Does it survive the application? What happens if I run the Java app next time? Will the RDD be created again or read from the cache? If the answer is YES, then ...
3. Is there any way to invalidate a cached RDD ...
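For questions 2 and 3 above, a short sketch of the cache lifecycle, assuming an existing SparkContext sc and a placeholder input path:

    import org.apache.spark.storage.StorageLevel

    // Persist to memory, spilling to disk when it does not fit.
    val rdd = sc.textFile("hdfs:///data/input")
                .persist(StorageLevel.MEMORY_AND_DISK)
    rdd.count()                      // materializes the cache

    // Question 3: invalidate explicitly; blocks until the data is removed.
    rdd.unpersist(blocking = true)

    // Question 2: the cache is scoped to the SparkContext. After sc.stop()
    // (or once the application exits) nothing survives, and the next run
    // recomputes the RDD from scratch.
    sc.stop()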