How do I link JavaEsSpark.saveToEs() to a sparkConf?

2015-12-14 Thread Spark Enthusiast
Folks, I have the following program:

    SparkConf conf = new SparkConf().setMaster("local").setAppName("Indexer").set("spark.driver.maxResultSize", "2g");
    conf.set("es.index.auto.create", "true");
    conf.set("es.nodes", "localhost");
    conf.set("es.port", "9200");
    conf.set("es.write.operation",
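
The es.* settings need no separate wiring: elasticsearch-hadoop reads them from the SparkContext that owns the RDD, so building a JavaSparkContext from this conf is enough. A minimal sketch, assuming elasticsearch-hadoop is on the classpath (the "logs/event" resource and the document fields are illustrative):

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

    public class Indexer {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setMaster("local")
                    .setAppName("Indexer")
                    .set("es.index.auto.create", "true")
                    .set("es.nodes", "localhost")
                    .set("es.port", "9200");

            // JavaEsSpark reads the es.* settings from the RDD's SparkContext,
            // so the conf above is all the linking that is required.
            JavaSparkContext sc = new JavaSparkContext(conf);

            Map<String, Object> doc = new HashMap<String, Object>();
            doc.put("message", "hello");
            JavaRDD<Map<String, Object>> docs = sc.parallelize(Arrays.asList(doc));

            JavaEsSpark.saveToEs(docs, "logs/event"); // "index/type" resource string
            sc.stop();
        }
    }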

Getting an error when trying to read a GZIPPED file

2015-09-02 Thread Spark Enthusiast
Folks, I have an input file which is gzipped. When I read it with sc.textFile("foo.gz"), I see the following problem. Can someone help me fix this?

    15/09/03 10:05:32 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
    15/09/03 10:05:32 INFO CodecPool: Got brand-new
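
The log above is truncated, so the exact cause is unclear, but sc.textFile() does decompress .gz files transparently; the usual catch is that gzip is not a splittable codec, so the whole file lands in one partition. A minimal sketch of the standard read path (the partition count is an assumption to tune):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class GzipRead {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("GzipRead");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Gzip is not splittable, so textFile() yields a single partition;
            // repartition afterwards to spread the work across cores.
            JavaRDD<String> lines = sc.textFile("foo.gz").repartition(8);
            System.out.println(lines.count());
            sc.stop();
        }
    }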

Spark

2015-08-24 Thread Spark Enthusiast
I was running a Spark job to crunch a 9 GB Apache log file when I saw the following error:

    15/08/25 04:25:16 WARN scheduler.TaskSetManager: Lost task 99.0 in stage 37.0 (TID 4115, ip-10-150-137-100.ap-southeast-1.compute.internal): ExecutorLostFailure (executor 29 lost)
    15/08/25 04:25:16 INFO
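
ExecutorLostFailure on a large input is often YARN killing containers that exceed their memory allotment. A hedged sketch of the usual first remedy, raising executor memory and the off-heap overhead (the sizes are assumptions to tune against the container logs; property name per Spark 1.x):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class LogCruncher {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("LogCruncher")
                    .set("spark.executor.memory", "4g")
                    .set("spark.yarn.executor.memoryOverhead", "1024"); // in MB
            JavaSparkContext sc = new JavaSparkContext(conf);
            System.out.println(sc.textFile(args[0]).count());
            sc.stop();
        }
    }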

How to parse multiple event types using Kafka

2015-08-23 Thread Spark Enthusiast
Folks, I use the following Streaming API from KafkaUtils:

    public JavaPairInputDStream<String, String> inputDStream() {
        HashSet<String> topicsSet = new HashSet<String>(Arrays.asList(topics.split(",")));
        HashMap<String, String> kafkaParams = new HashMap<String, String>();
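
With the direct API, one createDirectStream call can subscribe to several topics, and the resulting stream can be forked by a type marker in the payload. A sketch assuming Spark 1.x with the Kafka 0.8 direct connector; the topic names and the ORDER/CLICK prefixes are hypothetical:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;

    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class MultiTypeConsumer {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("MultiType");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            HashSet<String> topicsSet = new HashSet<String>(Arrays.asList("orders", "clicks"));
            HashMap<String, String> kafkaParams = new HashMap<String, String>();
            kafkaParams.put("metadata.broker.list", "localhost:9092");

            JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                    jssc, String.class, String.class,
                    StringDecoder.class, StringDecoder.class,
                    kafkaParams, topicsSet);

            // One stream carries every topic; fork it by a marker in the payload.
            JavaDStream<String> orders = stream.map(t -> t._2()).filter(m -> m.startsWith("ORDER"));
            JavaDStream<String> clicks = stream.map(t -> t._2()).filter(m -> m.startsWith("CLICK"));

            orders.print();
            clicks.print();
            jssc.start();
            jssc.awaitTermination();
        }
    }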

How to automatically relaunch a Driver program after crashes?

2015-08-19 Thread Spark Enthusiast
Folks, as I see it, the Driver program is a single point of failure. I have seen ways to make it recover from failures on a restart (using checkpointing), but I have not seen anything on how to restart it automatically if it crashes. Will running the Driver as a Hadoop Yarn

Re: How to automatically relaunch a Driver program after crashes?

2015-08-19 Thread Spark Enthusiast
(with cluster deploy mode only) --supervise: if given, restarts the driver on failure. At 2015-08-19 14:55:39, Spark Enthusiast sparkenthusi...@yahoo.in wrote: Folks, as I see it, the Driver program is a single point of failure. Now, I have seen ways to make it recover from
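
For completeness, a sketch of the submit command (master URL, class, and jar names are illustrative). --supervise applies to Spark standalone in cluster deploy mode; on YARN, the equivalent lever is the spark.yarn.maxAppAttempts setting:

    ./bin/spark-submit \
      --master spark://master:7077 \
      --deploy-mode cluster \
      --supervise \
      --class com.example.MyDriver \
      my-driver.jar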

Re: Not seeing Log messages

2015-08-11 Thread Spark Enthusiast
Forgot to mention. Here is how I run the program: ./bin/spark-submit --conf spark.app.master=local[1] ~/workspace/spark-python/ApacheLogWebServerAnalysis.py On Wednesday, 12 August 2015 10:28 AM, Spark Enthusiast sparkenthusi...@yahoo.in wrote: I wrote a small python program: def

Not seeing Log messages

2015-08-11 Thread Spark Enthusiast
I wrote a small python program:

    def parseLogs(self):
        """Read and parse log file"""
        self._logger.debug("Parselogs() start")
        self.parsed_logs = (self._sc
                            .textFile(self._logFile)
                            .map(self._parseApacheLogLine)
                            .cache())

How do I Process Streams that span multiple lines?

2015-08-03 Thread Spark Enthusiast
All examples of Spark Streaming programming that I see assume streams of lines that are then tokenised and acted upon (like the WordCount example). How do I process streams that span multiple lines? Are there examples I can use?
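
One batch-side approach that may carry over: Hadoop's TextInputFormat honours a custom record delimiter, so blank-line-separated blocks can be read as single records, and the same per-batch reassembly can then be applied in a streaming job. A sketch assuming an existing JavaSparkContext sc and a file of blank-line-delimited events (file name illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.spark.api.java.JavaRDD;

    // Treat blank-line-separated blocks as one record instead of one line.
    Configuration hadoopConf = new Configuration();
    hadoopConf.set("textinputformat.record.delimiter", "\n\n");

    JavaRDD<String> records = sc
            .newAPIHadoopFile("events.log", TextInputFormat.class,
                    LongWritable.class, Text.class, hadoopConf)
            .map(pair -> pair._2().toString()); // copy out of Hadoop's reused Text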

Can a Spark Driver Program be a REST Service by itself?

2015-07-01 Thread Spark Enthusiast
Folks, my use case is as follows: my Driver program will be aggregating a bunch of event streams and acting on them. The action on the aggregated events is configurable and can change dynamically. One way I can think of is to run the Spark Driver as a service where a config push can be caught via
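
Yes, nothing stops the driver JVM from hosting an HTTP endpoint. A minimal sketch using the JDK's built-in com.sun.net.httpserver (port, path, and the config handling are assumptions); note the pushed value should be read on the driver side, e.g. inside foreachRDD, since closures are serialized out to executors:

    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.util.Scanner;
    import java.util.concurrent.atomic.AtomicReference;

    import com.sun.net.httpserver.HttpServer;

    public class DriverWithRest {
        // Latest action config pushed over HTTP, read by the driver-side logic.
        static final AtomicReference<String> actionConfig = new AtomicReference<String>("default");

        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/config", exchange -> {
                Scanner s = new Scanner(exchange.getRequestBody(), "UTF-8").useDelimiter("\\A");
                actionConfig.set(s.hasNext() ? s.next() : "");
                byte[] resp = "ok".getBytes("UTF-8");
                exchange.sendResponseHeaders(200, resp.length);
                OutputStream os = exchange.getResponseBody();
                os.write(resp);
                os.close();
            });
            server.start(); // background thread; set up the StreamingContext below
            // ... streaming job here; read actionConfig.get() inside foreachRDD
        }
    }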

Can I do Joins across Event Streams ?

2015-07-01 Thread Spark Enthusiast
Hi, I have to build a system that reacts to a set of events. Each of these events is a separate stream by itself, consumed from a different Kafka topic and hence with its own InputDStream. Questions: Will I be able to do joins across multiple InputDStreams and collate the
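
Yes: join() is defined on pair DStreams and matches keys batch-by-batch; windowing both sides first lets events that arrive a little apart still meet. A sketch assuming two JavaDStream<String> inputs from different topics and a hypothetical extractId() helper:

    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import scala.Tuple2;

    // orderStream / paymentStream: JavaDStream<String> from two Kafka topics;
    // extractId() is a hypothetical key extractor for the shared event ID.
    JavaPairDStream<String, String> orders =
            orderStream.mapToPair(e -> new Tuple2<String, String>(extractId(e), e));
    JavaPairDStream<String, String> payments =
            paymentStream.mapToPair(e -> new Tuple2<String, String>(extractId(e), e));

    // join() only matches within the same micro-batch; window both sides
    // so events arriving up to a minute apart can still be collated.
    JavaPairDStream<String, Tuple2<String, String>> joined =
            orders.window(Durations.minutes(1))
                  .join(payments.window(Durations.minutes(1)));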

Serialization Exception

2015-06-29 Thread Spark Enthusiast
For prototyping purposes, I created a test program injecting dependencies using Spring. Nothing fancy. This is just a re-write of KafkaDirectWordCount. When I run this, I get the following exception:

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable
        at
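
The usual cause: the closure captures the enclosing class (or a Spring bean) that is not Serializable. One common fix, sketched under the assumption of a non-serializable MyService dependency and a JavaRDD<String> rdd, is to build the dependency inside foreachPartition so it is created on the executor rather than shipped from the driver; alternatives are marking the field transient with lazy re-initialization, or making the enclosing class Serializable.

    // MyService stands in for the hypothetical non-serializable Spring bean.
    rdd.foreachPartition(iter -> {
        MyService service = new MyService(); // created on the executor, never serialized
        while (iter.hasNext()) {
            service.handle(iter.next());
        }
    });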

Re: Spark or Storm

2015-06-17 Thread Spark Enthusiast
a good idea. So in terms of options… Spark Streaming, Storm, Samza, Akka and others… Storm is probably the easiest to pick up; Spark Streaming / Akka may give you more flexibility, and Akka would work for CEP. Just my $0.02 On Jun 16, 2015, at 9:40 PM, Spark Enthusiast sparkenthusi...@yahoo.in

Re: Spark or Storm

2015-06-17 Thread Spark Enthusiast
is stored to the database, and this transformed data should then be sent to a new pipeline for further processing. How can this be achieved using Spark? On Wed, Jun 17, 2015 at 10:10 AM, Spark Enthusiast sparkenthusi...@yahoo.in wrote: I have a use-case where a stream of incoming events has
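
One way to hand the transformed records to a downstream pipeline is to publish them back to Kafka from inside foreachRDD/foreachPartition. A sketch assuming a JavaDStream<String> named transformed and a downstream topic "enriched-events" (both illustrative); creating one producer per partition per batch is simple but not the cheapest option:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    transformed.foreachRDD(rdd -> rdd.foreachPartition(partition -> {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
        try {
            while (partition.hasNext()) {
                producer.send(new ProducerRecord<String, String>("enriched-events", partition.next()));
            }
        } finally {
            producer.close();
        }
    }));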

Re: Spark or Storm

2015-06-16 Thread Spark Enthusiast
I have a use-case where a stream of incoming events has to be aggregated and joined to create complex events. The aggregation will have to happen at an interval of 1 minute (or less). The pipeline is: send events -> enrich
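
For the one-minute aggregation step, reduceByKeyAndWindow with window length equal to the slide interval gives a tumbling window. A sketch assuming an events JavaDStream<String> from the Kafka input and a hypothetical keyOf() extractor:

    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import scala.Tuple2;

    // Count events per key over each one-minute tumbling window.
    JavaPairDStream<String, Long> perKey =
            events.mapToPair(e -> new Tuple2<String, Long>(keyOf(e), 1L));

    JavaPairDStream<String, Long> counts = perKey.reduceByKeyAndWindow(
            (a, b) -> a + b,          // associative reduce function
            Durations.minutes(1),     // window length
            Durations.minutes(1));    // slide interval = length, i.e. tumbling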