Re: Yarn client mode: Setting environment variables

2016-02-18 Thread Lin Zhao
Thanks for the reply. I also found that sparkConf.setExecutorEnv works for yarn. From: Saisai Shao <sai.sai.s...@gmail.com> Date: Wednesday, February 17, 2016 at 6:02 PM To: Lin Zhao <l...@exabeam.com> Cc
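For reference, a minimal sketch of that approach, assuming the standard SparkConf API in 1.6 (the app name, variable names, and values here are illustrative):

    import org.apache.spark.SparkConf

    // Executor environment variables set on the SparkConf are propagated
    // to the executor containers when running on YARN.
    val conf = new SparkConf()
      .setAppName("EnvVarExample")
      .setExecutorEnv("MY_VAR", "my-value")                 // one variable at a time
      .setExecutorEnv(Seq(("VAR_A", "1"), ("VAR_B", "2")))  // or several at once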

Yarn client mode: Setting environment variables

2016-02-17 Thread Lin Zhao
I've been trying to set some environment variables for the spark executors but haven't had much luck. I tried editing conf/spark-env.sh but it doesn't get through to the executors. I'm running 1.6.0 on yarn; any pointers are appreciated. Thanks, Lin
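A likely explanation, hedged: conf/spark-env.sh is sourced on the machine where a Spark process is launched, so on YARN it does not reach executor containers started by the NodeManagers. The documented route for YARN is the spark.executorEnv.[EnvironmentVariableName] property, e.g. (illustrative variable name):

    spark-submit --conf spark.executorEnv.MY_VAR=my-value ...

or the equivalent line in conf/spark-defaults.conf.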

Re: Spark streaming flow control and back pressure

2016-01-28 Thread Lin Zhao
tored ${storeCount.get()} messages to spark}") } From: Iulian Dragoș <iulian.dra...@typesafe.com> Date: Thursday, January 28, 2016 at 5:33 AM To: Lin Zhao <l...@exabeam.com> Cc: "user@spark.apache.or

Streaming: LeaseExpiredException when writing checkpoint

2016-01-28 Thread Lin Zhao
I'm seeing this error in the driver when running a streaming job. Not sure if it's critical. It happens maybe half of the time a checkpoint is saved. There are retries in the log but it sometimes results in "Could not write checkpoint for time 145400632 ms to file

Spark streaming flow control and back pressure

2016-01-27 Thread Lin Zhao
I have an actor receiver that reads data and calls "store()" to save data to spark. I was hoping spark.streaming.receiver.maxRate and spark.streaming.backpressure would help me block the method when needed to avoid overflowing the pipeline. But they don't. My actor pumps millions of lines to
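For reference, the two settings involved, as a sketch (spark.streaming.backpressure.enabled is the full property name; the maxRate value below is illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("BackpressureExample")
      // Let the rate controller adapt ingestion to processing speed.
      .set("spark.streaming.backpressure.enabled", "true")
      // Hard cap in records/second per receiver; unset means unlimited.
      .set("spark.streaming.receiver.maxRate", "10000")

As far as I can tell, for receiver-based streams the cap applies to single-record store() calls, which are throttled by a rate limiter in the receiver's block generator; the iterator- and block-level store variants bypass it.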

Re: Spark streaming flow control and back pressure

2016-01-27 Thread Lin Zhao
One solution is to read the scheduling delay and my actor can go to sleep if needed. Is this possible? From: Lin Zhao <l...@exabeam.com> Date: Wednesday, January 27, 2016 at 5:28 PM To: "user@spark.apache.org
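One way to read the scheduling delay on the driver is a StreamingListener; a minimal sketch assuming the 1.6 listener API (getting the value from the driver to a receiver actor on an executor is a separate problem and is left out):

    import java.util.concurrent.atomic.AtomicLong
    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    // Tracks the most recent batch scheduling delay in milliseconds.
    class DelayListener extends StreamingListener {
      val lastSchedulingDelayMs = new AtomicLong(0L)
      override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit =
        batch.batchInfo.schedulingDelay.foreach(lastSchedulingDelayMs.set)
    }

    // registration: ssc.addStreamingListener(new DelayListener)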

Streaming: mapWithState "Error during Java deserialization."

2016-01-26 Thread Lin Zhao
I'm using mapWithState, and hit https://issues.apache.org/jira/browse/SPARK-12591. Since 1.6.1 is not released yet, I tried the workaround in the comment. But I hit these errors on one of the nodes. While millions of events go through the mapWithState, only 7 show up in the log. Is this related to
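For context, a minimal mapWithState pipeline of the shape under discussion, sketched against the 1.6 API (this is not the SPARK-12591 workaround itself, which lives in the JIRA comments):

    import org.apache.spark.streaming.{State, StateSpec}

    // Running per-key totals: for each (key, increment) input, update the
    // stored total and emit (key, newTotal).
    val updateTotal = (key: String, value: Option[Long], state: State[Long]) => {
      val total = state.getOption.getOrElse(0L) + value.getOrElse(0L)
      state.update(total)
      (key, total)
    }

    // events: DStream[(String, Long)]
    // val totals = events.mapWithState(StateSpec.function(updateTotal))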

Re: Spark Streaming: BatchDuration and Processing time

2016-01-22 Thread Lin Zhao
Hi Silvio, Can you go into a little detail about how the back pressure works? Does it block the receiver? Or does it temporarily save the incoming messages in mem/disk? I have a custom actor receiver that uses store() to save data to spark. Would the back pressure make the store() call block? On 1/17/16,

Spark Streaming: Does mapWithState implicitly partition the dstream?

2016-01-17 Thread Lin Zhao
When the state is passed to the task that handles mapWithState for a particular key, if the keys are distributed it seems extremely difficult to coordinate and synchronise the state. Is there a partition by key before mapWithState? If not, what exactly is the execution model? Thanks, Lin
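On the partitioning question, a hedged note: mapWithState does partition the stream by key (by default with a HashPartitioner, as far as I can tell), so a given key's state is owned by exactly one partition per batch and no cross-task synchronisation is needed. The partitioner can also be set explicitly, sketched here:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.streaming.{State, StateSpec}

    // A trivial state function just to show where the partitioner attaches;
    // any real mapping function goes here.
    val passThrough = (key: String, value: Option[String], state: State[Long]) => key

    // Pin the state to 32 hash partitions: every record for a given key is
    // routed to the same state partition each batch.
    val spec = StateSpec.function(passThrough).partitioner(new HashPartitioner(32))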

Spark Streaming: routing by key without groupByKey

2016-01-15 Thread Lin Zhao
I have a requirement to route a paired DStream through a series of map and flatMap steps such that entries with the same key go to the same thread within the same batch. The closest I can come up with is groupByKey().flatMap(_._2). But this kills throughput by 50%. When I think about it, groupByKey is more
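One alternative that avoids materializing per-key groups, sketched under the assumption that same-key-same-task within a batch is enough (it gives no dedicated thread per key):

    import org.apache.spark.HashPartitioner
    import org.apache.spark.streaming.dstream.DStream

    // Entries with equal keys land in the same partition, hence the same
    // task/thread, for each batch -- without building the per-key
    // collections that groupByKey does.
    def routeByKey(stream: DStream[(String, String)]): DStream[(String, String)] =
      stream
        .transform(_.partitionBy(new HashPartitioner(32)))
        .mapPartitions(_.map { case (k, v) => (k, v.toUpperCase) }) // placeholder per-entry work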

Re: Spark Streaming: custom actor receiver losing vast majority of data

2016-01-14 Thread Lin Zhao
anager: Removing RDD 41 16/01/15 00:31:31 INFO storage.BlockManager: Removing RDD 40 16/01/15 00:31:31 INFO storage.BlockManager: Removing RDD 39 From: "Shixiong(Ryan) Zhu" <shixi...@databricks.com> Date: Thursday, January 14, 2016 at 4:13

Re: Spark Streaming: custom actor receiver losing vast majority of data

2016-01-14 Thread Lin Zhao
Date: Thursday, January 14, 2016 at 4:41 PM To: Lin Zhao <l...@exabeam.com> Cc: user <user@spark.apache.org> Subject: Re: Spark Streaming: custom actor receiver losing vast majority of data

Spark Streaming: custom actor receiver losing vast majority of data

2016-01-14 Thread Lin Zhao
Hi, I'm testing spark streaming with an actor receiver. The actor keeps calling store() to save a pair to Spark. Once the job is launched, everything looks good on the UI. Millions of events get through every batch. However, I added logging to the first step and found that only 20 or 40 events
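For context, the receiver pattern in question looks roughly like this, sketched against the 1.6 ActorHelper API (class and stream names are illustrative):

    import akka.actor.{Actor, Props}
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.ActorHelper

    // Every message the actor receives is handed to Spark Streaming
    // through store().
    class LineReceiver extends Actor with ActorHelper {
      def receive = {
        case line: String => store(line)
      }
    }

    // wiring:
    // val lines = ssc.actorStream[String](
    //   Props[LineReceiver], "LineReceiver", StorageLevel.MEMORY_AND_DISK_SER)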

yarn-client: SparkSubmitDriverBootstrapper not found in yarn client mode (1.6.0)

2016-01-13 Thread Lin Zhao
My job runs fine in yarn cluster mode, but I have reason to use client mode instead. However, I'm hitting this error when submitting: > spark-submit --class com.exabeam.martini.scripts.SparkStreamingTest --master > yarn --deploy-mode client --executor-memory 90G --num-executors 3 > --executor-cores

Spark streaming routing

2016-01-07 Thread Lin Zhao
I have a need to route the dstream through the streaming pipeline by some key, such that data with the same key always goes through the same executor. There doesn't seem to be a way to do manual routing with Spark Streaming. The closest I can come up with is: stream.foreachRDD {rdd =>
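A custom partitioner is one way to make the key-to-partition mapping explicit and stable across batches, sketched below; note it pins keys to partitions, while partition-to-executor placement remains up to the scheduler:

    import org.apache.spark.Partitioner

    // Deterministically maps each key to one of n partitions, so the same
    // key always lands in the same partition across batches.
    class KeyRoutingPartitioner(n: Int) extends Partitioner {
      override def numPartitions: Int = n
      override def getPartition(key: Any): Int = ((key.hashCode % n) + n) % n
    }

    // usage: stream.transform(_.partitionBy(new KeyRoutingPartitioner(32)))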

Re: Spark streaming routing

2016-01-07 Thread Lin Zhao
cpu:memory ratio. From: Tathagata Das <t...@databricks.com> Date: Thursday, January 7, 2016 at 1:56 PM To: Lin Zhao <l...@exabeam.com> Cc: user <user@spark.apache.org> Subject: Re:

"impossible to get artifacts " error when using sbt to build 1.6.0 for scala 2.11

2016-01-07 Thread Lin Zhao
I tried to build 1.6.0 for yarn and scala 2.11, but hit an error. Any help is appreciated. [warn] Strategy 'first' was applied to 2 files [info] Assembly up to date: /Users/lin/git/spark/network/yarn/target/scala-2.11/spark-network-yarn-1.6.0-hadoop2.7.1.jar java.lang.IllegalStateException:
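For reference, the 1.6-era build docs describe the Scala 2.11 sequence roughly as follows (a sketch; I have not verified this exact profile/flag combination, and 2.11 support was still flagged experimental then):

    ./dev/change-scala-version.sh 2.11
    build/sbt -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Dscala-2.11 assembly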

Re: "Failed to bind to" error with spark-shell on CDH5 and YARN

2015-10-25 Thread Lin Zhao
. From: Steve Loughran <ste...@hortonworks.com> Sent: Saturday, October 24, 2015 5:15 AM To: Lin Zhao Cc: user@spark.apache.org Subject: Re: "Failed to bind to" error with spark-shell on CDH5 and YARN On 24 Oct 2015, at 00:46, Lin Zhao <l...@exabeam.com