But even when trying to do 5 minute windows, we have issues with "Could not
compute split, block ... not found". This is being run on a YARN cluster and
it seems like the executors are getting killed even though they should have
plenty of memory.
Also, it seems like no computation actually takes place until the end of the
window (there are a lot of repeated keys across this time frame, and we want
to combine them all -- we do this using reduceByKeyAndWindow).
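For context, the kind of windowed combine described here might look like the following sketch (the socket source, host, and port are illustrative assumptions, not code from this thread):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext._ // pair-DStream ops (needed on Spark 1.x)
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

// Minimal sketch: combine repeated keys across a 5-minute window that
// slides every minute. The reduce function must be associative so Spark
// can merge partial results from different batches.
object WindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WindowSketch")
    val ssc = new StreamingContext(conf, Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999) // assumed source
    val counts = lines
      .map(word => (word, 1L))
      .reduceByKeyAndWindow((a: Long, b: Long) => a + b, Minutes(5), Minutes(1))
    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that with only a forward reduce function, each window is recomputed from all the batches it spans, which concentrates work at window boundaries; the overload that also takes an inverse reduce function (e.g. `(a, b) => a - b`) updates the previous window's result incrementally instead, at the cost of requiring checkpointing.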
Whether I use 1 or 2 machines, the results are the same... Here follow the
results I got using 1 and 2 receivers with 2 machines:
2 machines, 1 receiver:
sbt/sbt run-main Benchmark 1 machine1 1000 21 | grep -i "Total delay\|record"
15/04/13 16:41:34 INFO JobScheduler: Total delay: 0.156 s
Sorry, I was getting those errors because my workload was not sustainable.
However, I noticed that, by just running the spark-streaming-benchmark (
https://github.com/tdas/spark-streaming-benchmark/blob/master/Benchmark.scala
), I get no difference in the execution time, number of processed
Are you running # of receivers = # machines?
TD
On Thu, Apr 9, 2015 at 9:56 AM, Saiph Kappa saiph.ka...@gmail.com wrote:
Sorry, I was getting those errors because my workload was not sustainable.
However, I noticed that, by just running the spark-streaming-benchmark (
Hi,
I am just running this simple example with
machineA: 1 master + 1 worker
machineB: 1 worker
    val ssc = new StreamingContext(sparkConf, Duration(1000))
    val rawStreams = (1 to numStreams).map(_ =>
      ssc.rawSocketStream[String](host, port,
        StorageLevel.MEMORY_ONLY_SER)).toArray
    val
If it is deterministically reproducible, could you generate full DEBUG
level logs, from the driver and the workers and give it to me? Basically I
want to trace through what is happening to the block that is not being
found.
And can you tell me which cluster manager you are using? Spark Standalone,
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-tp11186p16084.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
We are using Spark 1.0.
I'm using DStream operations such as map, filter and reduceByKeyAndWindow,
and doing a foreach operation on the DStream.
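A sketch of that shape of pipeline (the comma-separated input format, the source, and the filter threshold are illustrative assumptions):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext._ // pair-DStream ops (needed on Spark 1.x)
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

val conf = new SparkConf().setAppName("PipelineSketch")
val ssc = new StreamingContext(conf, Seconds(1))

// map -> filter -> reduceByKeyAndWindow -> foreach over the resulting RDDs.
val results = ssc.socketTextStream("localhost", 9999) // assumed "key,value" lines
  .map { line => val Array(k, v) = line.split(","); (k, v.toLong) }
  .filter { case (_, v) => v >= 0 } // illustrative filter
  .reduceByKeyAndWindow((a: Long, b: Long) => a + b, Minutes(1), Minutes(1))

results.foreachRDD { rdd =>
  rdd.take(10).foreach { case (k, v) => println(s"$k -> $v") }
}
```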
on RDD using raw Kafka data?
Log File attached:
streaming.gz
http://apache-spark-user-list.1001560.n3.nabble.com/file/n11229/streaming.gz
Not at all. Don't have any such code.
Are you by any chance using only memory in the storage level of the input
streams?
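If memory pressure is evicting input blocks before the window job reads them, one possible mitigation (a sketch under that assumption, not a confirmed fix from this thread) is to receive with a storage level that can spill to disk:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: same receiver setup as earlier in the thread, but with a
// storage level that falls back to disk instead of dropping blocks
// under memory pressure.
val conf = new SparkConf().setAppName("StorageLevelSketch")
val ssc = new StreamingContext(conf, Seconds(1))
val stream = ssc.rawSocketStream[String]("localhost", 9999, // assumed host/port
  StorageLevel.MEMORY_AND_DISK_SER) // was MEMORY_ONLY_SER
```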
TD
On Mon, Jun 30, 2014 at 5:53 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Bill,
let's say the processing time is t' and the window size t. Spark does not
*require* t' < t. In fact, for *temporary* peaks in
Hi Tobias,
Your explanation makes a lot of sense. Actually, I tried to use partial
data on the same program yesterday. It has been up for around 24 hours and
is still running correctly. Thanks!
Bill
On Mon, Jun 30, 2014 at 5:53 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Bill,
let's say
Hi Tathagata,
Yes. The input stream is from Kafka and my program reads the data, keeps
all the data in memory, processes the data, and generates the output.
Bill
On Mon, Jun 30, 2014 at 11:45 PM, Tathagata Das tathagata.das1...@gmail.com
wrote:
Are you by any chance using only memory in the
Tobias,
Your suggestion is very helpful. I will definitely investigate it.
Just curious. Suppose the batch size is t seconds. In practice, does Spark
always require the program to finish processing each batch of t seconds of
data within t seconds? Can Spark begin to consume the new batch
Bill,
let's say the processing time is t' and the window size t. Spark does not
*require* t' < t. In fact, for *temporary* peaks in your streaming data, I
think the way Spark handles it is very nice, in particular since 1) it does
not mix up the order in which items arrived in the stream, so items
Tobias,
Thanks for your help. I think in my case, the batch size is 1 minute.
However, it takes my program more than 1 minute to process 1 minute's data.
I am not sure whether it is because the unprocessed data pile up. Do you
have a suggestion on how to check and solve it? Thanks!
Bill
Hi,
I am running a spark streaming job with 1 minute as the batch size. It ran
around 84 minutes and was killed because of the exception with the
following information:
*java.lang.Exception: Could not compute split, block input-0-1403893740400
not found*
Before it was killed, it was able to