TD
On Tue, Jun 3, 2014 at 10:09 AM, Michael Chang m...@tellapart.com wrote:
I only had the warning level logs, unfortunately. There were no other
references to 32855 (except a repeated stack trace, I believe). I'm using
Spark 0.9.1.
On Mon, Jun 2, 2014 at 5:50 PM, Tathagata Das
wrote:
Hi Tathagata,
Thanks for your help! By not using coalesced RDD, do you mean not
repartitioning my Dstream?
Thanks,
Mike
On Tue, Jun 3, 2014 at 12:03 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:
I think I know what is going on! This is probably a race condition
Currently Spark Streaming does not support addition/deletion/modification
of DStream after the streaming context has been started.
Nor can you restart a stopped streaming context.
Also, multiple spark contexts (and therefore multiple streaming contexts)
cannot be run concurrently in the same JVM.
I am assuming that you are referring to the "OneForOneStrategy: key not
found: 1401753992000 ms" error, and not to the previous "Time 1401753992000
ms is invalid" message. Those two seem a little unrelated to me. Can you give
us the stack trace associated with the key-not-found error?
TD
On Mon, Jun 2,
Do you have the info level logs of the application? Can you grep for the
value 32855 to find any references to it? Also, what version of Spark are you
using (so that I can match the stack trace; it does not seem to match Spark
1.0)?
TD
On Mon, Jun 2, 2014 at 3:27 PM, Michael Chang
Can you give all the logs? Would like to see what is clearing the key
1401754908000 ms.
TD
On Mon, Jun 2, 2014 at 5:38 PM, Vadim Chekan kot.bege...@gmail.com wrote:
Ok, it seems like "Time ... is invalid" is part of the normal workflow, when
a window DStream will ignore RDDs at moments in time when
How are you launching the application? sbt run ? spark-submit? local
mode or Spark standalone cluster? Are you packaging all your code into
a jar?
It looks to me like you have Spark classes in your execution
environment but are missing some of Spark's dependencies.
TD
On Thu, May 22, 2014 at
The cleaner should remain up while the SparkContext is still active (not
stopped). However, here it seems you are stopping the SparkContext
(ssc.stop(true)), so the cleaner should be stopped. There was a bug
earlier where some of the cleaners may not have been stopped when the
context is
persists.
On Thu, May 22, 2014 at 3:59 PM, Shrikar archak shrika...@gmail.com wrote:
I am running it as sbt run. I am running it locally.
Thanks,
Shrikar
On Thu, May 22, 2014 at 3:53 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:
How are you launching the application? sbt run ? spark
Unfortunately, there is no API support for this right now. You could
implement it yourself by implementing your own receiver and controlling the
rate at which objects are received. If you are using any of the standard
receivers (Flume, Kafka, etc.), I recommend looking at the source code of
the
this in a generic way so that it can be used for all receivers ;)
TD
On Wed, May 21, 2014 at 12:57 AM, Tathagata Das tathagata.das1...@gmail.com
wrote:
Unfortunately, there is no API support for this right now. You could
implement it yourself by implementing your own receiver and controlling the
rate
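For illustration, a minimal sketch of such a rate-limited receiver, assuming
the Receiver API that ships in Spark 1.0 (the class, its maxPerSecond
parameter, and the nextRecord() source are all hypothetical):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class ThrottledReceiver(maxPerSecond: Int)
  extends Receiver[String](StorageLevel.MEMORY_ONLY) {

  def onStart(): Unit = {
    new Thread("Throttled Receiver") {
      override def run(): Unit = {
        while (!isStopped) {
          val second = System.currentTimeMillis()
          var sent = 0
          // Push at most maxPerSecond records into Spark Streaming ...
          while (sent < maxPerSecond && !isStopped) {
            store(nextRecord())
            sent += 1
          }
          // ... then sleep out the rest of the second.
          val elapsed = System.currentTimeMillis() - second
          if (elapsed < 1000) Thread.sleep(1000 - elapsed)
        }
      }
    }.start()
  }

  def onStop(): Unit = { } // the receiving thread polls isStopped and exits

  // Placeholder for wherever the data actually comes from.
  private def nextRecord(): String = "..."
}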
This does happen sometimes, but it is a warning because Spark is designed to
try successive ports until it succeeds. So unless a crazy number of
successive ports are blocked (runaway processes?? insufficient clearing of
ports by the OS??), these errors should not be a problem for tests passing.
On
You could do
records.filter { _.isInstanceOf[Orange] } .map { _.asInstanceOf[Orange] }
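Equivalently, a single pattern match does the filter and the cast in one
pass (a sketch; records is assumed to be a DStream and Orange is the type
from the question):

records.flatMap {
  case o: Orange => List(o) // keep and downcast in one step
  case _         => Nil     // drop everything else
}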
On Wed, May 21, 2014 at 3:28 PM, Ian Holsman i...@holsman.com.au wrote:
Hi.
Firstly, I'm a newb (to both Scala and Spark).
I have a stream, that contains multiple types of records, and I would like
to
Are you running vanilla Hadoop 2.3.0 or the one that comes with CDH5 /
HDP? We may be able to reproduce this in that case.
TD
On Wed, May 21, 2014 at 8:35 PM, Tom Graves tgraves...@yahoo.com wrote:
It sounds like something is closing the HDFS filesystem before everyone is
really done
Akka is under Apache 2 license too.
http://doc.akka.io/docs/akka/snapshot/project/licenses.html
On Tue, May 20, 2014 at 2:16 AM, YouPeng Yang yypvsxf19870...@gmail.comwrote:
Hi
Just learned that Akka is under a commercial license; however, Spark is
under the Apache license.
Is there any problem?
That's one of the main motivations for using Tachyon ;)
http://tachyon-project.org/
It gives off-heap in-memory caching. And starting with Spark 0.9, you can
cache any RDD in Tachyon just by specifying the appropriate StorageLevel.
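A minimal sketch, assuming the StorageLevel.OFF_HEAP level that maps to
Tachyon in this era of Spark (myRdd is a placeholder):

import org.apache.spark.storage.StorageLevel

myRdd.persist(StorageLevel.OFF_HEAP) // blocks stored in Tachyon, off the JVM heap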
TD
On Mon, May 19, 2014 at 10:22 PM, Mohit Jaggi mohitja...@gmail.com
Doesn't DStream.foreach() suffice?
anyDStream.foreach { rdd =>
// do something with rdd
}
On Wed, May 14, 2014 at 9:33 PM, Stephen Boesch java...@gmail.com wrote:
Looking further, it appears the functionality I am seeking is in the
following *private[spark]* class ForEachDStream
(version
This gives the dependency tree in SBT (Spark uses this).
https://github.com/jrudolph/sbt-dependency-graph
TD
On Mon, May 12, 2014 at 4:55 PM, Sean Owen so...@cloudera.com wrote:
It sounds like you are doing everything right.
NoSuchMethodError suggests it's finding log4j, just not the right
A very crucial thing to remember when using file stream is that the files
must be written to the monitored directory atomically. That is, once the
file system shows the file in its listing, the file should not be appended
to / updated after that. That often causes this kind of issue, as Spark
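The usual workaround is to write the file somewhere else on the same
filesystem and then atomically rename it into the monitored directory. A
minimal sketch (the paths here are hypothetical):

import java.nio.file.{Files, Paths, StandardCopyOption}

val tmp = Paths.get("/data/staging/events-0001.log")   // written outside the monitored dir
val dst = Paths.get("/data/monitored/events-0001.log") // directory watched by fileStream
// A move within one filesystem is atomic, so the stream never sees a half-written file.
Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE)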
Use DStream.foreachRDD to do an operation on the final RDD of every batch.
val sumandcount = numbers.map(n => (n.toDouble, 1)).reduce { (a, b) => (a._1
+ b._1, a._2 + b._2) }
sumandcount.foreachRDD { rdd => val first: (Double, Int) = rdd.take(1)(0) ;
... }
DStream.reduce creates a DStream whose RDDs
Since you are using the latest Spark code and not Spark 0.9.1 (guessed from
the log messages), you can actually do graceful shutdown of a streaming
context. This ensures that the receivers are properly stopped and all
received data is processed and then the system terminates (stop() stays
blocked
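A minimal sketch of such a graceful shutdown, assuming the stop() signature
in the Spark 1.0 streaming API:

// Stops the receivers first, waits for all received data to be processed,
// and then stops the underlying SparkContext as well.
ssc.stop(stopSparkContext = true, stopGracefully = true)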
Doesn't the run-example script work for you? Also, are you on the latest
commit of branch-1.0?
TD
On Mon, May 5, 2014 at 7:51 PM, Soumya Simanta soumya.sima...@gmail.comwrote:
Yes, I'm struggling with a similar problem where my classes are not found on
the worker nodes. I'm using
Okay, this needs to be fixed. Thanks for reporting this!
On Mon, May 5, 2014 at 11:00 PM, wxhsdp wxh...@gmail.com wrote:
Hi, TD
I tried it on v1.0.0-rc3 and still got the error
One main reason why Spark Streaming can achieve higher throughput than
Storm is that Spark Streaming operates on coarser-grained batches -
second-scale massive batches - which reduce per-tuple overheads in
shuffles and other kinds of data movement.
Note that it is also true that
A few high-level suggestions.
1. I recommend using the new Receiver API in the almost-released Spark 1.0
(see branch-1.0 / master branch on github). It's a slightly better version
of the earlier NetworkReceiver, as it hides away the BlockGenerator (which
unnecessarily needed to be manually started and
with
the version Spark uses. Sorry for spamming.
Weide
On Sat, May 3, 2014 at 7:04 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:
I am a little confused about the version of Spark you are using. Are you
using Spark 0.9.1, which uses Scala 2.10.3?
TD
On Sat, May 3, 2014 at 6:16 PM, Weide
Can you tell which version of Spark you are using? Spark 1.0 RC3, or
something intermediate?
And do you call sparkContext.stop at the end of your application? If so,
does this error occur before or after the stop()?
TD
On Sun, May 4, 2014 at 2:40 AM, wxhsdp wxh...@gmail.com wrote:
Hi, all
i
Could be a bug. Can you share code, along with data, that I can use to
reproduce this?
TD
On May 2, 2014 9:49 AM, Adrian Mocanu amoc...@verticalscope.com wrote:
Has anyone else noticed that *sometimes* the same tuple calls the update
state function twice?
I have 2 tuples with the same key in 1 RDD
Take a look at the RDD.pipe() operation. That allows you to pipe the data
in an RDD to any external shell command (just like a Unix shell pipe).
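A minimal sketch of RDD.pipe (sc and the grep pattern are illustrative):
each element is written to the external command's stdin, one per line, and
each line of the command's stdout becomes an element of the resulting RDD.

val fruits = sc.parallelize(Seq("apple", "banana", "avocado"))
val startingWithA = fruits.pipe("grep ^a")
startingWithA.collect().foreach(println) // prints apple and avocado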
On May 1, 2014 10:46 AM, Mohit Singh mohit1...@gmail.com wrote:
Hi,
I guess Spark uses streaming in the context of streaming live data, but
what I mean
Depends on your code. Referring to the earlier example, if you do
words.map(x => (x,1)).updateStateByKey()
then for a particular word, if a batch contains 6 occurrences of that word,
then the Seq[V] will be [1, 1, 1, 1, 1, 1].
Instead, if you do
words.map(x => (x,1)).reduceByKey(_ +
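A sketch of a running count that matches these semantics (names are
illustrative): newValues holds this batch's 1s for a given word, and state
holds the count so far (None the first time the word is seen).

// (In this era, pair operations need: import org.apache.spark.streaming.StreamingContext._)
val updateFunc = (newValues: Seq[Int], state: Option[Int]) =>
  Some(newValues.sum + state.getOrElse(0))

val runningCounts = words.map(x => (x, 1)).updateStateByKey[Int](updateFunc)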
Ordered by what? arrival order? sort order?
TD
On Thu, May 1, 2014 at 2:35 PM, Adrian Mocanu amoc...@verticalscope.comwrote:
If I use a range partitioner, will this make updateStateByKey take the
tuples in order?
Right now I see them not being taken in order (most of them are ordered
Yeah, I remember changing fold to sum in a few places, probably in
test suites, but missed this example, I guess.
On Wed, Apr 30, 2014 at 1:29 PM, Sean Owen so...@cloudera.com wrote:
S is the previous count, if any. Seq[V] are potentially many new
counts. All of them have to be added together
You may have already seen it, but I will mention it anyway. This example
may help.
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/StatefulNetworkWordCount.scala
Here the state is essentially a running count of the words seen. So the
value
are not using stream context with master local, we have 1 Master and 8
Workers and 1 word source. The command line that we are using is:
bin/run-example org.apache.spark.streaming.examples.JavaNetworkWordCount
spark://192.168.0.13:7077
On Apr 30, 2014, at 0:09, Tathagata Das tathagata.das1
You can get the internal AvroFlumeEvent inside the SparkFlumeEvent using
SparkFlumeEvent.event. That should probably give you all the original text
data.
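A minimal sketch, assuming the Flume event body is UTF-8 text and backed by
a heap ByteBuffer (flumeStream is a placeholder for the DStream returned by
FlumeUtils.createStream):

val bodies = flumeStream.map { sparkEvent =>
  val avroEvent = sparkEvent.event // the underlying AvroFlumeEvent
  new String(avroEvent.getBody.array(), "UTF-8")
}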
On Mon, Apr 28, 2014 at 5:46 AM, Kulkarni, Vikram vikram.kulka...@hp.comwrote:
Hi Spark-users,
Within my Spark Streaming program, I am
Hello Neha,
This is the result of a known bug in 0.9. Can you try running the latest
Spark master branch to see if this problem is resolved?
TD
On Tue, Apr 22, 2014 at 2:48 AM, NehaS Singh nehas.si...@lntinfotech.comwrote:
Hi,
I have installed
help me with how to use some external data to manipulate the RDD
records?
Thanks and regards
Gagan B Mishra
*Programmer*
*560034, Bangalore*
*India*
On Tue, Apr 15, 2014 at 4:09 AM, Tathagata Das [via Apache Spark User
List] [hidden email]
As long as the socket server sends data through the same connection, the
existing code is going to work. The socket.getInputStream returns a input
stream which will continuously allow you to pull data sent over the
connection. The bytesToObject function continuously reads data from the
input
Diana, that is a good question.
When you persist an RDD, the system still remembers the whole lineage of
parent RDDs that created that RDD. If one of the executor fails, and the
persist data is lost (both local disk and memory data will get lost), then
the lineage is used to recreate the RDD. The
Does this happen at a low event rate for that topic as well, or only at a
high volume rate?
TD
On Wed, Apr 9, 2014 at 11:24 PM, gaganbm gagan.mis...@gmail.com wrote:
I am really at my wits' end here.
I have different streaming contexts, let's say 2, both listening to the same
Kafka topics. I
, Matei Zaharia, Nan Zhu, Nick Lanham,
Patrick Wendell, Prabin Banka, Prashant Sharma, Qiuzhuang,
Raymond Liu, Reynold Xin, Sandy Ryza, Sean Owen, Shixiong Zhu,
shiyun.wxm, Stevo Slavić, Tathagata Das, Tom Graves, Xiangrui Meng
TD
A small additional note: Please use the direct download links on the Spark
Downloads page: http://spark.apache.org/downloads.html. The Apache mirrors
take a day or so to sync from the main repo, so they may not work immediately.
TD
On Wed, Apr 9, 2014 at 2:54 PM, Tathagata Das
tathagata.das1
-examples_2.10-assembly-0.9.0-incubating.jar    System Classpath
http://10.10.41.67:40368/jars/spark-examples_2.10-assembly-0.9.0-incubating.jar    Added By User
From: Tathagata Das [mailto:tathagata.das1...@gmail.com]
Sent: Monday, April 07, 2014 7:54 PM
To: user@spark.apache.org
A few things would be helpful.
1. Environment settings - you can find them on the environment tab in the
Spark application UI.
2. Are you setting the HDFS configuration correctly in your Spark program?
For example, can you write an HDFS file from a Spark program (say
spark-shell) to your HDFS
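For instance, a quick sanity check from spark-shell (the namenode host/port
and output path here are hypothetical):

sc.parallelize(1 to 100).saveAsTextFile("hdfs://namenode:8020/tmp/spark-hdfs-check")
// If HDFS is configured correctly, this creates that directory with part-* files.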
27, 2014 at 3:47 PM, Evgeny Shishkin itparan...@gmail.comwrote:
On 28 Mar 2014, at 01:44, Tathagata Das tathagata.das1...@gmail.com
wrote:
The more I think about it, the problem is not about /tmp; it's more about
the workers not having enough memory. Blocks of received data could be
falling
Seems like the configuration of the Spark worker is not right. Either the
worker has not been given enough memory or the allocation of the memory to
the RDD storage needs to be fixed. If configured correctly, the Spark
workers should not get OOMs.
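As a sketch, the knobs that usually matter here in this era of Spark (the
values are purely illustrative, not recommendations):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "4g")          // heap given to each worker JVM
  .set("spark.storage.memoryFraction", "0.6")  // fraction of that heap usable for RDD storage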
On Thu, Mar 27, 2014 at 2:52 PM, Evgeny
When you say launch long-running tasks does it mean long running Spark
jobs/tasks, or long-running tasks in another system?
If the rate of requests from Kafka is not low (in terms of records per
second), you could collect the records in the driver, and maintain the
shared bag in the driver. A
*Answer 1:* Make sure you are using master as local[n] with n > 1
(assuming you are running it in local mode). The way Spark Streaming
works is that it assigns a core to the data receiver, and so if you
run the program with only one core (i.e., with local or local[1]),
then it won't have resources to
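A minimal sketch of a context with enough cores (the app name and batch
interval are illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// local[2]: one core can run the receiver while the other processes data.
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))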
Can you give us the more detailed exception + stack trace in the log? It
should be in the driver log. If not, please take a look at the executor
logs, through the web ui to find the stack trace.
TD
On Tue, Mar 25, 2014 at 10:43 PM, gaganbm gagan.mis...@gmail.com wrote:
Hi Folks,
Is this
correct?
Thanks,
Sourav
On Tue, Mar 25, 2014 at 3:26 AM, Tathagata Das
tathagata.das1...@gmail.com wrote:
You can use RollingFileAppenders in log4j.properties.
http://logging.apache.org/log4j/extras/apidocs/org/apache/log4j/rolling/RollingFileAppender.html
You can have other scripts
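A sketch of such a log4j.properties using the time-based rolling policy from
the log4j extras companion referenced above (paths and patterns are
illustrative):

log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.rolling.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.rolling.rollingPolicy.fileNamePattern=/var/log/spark/app.%d{yyyy-MM-dd}.log.gz
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d %p %c: %m%n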
Unfortunately there isn't one right now. But it is probably not too hard to
start with the JavaNetworkWordCount
(https://github.com/apache/incubator-spark/blob/master/examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java),
and use the ZeroMQUtils in the same way as the
to count how many
windows there are.
-A
-Original Message-
From: Tathagata Das [mailto:tathagata.das1...@gmail.com]
Sent: March-24-14 6:55 PM
To: user@spark.apache.org
Cc: u...@spark.incubator.apache.org
Subject: Re: [bug?] streaming window unexpected behaviour
Yes, I believe
Hello Sanjay,
Yes, your understanding of lazy semantics is correct. But ideally
every batch should read based on the batch interval provided in the
StreamingContext. Can you open a JIRA on this?
On Mon, Mar 24, 2014 at 7:45 AM, Sanjay Awatramani
sanjay_a...@yahoo.com wrote:
Hi All,
I found
Yes, I believe that is the current behavior. Essentially, the first few
RDDs will be partial windows (assuming window duration > sliding
interval).
TD
On Mon, Mar 24, 2014 at 1:12 PM, Adrian Mocanu
amoc...@verticalscope.com wrote:
I have what I would call unexpected behaviour when using window on a
I am not sure how to debug this without any more information about the
source. Can you monitor on the receiver side that data is being accepted by
the receiver but not reported?
TD
On Wed, Mar 5, 2014 at 7:23 AM, eduardocalfaia e.costaalf...@unibs.itwrote:
Hi TD,
I have seen in the web UI
Are you launching your application using the scala or java command? The
scala command brings in a version of Akka that we have found to cause
conflicts with Spark's version of Akka. So it's best to launch using java.
TD
On Thu, Mar 6, 2014 at 3:45 PM, Deepak Nulu deepakn...@gmail.com wrote:
I see the
I don't have an Eclipse setup, so I am not sure what is going on here. I
would try to use Maven on the command line with a pom to see if this
compiles. Also, try cleaning up your system Maven cache. Who knows if it had
pulled in a wrong version of Kafka 0.8 and has been using it all the time.
Blowing away the
Yes! Spark Streaming programs are just like any Spark program, and so any
EC2 cluster set up using the spark-ec2 scripts can be used to run Spark
Streaming programs as well.
On Thu, Feb 27, 2014 at 10:11 AM, Aureliano Buendia buendia...@gmail.comwrote:
Hi,
Does the ec2 support for spark 0.9
means
for daemonizing and monitoring spark streaming apps which are supposed to
run 24/7? If not, any suggestions for how to do this?
On Thu, Feb 27, 2014 at 8:23 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:
Zookeeper is automatically set up in the cluster as Spark uses Zookeeper