When using manual Kafka offset commit in a Spark streaming job, and the application
fails to process the current batch without committing the offset in the executor, is it
expected behavior that the next batch will still be processed and the offset moved
forward, regardless of the application's failure to commit?
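For reference, a minimal sketch of the manual-commit pattern in question, assuming the
spark-streaming-kafka-0-10 direct stream API (broker address, group id and topic name
below are placeholders):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object ManualCommitSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("manual-commit-sketch"), Seconds(10))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",             // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "manual-commit-sketch",
      "enable.auto.commit" -> (false: java.lang.Boolean))  // commits are done manually below

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("some_topic"), kafkaParams))

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.foreach(record => ())   // placeholder for the real per-record processing
      // commitAsync only queues these offsets; the actual commit happens on the driver
      // during a later batch, and the next batch gets scheduled either way.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

With this API the driver schedules batches from its own tracked offsets rather than from
what was last committed to Kafka, so the committed offset mainly matters for recovery
after a restart; whether a failed batch stops the application depends on how the failure
surfaces, not on whether commitAsync ran.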
>Suppose a microbatch *ubatch1* of partitions *(P1, P2, P3, P4, P5)* at time *T4* is
>assigned to tasks *(T1, T2, T3, T4, T5).* Will
>*ubatch2* of partitions *(P1',P2',P3',P4',P5')* at time *T5* be also
>assigned to the same set of tasks *(T1, T2, T3, T4, T5)* or will new
>tasks *(T6, T7, T8, T9, T10)* be created for *ubatch2*?
>
>
> I have put up this question on SO @
> https://stackoverflow.com/questions/56102094/kafka-spark-streaming-integration-relation-between-tasks-and-dstreams
> .
>
> Regards
> Sheel
Hello there,
I just upgraded to Spark 2.3.1 from Spark 2.2.1, ran my streaming workload,
and got an error (java.lang.AbstractMethodError) I had never seen before; check
the error stack attached in (a) below.
Does anyone know if Spark 2.3.1 works well with Kafka and
spark-streaming-kafka-0-10?
> and compact data format if CSV isn't required.
>
> --
> *From:* Aakash Basu <aakash.spark@gmail.com>
> *Sent:* Friday, March 16, 2018 9:12:39 AM
> *To:* sagar grover
> *Cc:* Bowden, Chris; Tathagata Das; Dylan Guedes; Georg Heiler; user;
> jagrati.go...@myntra.com
>>>> Cool! Shall try it and revert back tomm.
>>>>
>>>> Thanks a ton!
>>>>
>>>> On 15-Mar-2018 11:50 PM, "Bowden, Chris" <chris.bow...@microfocus.com>
>>>> wrote:
>>>>
>>>> Serialization and
>>>> deserialization is often an orthogonal and implicit transform. However, in
>>>> Spark, serialization and deserialization is an explicit transform (e.g.,
>>>> you define it in your query plan).
>>>>
>>>> To make this more concrete, Spark does not offer from_csv out of the box as an
>>>> expression (although CSV is well supported as a data source). You could
>>>> implement an expression by reusing a lot of the supporting CSV classes, which
>>>> may result in a better user experience vs. explicitly using split and array
>>>> indices, etc. In this simple example, casting the binary to a string just
>>>> works because there is a common understanding of strings encoded as bytes
>>>> between Spark and Kafka by default.
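A rough sketch of that "cast and split" approach in Structured Streaming (broker,
topic and column layout are invented for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.split

val spark = SparkSession.builder().appName("csv-over-kafka-sketch").getOrCreate()
import spark.implicits._

val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")   // placeholder broker
  .option("subscribe", "csv_topic")                       // placeholder topic
  .load()

// Kafka's value column is binary; casting it to a string works because both sides
// agree on the default bytes-as-string encoding. split() plus array indices then
// pull the CSV fields out explicitly (the part a from_csv expression would hide).
val parsed = raw
  .selectExpr("CAST(value AS STRING) AS line")
  .select(split($"line", ",").as("cols"))
  .select($"cols"(0).as("id"), $"cols"(1).as("name"))

parsed.writeStream.format("console").start().awaitTermination()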
---
> *From:* Aakash Basu <aakash.spark@gmail.com>
> *Sent:* Thursday, March 15, 2018 10:48:45 AM
> *To:* Bowden, Chris
> *Cc:* Tathagata Das; Dylan Guedes; Georg Heiler; user
> *Subject:* Re: Multiple Kafka Spark Streaming Dataframe Join query
>
> Hey Chris,
>
From: Aakash Basu <aakash.spark@gmail.com>
Sent: Thursday, March 15, 2018 7:52:28 AM
To: Tathagata Das
Cc: Dylan Guedes; Georg Heiler; user
Subject: Re: Multiple Kafka Spark Streaming Dataframe Join query
Hi,
And if I run this below piece of code -
from pyspark.sql import SparkSession
import time
class test:
    spark = SparkSession.builder \
        .appName("DirectKafka_Spark_Stream_Stream_Join") \
        .getOrCreate()
    # ssc = StreamingContext(spark, 20)
    table1_stream =
Any help on the above?
Hi,
I progressed a bit in the above mentioned topic -
1) I am feeding a CSV file into the Kafka topic.
2) Feeding the Kafka topic as readStream as TD's article suggests.
3) Then, simply trying to do a show on the streaming dataframe, using
queryName('XYZ') in the writeStream and writing a sql
Thanks to TD, the savior!
Shall look into it.
Relevant:
https://databricks.com/blog/2018/03/13/introducing-stream-stream-joins-in-apache-spark-2-3.html
This is true stream-stream join which will automatically buffer delayed
data and appropriately join stuff with SQL join semantics. Please check it
out :)
TD
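A bare-bones version of the kind of stream-stream join described there, assuming a
SparkSession named spark (topics, schemas and watermark durations are illustrative,
loosely following the blog's impressions/clicks example):

import org.apache.spark.sql.functions.expr

val impressions = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "impressions")
  .load()
  .selectExpr("CAST(key AS STRING) AS adId", "timestamp AS impressionTime")

val clicks = spark.readStream.format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "clicks")
  .load()
  .selectExpr("CAST(key AS STRING) AS clickAdId", "timestamp AS clickTime")

// Watermarks bound how long delayed rows are buffered; the time-range join condition
// lets Spark drop old state while still giving SQL join semantics.
val joined = impressions.withWatermark("impressionTime", "1 hour")
  .join(
    clicks.withWatermark("clickTime", "2 hours"),
    expr("clickAdId = adId AND clickTime BETWEEN impressionTime AND impressionTime + interval 1 hour"))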
On Wed, Mar 14, 2018 at 12:07
I misread it, and thought that your question was whether pyspark supports kafka
lol. Sorry!
Hey Dylan,
Great!
Can you revert back to my initial and also the latest mail?
Thanks,
Aakash.
Hi,
I've been using Kafka with pyspark since 2.1.
Hi,
I'm yet to.
Just want to know: when does Spark 2.3 with the 0.10 Kafka Spark package allow
Python? I read somewhere that, as of now, Scala and Java are the languages to be
used.
Please correct me if I am wrong.
Thanks,
Aakash.
On 14-Mar-2018 8:24 PM, "Georg Heiler" wrote:
Did you try spark 2.3 with structured streaming? There watermarking and
plain sql might be really interesting for you.
Hi,
*Info (Using): Spark Streaming Kafka 0.8 package*
*Spark 2.2.1*
*Kafka 1.0.1*
As of now, I am feeding paragraphs into the Kafka console producer, and my
Spark job, which is acting as a receiver, is printing the flattened words,
which is a complete RDD operation.
*My motive is to read two tables
I need some clarification for Kafka consumers in Spark or otherwise. I have
the following Kafka Consumer. The consumer is reading from a topic, and I
have a mechanism which blocks the consumer from time to time.
The producer is a separate thread which is continuously sending data. I
want to
other and that way
> you'll have less final events.
>
-Original Message-
From: Vikash Pareek [mailto:vikash.par...@infoobjects.com]
Sent: Tuesday, May 30, 2017 4:00 PM
To: user@spark.apache.org
Subject: Message getting lost in Kafka + Spark Streaming
I am facing an issue related to Spark streaming with Kafka; my use case is as
follows:
1. Spark
val keyedMessage = new KeyedMessage[String,
  String](props.getProperty("outTopicHarmonized"),
  null, row.toString())
producer.send(keyedMessage)
}
//hack, should be done with the flush
Thread.sleep(1000)
producer.close()
}

We explicitly added sleep(1000) for testing purposes,
but this is also not solving the problem :(
Any suggestion would be appreciated.
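For what it's worth, the "//hack, should be done with the flush" comment points at the
newer producer API, which does expose an explicit flush; a rough sketch (the broker
address is a placeholder, and props/rows are assumed to exist as in the snippet above):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val producerProps = new Properties()
producerProps.put("bootstrap.servers", "localhost:9092")
producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](producerProps)
rows.foreach { row =>
  producer.send(new ProducerRecord[String, String](
    props.getProperty("outTopicHarmonized"), null, row.toString))
}
producer.flush()   // blocks until buffered records are acknowledged, unlike the sleep(1000) hack
producer.close()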
This doesn't sound like a question regarding Kafka streaming, it
sounds like confusion about the scope of variables in spark generally.
Is that right? If so, I'd suggest reading the documentation, starting
with a simple rdd (e.g. using sparkContext.parallelize), and
experimenting to confirm your
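For example, the classic scope pitfall with a simple RDD looks something like this
(sketch only, assuming a SparkContext named sc):

var counter = 0
sc.parallelize(1 to 100).foreach(x => counter += x)  // each executor mutates its own copy of the closure
println(counter)  // typically still 0 on the driver; use an accumulator for this kind of aggregation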
I am trying to stream the data from Kafka to Spark.
JavaPairInputDStream directKafkaStream =
KafkaUtils.createDirectStream(ssc,
String.class,
String.class,
StringDecoder.class,
StringDecoder.class,
How are you producing data? I just tested your code and I can receive the
messages from Kafka.
Regards
Sumit Chawla
Empty RDD generally means Kafka is not producing msgs in those intervals.
For example, if I have batch duration of 10secs and there is no msgs within
any 10 secs, RDD corresponding to that 10 secs will be empty.
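A quick way to see that in the job itself, assuming a DStream named stream:

stream.foreachRDD { rdd =>
  if (rdd.isEmpty()) println("no Kafka messages arrived in this batch interval")
  else println(s"records in batch: ${rdd.count()}")
}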
On Mon, Sep 19, 2016 at 12:56 PM, Sateesh Karuturi <
sateesh.karutu...@gmail.com>
I am very new to *Spark streaming* and I am implementing a small exercise,
sending *XML* data from *Kafka*, and need to receive that *streaming* data
through *Spark streaming.* I tried all possible ways, but every time I
am getting *empty values.*
*There is no problem on the Kafka side, only
Unless I'm misreading the image you posted, it does show event counts
for the single batch that is still running, with 1.7 billion events in
it. The recent batches do show 0 events, but I'm guessing that's
because they're actually empty.
When you said you had a kafka topic with 1.7 billion
Streaming UI tab showing empty events and very different metrics than on 1.5.2
After a bit of effort I moved from a Spark cluster running 1.5.2, to a
Yarn cluster running 1.6.1 jars. I'm still setting the maxRPP. The
completed batches are no longer showing the number of events processed
in the Streaming UI tab . I'm getting around 4k inserts per second in
hbase, but I
Thanks @Cody, I will try that out. In the interm, I tried to validate
my Hbase cluster by running a random write test and see 30-40K writes
per second. This suggests there is noticeable room for improvement.
Take HBase out of the equation and just measure what your read
performance is by doing something like
createDirectStream(...).foreach(_.println)
not take() or print()
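A sketch of that kind of read-only benchmark, counting records per batch rather than
printing each one (stream is assumed to be the createDirectStream result):

var total = 0L
stream.foreachRDD { rdd =>
  val n = rdd.count()        // an action that reads every partition of the batch
  total += n                 // foreachRDD's body runs on the driver, so a local var is fine here
  println(s"records this batch: $n, total so far: $total")
}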
@Cody I was able to bring my processing time down to a second by
setting maxRatePerPartition as discussed. My bad that I didn't
recognize it as the cause of my scheduling delay.
Since then I've tried experimenting with a larger Spark Context
duration. I've been trying to get some noticeable
I'll try dropping the maxRatePerPartition=400, or maybe even lower.
However even at application starts up I have this large scheduling
delay. I will report my progress later on.
If your batch time is 1 second and your average processing time is
1.16 seconds, you're always going to be falling behind. That would
explain why you've built up an hour of scheduling delay after eight
hours of running.
On Sat, Jun 18, 2016 at 4:40 PM, Colin Kincaid Williams
Hi Mich again,
Regarding batch window, etc. I have provided the sources, but I'm not
currently calling the window function. Did you see the program source?
It's only 100 lines.
https://gist.github.com/drocsid/b0efa4ff6ff4a7c3c8bb56767d0b6877
Then I would expect I'm using defaults, other than
Ok
What is the set up for these please?
batch window
window length
sliding interval
And also in each batch window how much data do you get in (no of messages
in the topic whatever)?
Dr Mich Talebzadeh
I believe you have an issue with performance?
have you checked spark GUI (default 4040) for details including shuffles
etc?
HTH
Dr Mich Talebzadeh
I'm attaching a picture from the streaming UI.
There are 25 nodes in the spark cluster.
how many nodes are in your cluster?
--num-executors 6 \
--driver-memory 4G \
--executor-memory 2G \
--total-executor-cores 12 \
Dr Mich Talebzadeh
I updated my app to Spark 1.5.2 streaming so that it consumes from
Kafka using the direct api and inserts content into an hbase cluster,
as described in this thread. I was away from this project for awhile
due to events in my family.
Currently my scheduling delay is high, but the processing time
I also tried
jsc.sparkContext().sc().hadoopConfiguration().set("dfs.replication", "2")
But still it's not working.
Any ideas why it's not working?
Abhi
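One more thing that might be worth trying, assuming the checkpoint writer picks up the
job's Hadoop configuration: Spark copies any spark.hadoop.* property into the Hadoop
Configuration it builds, so the replication factor can be set on the SparkConf before
the contexts are created, e.g.:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConfig = new SparkConf()
  .setAppName("streaming-checkpoint-replication")
  .set("spark.hadoop.dfs.replication", "2")   // spark.hadoop.* keys are copied into the Hadoop Configuration

val ssc = new StreamingContext(sparkConfig, Seconds(10))
ssc.checkpoint("hdfs:///tmp/checkpoint_dir")  // placeholder checkpoint path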
My spark streaming checkpoint directory is being written to HDFS with
default replication factor of 3.
In my streaming application, where I am listening to Kafka and setting
dfs.replication = 2 as below, the files are still being written with
replication factor = 3.
SparkConf sparkConfig = new
Thanks Cody, I can see that the partitions are well distributed...
Then I'm in the process of using the direct api.
60 partitions in and of itself shouldn't be a big performance issue
(as long as producers are distributing across partitions evenly).
Thanks again Cody. Regarding the details 66 kafka partitions on 3
kafka servers, likely 8 core systems with 10 disks each. Maybe the
issue with the receiver was the large number of partitions. I had
miscounted the disks and so 11*3*2 is how I decided to partition my
topic on insertion, ( by my
print() isn't really the best way to benchmark things, since it calls
take(10) under the covers, but 380 records / second for a single
receiver doesn't sound right in any case.
Am I understanding correctly that you're trying to process a large
number of already-existing kafka messages, not keep
Hello again. I searched for "backport kafka" in the list archives but
couldn't find anything but a post from Spark 0.7.2 . I was going to
use accumulators to make a counter, but then saw on the Streaming tab
the Receiver Statistics. Then I removed all other "functionality"
except:
Hi Cody,
I'm going to use an accumulator right now to get an idea of the
throughput. Thanks for mentioning the back ported module. Also it
looks like I missed this section:
https://spark.apache.org/docs/1.2.0/streaming-programming-guide.html#reducing-the-processing-time-of-each-batch
from the streaming programming guide.
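A sketch of the accumulator idea, using the Spark 1.2-era API (ssc and stream are
assumed to be the StreamingContext and the Kafka DStream):

val recordCount = ssc.sparkContext.accumulator(0L)   // driver-side accumulator
stream.foreachRDD { rdd =>
  rdd.foreach(_ => recordCount += 1L)                // incremented on executors, merged back on the driver
  println(s"records so far: ${recordCount.value}")   // value is only reliable on the driver
}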
Have you tested for read throughput (without writing to hbase, just
deserialize)?
Are you limited to using spark 1.2, or is upgrading possible? The
kafka direct stream is available starting with 1.3. If you're stuck
on 1.2, I believe there have been some attempts to backport it, search
the
I've written an application to get content from a kafka topic with 1.7
billion entries, get the protobuf serialized entries, and insert into
hbase. Currently the environment that I'm running in is Spark 1.2.
With 8 executors and 2 cores, and 2 jobs, I'm only getting between
0-2500 writes /
There's 1 topic per partition, so you're probably better off dealing
with topics that way rather than at the individual message level.
http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
Look at the discussion of "HasOffsetRanges"
If you
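A small sketch of the HasOffsetRanges approach from that link, resolving the topic once
per partition instead of per message (directKafkaStream is assumed to be the
createDirectStream result, here with the 0.8 integration):

import org.apache.spark.TaskContext
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

directKafkaStream.foreachRDD { rdd =>
  // capture on the driver, before any transformation changes the partitioning
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.foreachPartition { iter =>
    val range: OffsetRange = offsetRanges(TaskContext.get.partitionId)
    println(s"topic=${range.topic} partition=${range.partition} " +
      s"offsets=${range.fromOffset}..${range.untilOffset}")
    iter.foreach(msg => ())   // handle records here; the topic is already known from range.topic
  }
}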
Hi,
I'm just trying to process the data that come from the kafka source in my
spark streaming application. What I want to do is get the pair of topic and
message in a tuple from the message stream.
Here is my streams:
val streams = KafkaUtils.createDirectStream[String, Array[Byte],
From: Cody Koeninger [mailto:c...@koeninger.org]
Sent: Monday, March 14, 2016 9:39 PM
To: Mukul Gupta <mukul.gu...@aricent.com>
Cc: user@spark.apache.org
Subject: Re: Kafka + Spark streaming, RDD partitions not processed in parallel
So what's happening here is that print() uses take(). Take() will try to
Following is the link to repository:
https://github.com/guptamukul/sparktest.git
From: Cody Koeninger <c...@koeninger.org>
Sent: 11 March 2016 23:04
To: Mukul Gupta
Cc: user@spark.apache.org
Subject: Re: Kafka + Spark streaming, RDD partitions not processed in parallel
the test after
>> increasing the partitions of kafka topic to 5. This time also RDD partition
>> corresponding to partition 1 of kafka was processed on one of the spark
>> executor. Once processing is finished for this RDD partition, then RDD
>> partitions corresponding to
ssed.print(90);
try {
jssc.start();
jssc.awaitTermination();
} catch (Exception e) {
} finally {
jssc.close();
}
}
}
From: Cody Koeninger <c...@koeninger.org>
Sent: 11 March 2016 20:42
To: Mukul Gupta
Cc: user@spark.apache.org
Subject: Re: Kafka + Spark streaming, RDD partition
a partitions were processed
> in parallel on different spark executors. I am not clear about why spark is
> waiting for operations on first RDD partition to finish, while it could
> process remaining partitions in parallel? Am I missing any configuration?
> Any help is appreciated. Thanks
Yes, the partition IDs are the same.
As far as the failure / subclassing goes, you may want to keep an eye on
https://issues.apache.org/jira/browse/SPARK-10320 , not sure if the
suggestions in there will end up going anywhere.
On Fri, Sep 25, 2015 at 3:01 PM, Neelesh wrote:
Thanks. I'll keep an eye on this. Our implementation of the DStream
basically accepts a function to compute current offsets. The implementation
of the function fetches list of topics from zookeeper once in while. It
then adds consumer offsets for newly added topics with the currentOffsets
thats in
Thanks Petr, Cody. This is a reasonable place to start for me. What I'm
trying to achieve:

stream.foreachRDD { rdd =>
  rdd.foreachPartition { p =>
    Try(myFunc(...)) match {
      case Success(s) => // update watermark for this partition; of course, the
                         // expectation is that it will work only if
For the 1-1 mapping case, can I use TaskContext.get().partitionId as an
index in to the offset ranges?
For the failure case, yes, I'm subclassing of DirectKafkaInputDStream. As
for failures, different partitions in the same batch may be talking to
different RDBMS servers due to multitenancy - a
Your success case will work fine, it is a 1-1 mapping as you said.
To handle failures in exactly the way you describe, you'd need to subclass
or modify DirectKafkaInputDStream and change the way compute() works.
Unless you really are going to have very fine-grained failures (why would
only a
Hi,
We are using DirectKafkaInputDStream and store completed consumer
offsets in Kafka (0.8.2). However, some of our use case require that
offsets be not written if processing of a partition fails with certain
exceptions. This allows us to build various backoff strategies for that
partition,
You can have offsetRanges on workers, e.g.:

object Something {
  var offsetRanges = Array[OffsetRange]()

  def create[F: ClassTag](stream: InputDStream[Array[Byte]])
                         (implicit codec: Codec[F]): DStream[F] = {
    stream transform { rdd =>
      offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
also has an example of how to close over the offset ranges so they are
available on executors.
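Putting the pieces from this thread together -- offsetRanges captured on the driver, the
task's partition id as the index (valid because the direct stream keeps a 1-1 mapping),
and a per-partition success check before the watermark update -- a sketch could look like
this (myFunc and saveWatermark are hypothetical user functions):

import scala.util.{Failure, Success, Try}
import org.apache.spark.TaskContext
import org.apache.spark.streaming.kafka.HasOffsetRanges

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges   // captured on the driver
  rdd.foreachPartition { partition =>
    val range = offsetRanges(TaskContext.get.partitionId)   // 1-1 mapping to the Kafka partition
    Try(myFunc(partition)) match {
      case Success(_) =>
        saveWatermark(range.topic, range.partition, range.untilOffset)   // advance only on success
      case Failure(e) =>
        // leave the stored watermark alone so this partition can be retried / backed off
    }
  }
}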
Hey,
I’m trying to run kafka spark streaming using mesos with following example:
sc.stop
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._
import kafka.serializer.StringDecoder
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
import
In .setMaster("local"), set it to "local[2]" or "local[*]".
Thanks
Best Regards
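In code that's just, for example:

val conf = new SparkConf()
  .setAppName("kafka-streaming-shell-test")
  .setMaster("local[2]")   // the receiver occupies one thread, so "local" alone leaves nothing for processing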
hi,
I'm trying to run simple kafka spark streaming example over spark-shell:
sc.stop
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._
import kafka.serializer.DefaultDecoder
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
import
on a different machine.
All three cases were working properly.
at
[akka.tcp://sparkMaster@localhost:7077] inbound addresses are
[akka.tcp://sparkMaster@karma-HP-Pavilion-g6-Notebook-PC:7077]
1) Could you share your command?
2) Are the kafka brokers on the same host?
3) Could you run a --describe on the topic to see if the topic is setup
correctly (just to be sure)?
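The describe check would be along the lines of (zookeeper address as used when the topic
was created):

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test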
I had the same problem.
The solution I found was to use:

JavaStreamingContext streamingContext =
    JavaStreamingContext.getOrCreate("checkpoint_dir", contextFactory);

ALL configuration should be performed inside contextFactory. If you try
to configure the streaming context after getOrCreate, you
but
everything is recreated from the checkpointed data.
Hope this helps,
NB
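In Scala the same pattern looks roughly like this (checkpoint path and source are
placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(new SparkConf().setAppName("checkpointed-app"), Seconds(10))
  ssc.checkpoint("hdfs:///tmp/checkpoint_dir")        // placeholder path
  // ALL stream setup lives in here: on a fresh start this builds the DAG,
  // on a restart the DAG is rebuilt from the checkpoint data instead.
  val lines = ssc.socketTextStream("localhost", 9999)  // placeholder source
  lines.count().print()
  ssc
}

val ssc = StreamingContext.getOrCreate("hdfs:///tmp/checkpoint_dir", createContext _)
ssc.start()
ssc.awaitTermination()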
Thanks everyone, that was the problem. The create-new-streaming-context
function was supposed to set up the stream processing as well
as the checkpoint directory. I had missed the whole process of
checkpoint setup. With that done, everything works as expected.
Hi,
I have a simple application which fails with the following exception
only when the application is restarted (i.e. the checkpointDir has
entries from a previous execution):
Exception in thread main org.apache.spark.SparkException:
be happening here.
--- Ankur Chauhan
block interval will go in the same
block.
2. If a worker goes down which runs the Receiver for Kafka, Will the
receiver be restarted on some other worker?
Regards,
Sam
Hi,
I am integrating Kafka and Spark, using spark-streaming. I have created a topic
as a kafka producer:
bin/kafka-topics.sh --create --zookeeper localhost:2181
--replication-factor 1 --partitions 1 --topic test
I am publishing messages in kafka and trying to read them using
It says:
14/11/27 11:56:05 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
A quick guess would be that you are giving the wrong master url
(spark://192.168.88.130:7077). Open the