Hi,
I've read in the documentation that since Spark 3.2.1 the Python API for
spark-streaming-kafka is back in the game.
https://spark.apache.org/docs/3.2.1/streaming-programming-guide.html#advanced-sources
But in the Kafka Integration Guide there is no documentation for the Python API.
need to use
Spark Streaming (KafkaUtils.createDirectStream) rather than Structured Streaming,
following this document
https://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html, and
re-iterating the issue again for better understanding.
The spark-streaming-kafka-0-10
<https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html>
Kafka connector prefixes "spark-executor" + group.id for executors; the driver uses
the original group id.
*Here is the code where the executor constructs the executor-specific group id*
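For reference, a minimal sketch of that behavior (broker, topic, and group names
are assumptions, and an existing StreamingContext `ssc` is assumed): the driver
consumes as "my-group", while the executors poll as "spark-executor-my-group".

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "my-group")  // executors poll as "spark-executor-my-group"

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("my-topic"), kafkaParams))

With a locked-down cluster like the one described below, this means both
"my-group" and "spark-executor-my-group" would need to be authorized.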
t;groupIdPrefix" will be ignored.
I think it answers your questions.
As a general suggestion maybe it worth to revisit Spark 3.0 because
Structured Streaming has another interesting feature:
groupIdPrefix string spark-kafka-source streaming and batch Prefix of
consumer group id
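As a sketch of that option (broker and topic values are placeholders, on a Spark
version that supports it), the generated group ids then look like
"my-authorized-prefix-<random>":

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "my-topic")
      .option("groupIdPrefix", "my-authorized-prefix")
      .load()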
Hi Team,
We have a secured Kafka cluster (which only allows consuming from the
pre-configured, authorized consumer group). There is a scenario where we
want to use Spark Streaming to consume from the secured Kafka, so we have
decided to use spark-streaming-kafka-0-10
<https://spark.apache.org/d
BlockManager: Removing RDD 9
It seems consumer.poll(timeout) takes too long on the first poll in a task.
Version information:
spark 2.11
kafka 0.10.0.1
- Original message -
From: zenglong chen
Date: 8/2/19 09:59 (GMT+08:00)
To: user@spark.apache.org
Subject: spark stream kafka wait for all data process done
How can Kafka wait for tasks to finish processing before receiving the next
batch? I want to process 5000 records at a time with pandas, and it may take too
long to process.
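Spark Streaming already runs the jobs of one batch at a time by default, so later
batches queue up rather than interleave; the practical lever is capping how much
each batch pulls. A sketch using real property names (the values are illustrative
guesses, not recommendations):

    val conf = new org.apache.spark.SparkConf()
      .set("spark.streaming.backpressure.enabled", "true")       // adapt intake rate to processing speed
      .set("spark.streaming.kafka.maxRatePerPartition", "500")   // cap records/sec/partition (direct stream)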
Dears,
I'm working on a project that should integrate Spark Streaming with Kafka
using Java.
Currently the official documentation is confusing; it's not clear whether
"spark-streaming-kafka-0-10" is safe to use in a production environment
or not.
According to "Spark St
How do I repartition a Spark DStream Kafka ConsumerRecord RDD? I am getting
uneven sizes across Kafka topics, and we want to repartition the input RDD based
on some logic.
But when I try to apply the repartition I get "object not serializable
(
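The usual workaround, since ConsumerRecord itself is not serializable, is to
project out plain fields before any shuffle; a sketch (key/value types and the
partition count are assumptions):

    val pairs = stream.map(record => (record.key, record.value))  // plain Strings serialize fine
    val balanced = pairs.transform(rdd => rdd.repartition(48))    // repartition is now safe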
I can't... do you think it could be a bug in this version, from
Spark or from Kafka?
On Wed, Aug 29, 2018 at 22:28, Cody Koeninger () wrote:
> Are you able to try a recent version of Spark?
> On Wed, Aug 29, 2018 at 2:10 AM, Guillermo Ortiz Fernández wrote:
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1091)
~[kafka-clients-1.0.0.jar:na]
at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.paranoidPoll(DirectKafkaInputDStream.scala:169)
~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.0.2]
at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:188)
~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.0.2]
at org.apache.spark.streaming.kafka010
Hello there,
I just upgraded to Spark 2.3.1 from Spark 2.2.1, ran my streaming workload,
and got an error (java.lang.AbstractMethodError) never seen before; see
the error stack attached in (a) below.
Does anyone know if Spark 2.3.1 works well with the Kafka
spark-streaming-kafka-0-10 connector?
this link
Hi,
I have a simple Java program to read data from Kafka using Spark Streaming.
When I run it from Eclipse on my Mac, it connects to ZooKeeper and the
bootstrap nodes,
but it's not displaying any data, and it does not give any error.
It just shows:
18/01/16 20:49:15 INFO Executor: Finished task
Hi All, I'm new to streaming Avro records and am parsing Avro from a Kafka
direct stream with Spark Streaming 2.1.1. I was wondering if anyone could
please suggest an API for decoding Avro records with Scala? I've found
KafkaAvroDecoder, twitter/bijection, and the Avro library; each seems to
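In case it helps, a small decoding sketch with twitter/bijection (the schema JSON
string `schemaJson` and the payload `messageBytes` are assumptions standing in for
your schema and the Kafka record value):

    import com.twitter.bijection.Injection
    import com.twitter.bijection.avro.GenericAvroCodecs
    import org.apache.avro.Schema
    import org.apache.avro.generic.GenericRecord

    val schema = new Schema.Parser().parse(schemaJson)
    val injection: Injection[GenericRecord, Array[Byte]] =
      GenericAvroCodecs.toBinary[GenericRecord](schema)
    val record: GenericRecord = injection.invert(messageBytes).get  // Try[GenericRecord]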
Hello.
I have a Python process that reads a Kafka queue and, for each record, checks it
against a table.
# Load table in memory
table = sqlContext.sql("select id from table")
table.cache()

def processForeach(time, rdd):
    print(time)
    for k in rdd.collect():
        if ...  # condition truncated in the original message

kafkaTopic.foreachRDD(processForeach)
backporting it to spark 1.5.1
Hi,
We planned to upgrade our Spark Kafka library from 0.8.1 to 0.10 to simplify
our infrastructure code logic. Does anybody know when the 0-10 version will
become stable rather than experimental?
May I use this 0-10 version together with Spark 1.5.1?
https://spark.apache.org/docs/latest/streaming-kafka-0
cation with Kafka Direct Stream to
> Spark 2 (spark-streaming-kafka-0-10 - 2.1.0) with Kafka version 0.10.0.0.
> We find abnormal delays after the application has run for a couple of
> hours and completed consumption of ~10 million records. There is a sudden
> dip in the processing
persisters that appear to be working fine.
The tasks appear to go fine until approximately 74-80 of the 96 tasks are in,
and then the remaining tasks take a while. I'm using EMR / Spark 2.1.0 / Kafka
0.10.0.1 / EMRFS (EMR's S3 solution). Any help would be greatly appreciated!
Here's the code I'm
Thanks for the tip. That worked. When would one use the assembly?
On Wed, Mar 29, 2017 at 7:13 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:
> Try depending on "spark-streaming-kafka-0-10_2.11" (not the assembly)
>
> On Wed, Mar 29, 2017 at 9:59 AM, Srikan
Try depending on "spark-streaming-kafka-0-10_2.11" (not the assembly)
On Wed, Mar 29, 2017 at 9:59 AM, Srikanth <srikanth...@gmail.com> wrote:
> Hello,
>
> I'm trying to use "org.json4s" % "json4s-native" library in a spark
> streaming + k
Hello,
I'm trying to use the "org.json4s" % "json4s-native" library in a Spark
Streaming + Kafka direct app.
When I use the latest version of the lib I get an error similar to this:
<https://github.com/json4s/json4s/issues/316>
The workaround suggested there is to use ve
Hi,
I think I'm seeing a bug in the context of upgrading to using the Kafka 0.10
streaming API. Code fragments follow.
--
Nick
JavaInputDStream<ConsumerRecord<K, V>> rawStream =
    getDirectKafkaStream();
JavaDStream<Tuple2<K, V>> messagesTuple =
Glad you got it worked out. That's cool as long as your use case doesn't
actually require e.g. partition 0 to always be scheduled to the same
executor across different batches.
On Tue, Mar 21, 2017 at 7:35 PM, OUASSAIDI, Sami
wrote:
> So it worked quite well with a
So it worked quite well with a coalesce; I was able to find a solution to
my problem: although not directly controlling the executor, a good workaround
was to assign the desired partition to a specific stream through the assign
strategy, coalesce to a single partition, and then repeat the same process
for
Another option that would avoid a shuffle would be to use assign and
coalesce, running two separate streams.
spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "...")
  .option("assign", """{t0: {"0": }, t1:{"0": x}}""")
  .load()
  .coalesce(1)
  .writeStream
@Cody: Duly noted.
@Michael Armbrust: A repartition is out of the question for our project, as
it would be a fairly expensive operation. We tried looking into targeting a
specific executor so as to avoid this extra cost and directly have well-
partitioned data after consuming the Kafka topics. Also
Sorry, typo. Should be a repartition, not a groupBy.
> spark.readStream
>   .format("kafka")
>   .option("kafka.bootstrap.servers", "...")
>   .option("subscribe", "t0,t1")
>   .load()
>   .repartition($"partition")
>   .writeStream
>   .foreach(... code to write to cassandra ...)
I think it should be straightforward to express this using structured
streaming. You could ensure that data from a given partition ID is
processed serially by performing a group by on the partition column.
spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "...")
Spark just really isn't a good fit for trying to pin particular computation
to a particular executor, especially if you're relying on that for
correctness.
On Thu, Mar 16, 2017 at 7:16 AM, OUASSAIDI, Sami
wrote:
>
> Hi all,
>
> So I need to specify how an executor
Hi all,
So I need to specify how an executor should consume data from a Kafka topic.
Let's say I have 2 topics, t0 and t1, with two partitions each, and two
executors, e0 and e1 (both can be on the same node, so the assign strategy does
not work, since in the case of a multi-executor node it works
This is running in YARN cluster mode. It was restarted automatically and
continued fine.
I was trying to see what went wrong. AFAIK there were no task failures, and
nothing in the executor logs. The log I gave is from the driver.
After some digging, I did see that there was a rebalance in the Kafka logs
around this
Does restarting after a few minutes solve the problem? It could be a
transient issue that lasts long enough for Spark task-level retries to all
fail.
On Tue, Feb 7, 2017 at 4:34 PM, Srikanth wrote:
> Hello,
>
> I had a spark streaming app that reads from kafka running for a
Hello,
I had a Spark Streaming app that reads from Kafka running for a few hours,
after which it failed with the error:
*17/02/07 20:04:10 ERROR JobScheduler: Error generating jobs for time
148649785 ms
java.lang.IllegalStateException: No current assignment for partition mt_event-5
at
Cool! Thanks for your inputs Jacek and Mark!
From: Mark Hamstra [mailto:m...@clearstorydata.com]
Sent: 13 January 2017 12:59
To: Phadnis, Varun <phad...@sky.optymyze.com>
Cc: user@spark.apache.org
Subject: Re: Spark and Kafka integration
See "API compatibility" in http://
arun <phad...@sky.optymyze.com> wrote:
> Hello,
> We are using Spark 2.0 with Kafka 0.10.
> As I understand, much of the API packaged in the following dependency we
> are targeting is marked as "@Experimental"
> org.ap
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
On Thu, Jan 12, 2017 at 2:57 PM, Phadnis, Varun <phad...@sky.optymyze.com> wrote:
> Hello,
> We are using Spark 2.0 with Kafka 0.10.
> As
Hello,
We are using Spark 2.0 with Kafka 0.10.
As I understand, much of the API packaged in the following dependency we are
targeting is marked as "@Experimental":
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
What are the implications of this being marked as experimental?
You are correct that you shouldn't have to worry about broker id.
> I'm honestly not sure specifically what else you are asking at this
> point.
> On Tue, Oct 25, 2016 at 1:39 PM, Sunita Arvind <sunitarv...@gmail.com> wrote:
>> Just re-read the kafka archit
is, it is leader based, so the topic/partitionId pair will be the same on all the
brokers. So we do not need to consider the broker id while storing offsets. Still
exploring the rest of the items.
regards
Sunita
Hello Experts,
I am trying to use the saving-to-ZK design. I just saw Sudhir's comments that
it is an old approach. Any reasons for that? Any issues observed with saving
to ZK? The way we are planning to use it is:
1. Following http://aseigneurin.github.io/2016/05/07/spark-kafka-achieving-zero-data-loss.html
2. Saving to the same file with the offsetRange as a part of the file. We hope
that there are no partial writes / overwriting is possible and offsetRanges
However I have the below doubts, which
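For comparison, the 0-10 connector carries its own offset-commit hooks, which can
replace hand-rolled ZK bookkeeping; a sketch (assuming `stream` is a
createDirectStream result and the batch output is written idempotently):

    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... write the batch out idempotently ...
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)  // commit only after output succeeds
    }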
k version 2.0.0, and I start the spark-shell with:
> spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.0.0
> As described here:
> https://github.com/apache/spark/blob/master/docs/structured-streaming-kafka-integration.md
Add --jars /spark-streaming-kafka_2.10-1.5.1.jar
(you may need to download the jar file, or any newer version)
to spark-shell.
I also have spark-streaming-kafka-assembly_2.10-1.6.1.jar on the --jars list.
HTH,
Dr Mich Talebzadeh
-integration.md
But I get an unresolved dependency error ("unresolved dependency:
org.apache.spark#spark-sql-kafka-0-10_2.11;2.0.0: not found"). So it seems
not to be available via Maven or spark-packages.
How can I access this package? Or am I doing something wrong/missing?
Thank you
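If the artifact genuinely isn't published for your exact Spark version, pointing
--packages at a release that does exist on Maven Central is the usual way out; a
sketch (the version below is an assumption, match it to your Spark build):

    spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0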
s for time
> 1473491465000 ms (execution: 0.066 s) *(EVENT second-time processing cost 0.066)*
> and the second-time processing of the event finished without really doing the
> work.
> Help is hugely appreciated.
Are you using the receiver-based stream?
On Sep 10, 2016 15:45, "Eric Ho" wrote:
I notice that some Spark programs would contact something like 'zoo1:2181'
when trying to pull data out of Kafka.
Is the Kafka data actually transported over this port?
Typically ZooKeepers use 2218 for SSL.
If my Spark program were to use 2218, how would I specify ZooKeeper-
specific
to whether the batch processed
successfully.
If you're unclear on how the Kafka integration works, see
https://github.com/koeninger/kafka-exactly-once
On Thu, Sep 8, 2016 at 4:44 PM, Cheng Yi <phillipchen...@gmail.com> wrote:
> I am using the latest streaming kafka connector
> org.apache.s
I am using the latest streaming Kafka connector:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.11</artifactId>
    <version>1.6.2</version>
</dependency>
I am facing the problem that a message is delivered twice to my
consumers. These two deliveries are 10+ seconds apart; it looks like this is
caused by my lengthy message processing (took about 60
From: Cody Koeninger <c...@koeninger.org>
To: Eric Ho <e...@analyticsmd.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark to Kafka communication encrypted ?
Date: Wed, Aug 31, 2016 10:34 AM
Encryption is only available in spark-streaming-kafka-0-10, not 0-8.
You enable it the same way you enable it for the Kafka project's new
consumer, by setting kafka configuration parameters appropriately.
http://kafka.apache.org/documentation.html#security_ssl
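Concretely, a sketch of passing those settings through the 0-10 integration's
kafkaParams (the paths, password, and SSL listener port are placeholders):

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9093",                        // the broker's SSL listener
      "security.protocol" -> "SSL",
      "ssl.truststore.location" -> "/path/to/client.truststore.jks",
      "ssl.truststore.password" -> "changeit")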
On Wed, Aug 31, 2016 at 2:03 AM
I can't find in Spark 1.6.2's docs how to turn on encryption for Spark-to-Kafka
communication... I think the Spark docs only tell you how to turn on encryption
for inter-Spark-node communications. Am I wrong?
Thanks.
--
-eric ho
See
https://github.com/koeninger/kafka-exactly-once
On Aug 23, 2016 10:30 AM, "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com>
wrote:
Hi Experts,
I am looking for some information on how to achieve zero data loss while
working with Kafka and Spark. I have searched online, and the blogs have
different answers. Please let me know if anyone has an idea on this.
Blog 1:
https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance
For 2.0, the kafka dstream support is in two separate subprojects
depending on which version of Kafka you are using
spark-streaming-kafka-0-10
or
spark-streaming-kafka-0-8
corresponding to brokers that are version 0.10+ or 0.8+
On Mon, Jul 25, 2016 at 12:29 PM, Reynold Xin <r...@databricks.
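In sbt form, that choice looks roughly like this (versions are assumptions):

    // brokers on 0.10+:
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.0"
    // brokers on 0.8+:
    // libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"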
support. We also use kafka
Andy
From: Marco Mistroni <mmistr...@gmail.com>
Date: Monday, July 25, 2016 at 2:33 AM
To: kevin <kiss.kevin...@gmail.com>
Cc: "user @spark" <user@spark.apache.org>, "dev.spark"
<d...@spark.apache.org>
Subject: Re: where I can f
I have compiled it from source code.
2016-07-25 12:05 GMT+08:00 kevin <kiss.kevin...@gmail.com>:
hi, all:
I try to run the example org.apache.spark.examples.streaming.KafkaWordCount, and I
get this error:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark/streaming/kafka/KafkaUtils$
at
org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCoun
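That class lives in the separate Kafka integration artifact, so a common fix
(the coordinates below are an assumption for a Scala 2.11 / 0.8-broker build,
and the bracketed arguments are placeholders) is to put it on the classpath at
submit time:

    spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 \
      --class org.apache.spark.examples.streaming.KafkaWordCount \
      <examples-jar> <zkQuorum> <group> <topics> <numThreads>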
mapping between Kafka
and RDD partitions, which is easier to understand and tune.
On Jul 7, 2016 3:04 PM, "SamyaMaiti" <samya.maiti2...@gmail.com> wrote:
> Hi Team,
>
> Is there a way we can consume from Kafka using spark Streaming direct API
> using multiple consumers
Hi Team,
Is there a way we can consume from Kafka using the Spark Streaming direct API
with multiple consumers (belonging to the same consumer group)?
Regards,
Sam
There is a recent proposal for it, but it has gotten pushed back to at least
0.10.1.
If you want to do this kind of thing, you will need to maintain your
own index from time to offset.
On Wed, May 25, 2016 at 8:15 AM, trung kien <kient...@gmail.com> wrote:
Hi all,
Is there any way to re-compute using the Spark Streaming Kafka Direct
Approach from a specific time?
In some cases, I want to re-compute again from a specific time (e.g. the
beginning of the day). Is that possible?
--
Thanks
Kien
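A sketch of the replay side once you have such a time-to-offset index (the topic,
offset value, `ssc`, and `kafkaParams` are assumptions): look up the offsets for
your chosen time and start the direct stream from them with an Assign strategy.

    import org.apache.kafka.common.TopicPartition
    import org.apache.spark.streaming.kafka010._

    val fromOffsets = Map(new TopicPartition("my-topic", 0) -> 123456L)  // from your index
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Assign[String, String](
        fromOffsets.keys.toList, kafkaParams, fromOffsets))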
I'm running into the below error while trying to consume messages from Kafka
through Spark Streaming (Kafka direct API). This used to work OK when using
the Spark standalone cluster manager. We're just switching to Cloudera
5.7, using YARN to manage the Spark cluster, and started to see the below error
Hi All,
I am trying to get Spark 1.4.1 (Java) to work with Kafka 0.8.2 in a
Kerberos-enabled cluster (HDP 2.3.2).
Is there any document I can refer to?
Thanks,
Pradeep
Check the version of the libraries you downloaded, the version of Spark,
and the version of Kafka.
On 4 May 2016 at 16:18, Luca Ferrari <ferrari.l...@live.it> wrote:
Hi,
I'm new to Apache Spark and I'm trying to run the Spark Streaming + Kafka
Integration Direct Approach example (JavaDirectKafkaWordCount.java).
I've downloaded all the libraries, but when I try to run it I get this error:
Exception in thread "main" java.lang.NoSuchMethodError:
scala.Predef
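A NoSuchMethodError on scala.Predef almost always means mixed Scala versions on
the classpath; a sketch of keeping them aligned in an sbt build (versions are
assumptions, pick the ones matching your cluster):

    scalaVersion := "2.10.5"
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming" % "1.6.1" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka" % "1.6.1")  // %% appends the Scala suffix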
in a new data
center isn't really different from starting up a post-crash job in the
original data center.
On Tue, Apr 19, 2016 at 3:32 AM, Erwan ALLAIN <eallain.po...@gmail.com> wrote:
> Thanks
Thanks Jason and Cody. I'll try to explain the Multi-DC case a bit better.
As I mentioned before, I'm planning to use one Kafka cluster and two or more
distinct Spark clusters.
Let's say we have the following DC configuration in a nominal case. Kafka
partitions are consumed uniformly by the 2 datacenters:
DataCenter | Spark | Kafka Consumer Group | Kafka partition (P1 to P4)
DC 1 | Master 1.1 |