python API in Spark-streaming-kafka spark 3.2.1

2022-03-07 Thread Wiśniewski Michał
Hi, I've read in the documentation, that since spark 3.2.1 python API for spark-streaming-kafka is back in the game. https://spark.apache.org/docs/3.2.1/streaming-programming-guide.html#advanced-sources But in the Kafka Integration Guide there is no documentation for the python API. https

Re: [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-06 Thread Sethupathi T
ocument >> https://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html and >> re-iterating the issue again for better understanding. >> spark-streaming-kafka-0-10 >> <https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html> >> k

Re: [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-06 Thread Gabor Somogyi
need to use > spark streaming (KafkaUtils.createDirectStream) than structured streaming > by following this document > https://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html and > re-iterating the issue again for better understanding. > spark-streaming-kafka-0-10 > <https:/

Re: [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-05 Thread Sethupathi T
for better understanding. spark-streaming-kafka-0-10 <https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html> kafka connector prefix "spark-executor" + group.id for executors, driver uses original group id. *Here is the code where executor construct executor specific g

Re: [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-05 Thread Gabor Somogyi
t;groupIdPrefix" will be ignored. I think it answers your questions. As a general suggestion maybe it worth to revisit Spark 3.0 because Structured Streaming has another interesting feature: groupIdPrefix string spark-kafka-source streaming and batch Prefix of consumer group id

[Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-05 Thread Sethupathi T
Hi Team, We have secured Kafka cluster (which only allows to consume from the pre-configured, authorized consumer group), there is a scenario where we want to use spark streaming to consume from secured kafka. so we have decided to use spark-streaming-kafka-0-10 <https://spark.apache.org/d
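A minimal sketch of the setup under discussion, assuming the spark-streaming-kafka-0-10 connector; the broker address, topic, and group name are placeholders. The group.id below is used as-is by the driver, while executors (in the connector versions discussed in this thread) consume as "spark-executor-" + group.id, so a secured cluster must authorize both groups.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

val ssc = new StreamingContext(new SparkConf().setAppName("secured-kafka"), Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9092",             // placeholder broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "authorized-group",                  // driver consumes with this id as-is
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// Executors in this connector version consume as "spark-executor-authorized-group".
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("secured-topic"), kafkaParams)
)
```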

Spark streaming kafka source delay occasionally

2019-08-15 Thread ans
ockManager: Removing RDD 9. It seems consumer.poll(timeout) takes too long on the first poll in a task. Version information: spark 2.11, kafka 0.10.0.1
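If the slow first poll described here is on the spark-streaming-kafka-0-10 connector, the executor-side poll timeout is tunable; a sketch, with the value an arbitrary assumption:

```scala
import org.apache.spark.SparkConf

// Bounds how long an executor task blocks in consumer.poll(); 10s is an example value.
val conf = new SparkConf()
  .set("spark.streaming.kafka.consumer.poll.ms", "10000")
```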

Re: spark stream kafka wait for all data process done

2019-08-01 Thread 刘 勇
message From: zenglong chen Date: 8/2/19 09:59 (GMT+08:00) To: user@spark.apache.org Subject: spark stream kafka wait for all data process done How can kafka wait for tasks process done then begin receive next batch? I want to process 5000 record once by pandas and it may take too long time

spark stream kafka wait for all data process done

2019-08-01 Thread zenglong chen
How can kafka wait for tasks to finish processing before receiving the next batch? I want to process 5000 records at once with pandas and it may take a long time to process.
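DStream batches are already processed one at a time by default, so the next batch does not start until the current one finishes; what can be capped is how much each batch reads. A sketch of the relevant settings, with values as assumptions:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Let Spark adapt the ingestion rate to the observed processing speed...
  .set("spark.streaming.backpressure.enabled", "true")
  // ...and hard-cap records read per Kafka partition per second.
  .set("spark.streaming.kafka.maxRatePerPartition", "500")
```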

Why "spark-streaming-kafka-0-10" is still experimental?

2019-04-04 Thread Doaa Medhat
Dears, I'm working on a project that should integrate spark streaming with kafka using java. Currently the official documentation is confusing, it's not clear whether "spark-streaming-kafka-0-10" is safe to be used in production environment or not. According to "Spark St

How to repartition Spark DStream Kafka ConsumerRecord RDD.

2018-09-28 Thread Alchemist
 How to repartition Spark DStream Kafka ConsumerRecord RDD.  I am getting uneven size of Kafka topics.. We want to repartition the input RDD based on some logic.  But when I try to apply the repartition I am getting "object not serializable (
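The truncated error above is usually because ConsumerRecord itself is not serializable, so it cannot cross the shuffle a repartition implies. A sketch of the common workaround, with the field choice and partition count as assumptions:

```scala
// stream: DStream[ConsumerRecord[String, String]] from the 0-10 direct API (assumed).
// Extract plain (key, value) pairs first; the tuples are serializable
// even though ConsumerRecord is not, so the shuffle can proceed.
val repartitioned = stream
  .map(record => (record.key, record.value))
  .repartition(48)
```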

Re: Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-30 Thread Cody Koeninger
Ortiz Fernández wrote: > I can't... do you think that it's a possible bug of this version?? from > Spark or Kafka? > > El mié., 29 ago. 2018 a las 22:28, Cody Koeninger () > escribió: >> >> Are you able to try a recent version of spark? >> >> On Wed, Aug 29, 201

Re: Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-29 Thread Guillermo Ortiz Fernández
I can't... do you think that it's a possible bug of this version?? from Spark or Kafka? El mié., 29 ago. 2018 a las 22:28, Cody Koeninger () escribió: > Are you able to try a recent version of spark? > > On Wed, Aug 29, 2018 at 2:10 AM, Guillermo Ortiz Fernández > wrote: > &g

Re: Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-29 Thread Cody Koeninger
fka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1091) > ~[kafka-clients-1.0.0.jar:na] > at > org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.paranoidPoll(DirectKafkaInputDStream.scala:169) > ~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.

Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-29 Thread Guillermo Ortiz Fernández
(DirectKafkaInputDStream.scala:169) ~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.0.2] at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:188) ~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.0.2] at org.apache.spark.streaming.kafka010

Re: spark 2.3.1 with kafka spark-streaming-kafka-0-10 (java.lang.AbstractMethodError)

2018-06-28 Thread Peter Liu
Hello there, I just upgraded to spark 2.3.1 from spark 2.2.1, ran my streaming workload and got the error (java.lang.AbstractMethodError) never seen before; check the error stack attached in (a) bellow. anyone knows if spark 2.3.1 works well with kafka spark-streaming-kafka-0-10? this link
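AbstractMethodError after a Spark upgrade typically means binary-incompatible artifact versions on the classpath; a hedged sbt sketch keeping the Kafka integration in lockstep with Spark (versions are the ones named in the message):

```scala
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "2.3.1" % "provided",
  // Must match the Spark version exactly after the upgrade.
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.1"
)
```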

spark streaming kafka not displaying data in local eclipse

2018-01-16 Thread vr spark
Hi, I have a simple Java program to read data from kafka using spark streaming. When i run it from eclipse on my mac, it is connecting to the zookeeper, bootstrap nodes, But its not displaying any data. it does not give any error. it just shows 18/01/16 20:49:15 INFO Executor: Finished task

Spark Streaming Kafka

2017-11-10 Thread Frank Staszak
Hi All, I’m new to streaming avro records and am parsing Avro from a Kafka direct stream with spark streaming 2.1.1, I was wondering if anyone could please suggest an API for decoding Avro records with Scala? I’ve found KafkaAvroDecoder, twitter/bijection and the Avro library, each seem to
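One of the options named in this thread is the plain Avro library; a minimal decoding sketch, assuming the writer schema is available as a JSON string (schemaJson is a placeholder):

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

val schema = new Schema.Parser().parse(schemaJson) // schemaJson: placeholder
val reader = new GenericDatumReader[GenericRecord](schema)

// Decode the raw bytes of one Kafka record value into a GenericRecord.
def decode(bytes: Array[Byte]): GenericRecord =
  reader.read(null, DecoderFactory.get().binaryDecoder(bytes, null))
```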

Spark Streaming + Kafka + Hive: delayed

2017-09-20 Thread toletum
Hello. I have a process (python) that reads a kafka queue, for each record it checks in a table. # Load table in memory table=sqlContext.sql("select id from table") table.cache() kafkaTopic.foreachRDD(processForeach) def processForeach (time, rdd): print(time) for k in rdd.collect (): if

Re: The stability of Spark Stream Kafka 010

2017-06-29 Thread Cody Koeninger
backporting it to spark 1.5.1 On Thu, Jun 29, 2017 at 11:07 AM, Martin Peng <wei...@gmail.com> wrote: > Hi, > > We planned to upgrade our Spark Kafka library to 0.10 from 0.8.1 to simplify > our infrastructure code logic. Does anybody know when will the 010 version > become stable fro

The stability of Spark Stream Kafka 010

2017-06-29 Thread Martin Peng
Hi, We planned to upgrade our Spark Kafka library to 0.10 from 0.8.1 to simplify our infrastructure code logic. Does anybody know when the 0.10 version will become stable from experimental? May I use this 0.10 version together with Spark 1.5.1? https://spark.apache.org/docs/latest/streaming-kafka-0

Re: Spark 2 Kafka Direct Stream Consumer Issue

2017-05-24 Thread Jayadeep J
cation with Kafka Direct Stream to > Spark 2 (spark-streaming-kafka-0-10 - 2.1.0) with Kafka version (0.10.0.0) > . We find abnormal delays after the application has run for a couple of > hours & completed consumption of a ~ 10 million records. There is a sudden > dip in the processing

Spark Streaming Kafka Job has strange behavior for certain tasks

2017-04-05 Thread Justin Miller
persisters that appear to be working fine. The tasks appear to go fine until approximately 74-80 of the tasks (of 96) in, and then the remaining tasks take a while. I'm using EMR/Spark 2.1.0/Kafka 0.10.0.1/EMRFS (EMR's S3 solution). Any help would be greatly appreciated! Here's the code I'm

Re: Spark streaming + kafka error with json library

2017-03-30 Thread Srikanth
Thanks for the tip. That worked. When would one use the assembly? On Wed, Mar 29, 2017 at 7:13 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote: > Try depending on "spark-streaming-kafka-0-10_2.11" (not the assembly) > > On Wed, Mar 29, 2017 at 9:59 AM, Srikan

Re: Spark streaming + kafka error with json library

2017-03-29 Thread Tathagata Das
Try depending on "spark-streaming-kafka-0-10_2.11" (not the assembly) On Wed, Mar 29, 2017 at 9:59 AM, Srikanth <srikanth...@gmail.com> wrote: > Hello, > > I'm trying to use "org.json4s" % "json4s-native" library in a spark > streaming + k

Spark streaming + kafka error with json library

2017-03-29 Thread Srikanth
Hello, I'm trying to use "org.json4s" % "json4s-native" library in a spark streaming + kafka direct app. When I use the latest version of the lib I get an error similar to this <https://github.com/json4s/json4s/issues/316> The work around suggest there is to use ve
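The workaround the thread converged on, as an sbt sketch: depend on the plain connector artifact rather than the assembly, and pin json4s to the version Spark itself bundles (3.2.11 here is an assumption for Spark 2.1):

```scala
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.0", // not the -assembly artifact
  "org.json4s" %% "json4s-native" % "3.2.11"                    // match Spark's bundled json4s
)
```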

[ Spark Streaming & Kafka 0.10 ] Possible bug

2017-03-22 Thread Afshartous, Nick
Hi, I think I'm seeing a bug in the context of upgrading to using the Kafka 0.10 streaming API. Code fragments follow. -- Nick JavaInputDStream<ConsumerRecord<String, String>> rawStream = getDirectKafkaStream(); JavaDStream<Tuple2<String, String>> messagesTuple =

Re: [Spark Streaming+Kafka][How-to]

2017-03-22 Thread Cody Koeninger
Glad you got it worked out. That's cool as long as your use case doesn't actually require e.g. partition 0 to always be scheduled to the same executor across different batches. On Tue, Mar 21, 2017 at 7:35 PM, OUASSAIDI, Sami wrote: > So it worked quite well with a

Re: [Spark Streaming+Kafka][How-to]

2017-03-21 Thread OUASSAIDI, Sami
So it worked quite well with a coalesce, I was able to find a solution to my problem: although not directly handling the executor, a good workaround was to assign the desired partition to a specific stream through assign strategy and coalesce to a single partition then repeat the same process for

Re: [Spark Streaming+Kafka][How-to]

2017-03-17 Thread Michael Armbrust
Another option that would avoid a shuffle would be to use assign and coalesce, running two separate streams. spark.readStream .format("kafka") .option("kafka.bootstrap.servers", "...") .option("assign", """{t0: {"0": }, t1:{"0": x}}""") .load() .coalesce(1) .writeStream
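The snippet above lost characters in transit; a reconstructed sketch of the same idea, noting that the assign option takes a JSON map of topic to partition array (topic names and partition numbers are assumptions):

```scala
// One stream per topic-partition, each collapsed to a single task.
val t0 = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("assign", """{"t0":[0]}""")
  .load()
  .coalesce(1)
// Repeat with .option("assign", """{"t1":[0]}""") for the second stream,
// then attach a writeStream sink to each.
```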

Re: [Spark Streaming+Kafka][How-to]

2017-03-17 Thread OUASSAIDI, Sami
@Cody: Duly noted. @Michael Armbrust: A repartition is out of the question for our project as it would be a fairly expensive operation. We tried looking into targeting a specific executor so as to avoid this extra cost and directly have well-partitioned data after consuming the kafka topics. Also

Re: [Spark Streaming+Kafka][How-to]

2017-03-17 Thread Michael Armbrust
Sorry, typo. Should be a repartition not a groupBy. > spark.readStream > .format("kafka") > .option("kafka.bootstrap.servers", "...") > .option("subscribe", "t0,t1") > .load() > .repartition($"partition") > .writeStream > .foreach(... code to write to cassandra ...) >

Re: [Spark Streaming+Kafka][How-to]

2017-03-16 Thread Michael Armbrust
I think it should be straightforward to express this using structured streaming. You could ensure that data from a given partition ID is processed serially by performing a group by on the partition column. spark.readStream .format("kafka") .option("kafka.bootstrap.servers", "...")

Re: [Spark Streaming+Kafka][How-to]

2017-03-16 Thread Cody Koeninger
Spark just really isn't a good fit for trying to pin particular computation to a particular executor, especially if you're relying on that for correctness. On Thu, Mar 16, 2017 at 7:16 AM, OUASSAIDI, Sami wrote: > > Hi all, > > So I need to specify how an executor

[Spark Streaming+Kafka][How-to]

2017-03-16 Thread OUASSAIDI, Sami
Hi all, So I need to specify how an executor should consume data from a kafka topic. Let's say I have 2 topics : t0 and t1 with two partitions each, and two executors e0 and e1 (both can be on the same node so assign strategy does not work since in the case of a multi executor node it works

Re: Exception in spark streaming + kafka direct app

2017-02-07 Thread Srikanth
This is running in YARN cluster mode. It was restarted automatically and continued fine. I was trying to see what went wrong. AFAIK there were no task failure. Nothing in executor logs. The log I gave is in driver. After some digging, I did see that there was a rebalance in kafka logs around this

Re: Exception in spark streaming + kafka direct app

2017-02-07 Thread Tathagata Das
Does restarting after a few minutes solves the problem? Could be a transient issue that lasts long enough for spark task-level retries to all fail. On Tue, Feb 7, 2017 at 4:34 PM, Srikanth wrote: > Hello, > > I had a spark streaming app that reads from kafka running for a

Exception in spark streaming + kafka direct app

2017-02-07 Thread Srikanth
Hello, I had a spark streaming app that reads from kafka running for a few hours after which it failed with error *17/02/07 20:04:10 ERROR JobScheduler: Error generating jobs for time 148649785 ms java.lang.IllegalStateException: No current assignment for partition mt_event-5 at

RE: Spark and Kafka integration

2017-01-12 Thread Phadnis, Varun
Cool! Thanks for your inputs Jacek and Mark! From: Mark Hamstra [mailto:m...@clearstorydata.com] Sent: 13 January 2017 12:59 To: Phadnis, Varun <phad...@sky.optymyze.com> Cc: user@spark.apache.org Subject: Re: Spark and Kafka integration See "API compatibility" in http://

Re: Spark and Kafka integration

2017-01-12 Thread Mark Hamstra
arun <phad...@sky.optymyze.com> wrote: > Hello, > > > > We are using Spark 2.0 with Kafka 0.10. > > > > As I understand, much of the API packaged in the following dependency we > are targeting is marked as “@Experimental” > > > > > > org.ap

Re: Spark and Kafka integration

2017-01-12 Thread Jacek Laskowski
tering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Thu, Jan 12, 2017 at 2:57 PM, Phadnis, Varun <phad...@sky.optymyze.com> wrote: > Hello, > > > > We are using Spark 2.0 with Kafka 0.10. > > > > As

Spark and Kafka integration

2017-01-12 Thread Phadnis, Varun
Hello, We are using Spark 2.0 with Kafka 0.10. As I understand, much of the API packaged in the dependency we are targeting, org.apache.spark:spark-streaming-kafka-0-10_2.11:2.0.0, is marked as "@Experimental". What are implications of this being marked as ex

Re: Zero Data Loss in Spark with Kafka

2016-10-26 Thread Cody Koeninger
u are correct that you shouldn't have to worry about broker id. >>>>> >>>>> I'm honestly not sure specifically what else you are asking at this >>>>> point. >>>>> >>>>> On Tue, Oct 25, 2016 at 1:39 PM, Sunita Arvind <sunitarv...@gmail.com&

Re: Zero Data Loss in Spark with Kafka

2016-10-26 Thread Sunita Arvind
;>>> I'm honestly not sure specifically what else you are asking at this >>>> point. >>>> >>>> On Tue, Oct 25, 2016 at 1:39 PM, Sunita Arvind <sunitarv...@gmail.com> >>>> wrote: >>>> > Just re-read the kafka archit

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Sunita Arvind
is, it >>> > is leader based. So topic/partitionId pair will be same on all the >>> brokers. >>> > So we do not need to consider brokerid while storing offsets. Still >>> > exploring rest of the items. >>> > regards >>> > Sunita >

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Sunita Arvind
;> Hello Experts, >> >> >> >> I am trying to use the saving to ZK design. Just saw Sudhir's comments >> >> that it is old approach. Any reasons for that? Any issues observed with >> >> saving to ZK. The way we are planning to use it is: >> >

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Sunita Arvind
tarv...@gmail.com> > > wrote: > >> > >> Hello Experts, > >> > >> I am trying to use the saving to ZK design. Just saw Sudhir's comments > >> that it is old approach. Any reasons for that? Any issues observed with > >> saving to Z

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Cody Koeninger
.@gmail.com> > wrote: >> >> Hello Experts, >> >> I am trying to use the saving to ZK design. Just saw Sudhir's comments >> that it is old approach. Any reasons for that? Any issues observed with >> saving to ZK. The way we are planning to use it is: >> 1

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Sunita Arvind
: > 1. Following http://aseigneurin.github.io/2016/05/07/spark-kafka-achievin > g-zero-data-loss.html > 2. Saving to the same file with offsetRange as a part of the file. We hope > that there are no partial writes/ overwriting is possible and offsetRanges > > However I have below doubts which

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Sunita Arvind
Hello Experts, I am trying to use the saving to ZK design. Just saw Sudhir's comments that it is old approach. Any reasons for that? Any issues observed with saving to ZK. The way we are planning to use it is: 1. Following http://aseigneurin.github.io/2016/05/07/spark-kafka-achieving-zero-data
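For reference alongside this thread: the offsets being saved come from the direct stream itself. A minimal sketch assuming the 0-10 API, where stream is the direct stream and saveOffsets is a hypothetical persistence helper (ZK, a file named for the offsetRange, etc.):

```scala
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}

stream.foreachRDD { rdd =>
  // The cast exposes exactly which offsets this batch covers.
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process rdd, then persist offsets atomically with the results ...
  saveOffsets(offsetRanges) // hypothetical helper
}
```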

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-14 Thread Cody Koeninger
k version 2.0.0. and I start the spark-shell with: >> > >> > spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.0.0 >> > >> > As described here: >> > >> > https://github.com/apache/spark/blob/master/docs/structured-streaming-

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-14 Thread Julian Keppel
//github.com/apache/spark/blob/master/docs/structured-streaming-kafka-integration.md > > > > But I get a unresolved dependency error ("unresolved dependency: > org.apache.spark#spark-sql-kafka-0-10_2.11;2.0.0: not found"). So it seems > > not to be availabl

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-13 Thread Cody Koeninger
:spark-sql-kafka-0-10_2.11:2.0.0 > > As described here: > https://github.com/apache/spark/blob/master/docs/structured-streaming-kafka-integration.md > > But I get a unresolved dependency error ("unresolved dependency: > org.apache.spark#spark-sql-kafka-0-10_2.11;2.0.0: not f

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-13 Thread Mich Talebzadeh
add --jars /spark-streaming-kafka_2.10-1.5.1.jar (may need to download the jar file or any newer version) to spark-shell. I also have spark-streaming-kafka-assembly_2.10-1.6.1.jar as well on --jar list HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-13 Thread Sean Owen
rk/blob/master/docs/structured-streaming-kafka-integration.md > > But I get a unresolved dependency error ("unresolved dependency: > org.apache.spark#spark-sql-kafka-0-10_2.11;2.0.0: not found"). So it seems > not to be available via maven or spark-packages. > > How can I a

Want to test spark-sql-kafka but get unresolved dependency error

2016-10-13 Thread JayKay
-integration.md But I get a unresolved dependency error ("unresolved dependency: org.apache.spark#spark-sql-kafka-0-10_2.11;2.0.0: not found"). So it seems not to be available via maven or spark-packages. How can I accesss this package? Or am I doing something wrong/missing? Thank y

Re: spark streaming kafka connector questions

2016-09-16 Thread 毅程
s for time > 1473491465000 ms (execution: 0.066 s) *(EVENT 2nd time process cost 0.066)* > > and the 2nd time processing of the event finished without really doing the > work. > > Help is hugely appreciated. > > > > -- > View this message in context: http://apache-spark-

Re: Why is Spark getting Kafka data out from port 2181 ?

2016-09-10 Thread Cody Koeninger
Are you using the receiver based stream? On Sep 10, 2016 15:45, "Eric Ho" wrote: > I notice that some Spark programs would contact something like 'zoo1:2181' > when trying to suck data out of Kafka. > > Does the kafka data actually transported out over this port ? > >

Why is Spark getting Kafka data out from port 2181 ?

2016-09-10 Thread Eric Ho
I notice that some Spark programs would contact something like 'zoo1:2181' when trying to suck data out of Kafka. Does the kafka data actually transported out over this port ? Typically Zookeepers use 2218 for SSL. If my Spark program were to use 2218, how would I specify zookeeper specific
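Context for this exchange: the old receiver-based stream discovers offsets through ZooKeeper (hence port 2181), while the direct stream talks only to the brokers. A sketch of the 0-8 direct-stream parameters, the broker address being a placeholder:

```scala
// Direct stream (0-8 API): only broker addresses, no ZooKeeper quorum needed.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
```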

Re: spark streaming kafka connector questions

2016-09-10 Thread Cody Koeninger
message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kafka-connector-questions-tp27681p27687.html

Re: spark streaming kafka connector questions

2016-09-10 Thread Cheng Yi
hed without really doing the work. Help is hugely appreciated.

Re: spark streaming kafka connector questions

2016-09-10 Thread 毅程
e > > https://github.com/koeninger/kafka-exactly-once > > On Thu, Sep 8, 2016 at 4:44 PM, Cheng Yi <phillipchen...@gmail.com> wrote: > > I am using the lastest streaming kafka connector > > org.apache.spark > > spark-streaming-kafka_2.11 > > 1.6.2 > &g

Re: spark streaming kafka connector questions

2016-09-08 Thread Cody Koeninger
to whether the batch processed successfully. If you're unclear on how the kafka integration works, see https://github.com/koeninger/kafka-exactly-once On Thu, Sep 8, 2016 at 4:44 PM, Cheng Yi <phillipchen...@gmail.com> wrote: > I am using the latest streaming kafka connector > org.apache.s

spark streaming kafka connector questions

2016-09-08 Thread Cheng Yi
I am using the latest streaming kafka connector, org.apache.spark:spark-streaming-kafka_2.11:1.6.2. I am facing the problem that a message is delivered two times to my consumers. These two deliveries are 10+ seconds apart; it looks like this is caused by my lengthy message processing (took about 60
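A common way to avoid redelivery when processing is slow is to commit offsets only after the batch's work finishes. The poster is on the older 0-8 artifact, so this is a sketch of the equivalent on the 0-10 API (stream is the direct stream; processBatch is a placeholder):

```scala
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

stream.foreachRDD { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  processBatch(rdd) // placeholder for the ~60s of work described
  // Commit only after success, so a failed batch is re-read rather than skipped.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(ranges)
}
```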

Re: Spark to Kafka communication encrypted ?

2016-08-31 Thread Luciano Resende
bm.com/profiles/html/myProfileView.do> 8200 >> Warden Ave >> Markham, ON L6G 1C7 >> Canada >> >> >> >> - Original message - >> From: Cody Koeninger <c...@koeninger.org> >> To: Eric Ho <e...@analyticsmd.com> >> Cc: "u

Re: Spark to Kafka communication encrypted ?

2016-08-31 Thread Cody Koeninger
; Warden Ave > Markham, ON L6G 1C7 > Canada > > > > - Original message - > From: Cody Koeninger <c...@koeninger.org> > To: Eric Ho <e...@analyticsmd.com> > Cc: "user@spark.apache.org" <user@spark.apache.org> > Subject: Re: Spark to

Re: Spark to Kafka communication encrypted ?

2016-08-31 Thread Mihai Iacob
: Cody Koeninger <c...@koeninger.org> To: Eric Ho <e...@analyticsmd.com> Cc: "user@spark.apache.org" <user@spark.apache.org> Subject: Re: Spark to Kafka communication encrypted ? Date: Wed, Aug 31, 2016 10:34 AM Encryption is only available in spark-streaming-kafka-0-10, not 0-

Re: Spark to Kafka communication encrypted ?

2016-08-31 Thread Cody Koeninger
Encryption is only available in spark-streaming-kafka-0-10, not 0-8. You enable it the same way you enable it for the Kafka project's new consumer, by setting kafka configuration parameters appropriately. http://kafka.apache.org/documentation.html#security_ssl On Wed, Aug 31, 2016 at 2:03 AM
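Concretely, per the reply above, SSL is configured with ordinary Kafka consumer properties passed through kafkaParams; a sketch with the port, paths, and password all placeholders:

```scala
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9093", // SSL listener port is an assumption
  "security.protocol" -> "SSL",
  "ssl.truststore.location" -> "/etc/kafka/client.truststore.jks",
  "ssl.truststore.password" -> "changeit"
)
```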

Spark to Kafka communication encrypted ?

2016-08-31 Thread Eric Ho
I can't find in Spark 1.6.2's docs in how to turn encryption on for Spark to Kafka communication ... I think that the Spark docs only tells you how to turn on encryption for inter Spark node communications .. Am I wrong ? Thanks. -- -eric ho

Re: Zero Data Loss in Spark with Kafka

2016-08-23 Thread Cody Koeninger
See https://github.com/koeninger/kafka-exactly-once On Aug 23, 2016 10:30 AM, "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com> wrote: > Hi Experts, > > I am looking for some information on how to acheive zero data loss while > working with kafka and Spark. I have se

Re: Zero Data Loss in Spark with Kafka

2016-08-23 Thread Sudhir Babu Pothineni
com> wrote: > Hi Experts, > > I am looking for some information on how to acheive zero data loss while > working with kafka and Spark. I have searched online and blogs have > different answer. Please let me know if anyone has idea on this. > > Blog 1: > https://databrick

Zero Data Loss in Spark with Kafka

2016-08-23 Thread KhajaAsmath Mohammed
Hi Experts, I am looking for some information on how to achieve zero data loss while working with kafka and Spark. I have searched online and blogs have different answers. Please let me know if anyone has an idea on this. Blog 1: https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread kevin
GMT+08:00 Cody Koeninger <c...@koeninger.org>: > For 2.0, the kafka dstream support is in two separate subprojects > depending on which version of Kafka you are using > > spark-streaming-kafka-0-10 > or > spark-streaming-kafka-0-8 > > corresponding to brokers

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Cody Koeninger
For 2.0, the kafka dstream support is in two separate subprojects depending on which version of Kafka you are using spark-streaming-kafka-0-10 or spark-streaming-kafka-0-8 corresponding to brokers that are version 0.10+ or 0.8+ On Mon, Jul 25, 2016 at 12:29 PM, Reynold Xin <r...@databricks.
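In sbt terms, the choice described above looks like one of the following (the Spark version is an assumption):

```scala
// Brokers on 0.10 or later:
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.0"
// Brokers on 0.8 or later:
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"
```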

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Andy Davidson
support. We also use kafka Andy From: Marco Mistroni <mmistr...@gmail.com> Date: Monday, July 25, 2016 at 2:33 AM To: kevin <kiss.kevin...@gmail.com> Cc: "user @spark" <user@spark.apache.org>, "dev.spark" <d...@spark.apache.org> Subject: Re: where I can f

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Reynold Xin
iss.kevin...@gmail.com>: > >> hi,all : >> I try to run example org.apache.spark.examples.streaming.KafkaWordCount , >> I got error : >> Exception in thread "main" java.lang.NoClassDefFoundError: >> org/apache/spark/streaming/kafka/KafkaUtils$ >> at >> org.apache.spark.examples.

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Marco Mistroni
rk.examples.streaming.KafkaWordCount , >> I got error : >> Exception in thread "main" java.lang.NoClassDefFoundError: >> org/apache/spark/streaming/kafka/KafkaUtils$ >> at >> org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCoun

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread kevin
I have compiled it from source code 2016-07-25 12:05 GMT+08:00 kevin <kiss.kevin...@gmail.com>: > hi,all : > I try to run example org.apache.spark.examples.streaming.KafkaWordCount , > I got error : > Exception in thread "main" java.lang.NoClassDefFoundError: > o

where I can find spark-streaming-kafka for spark2.0

2016-07-24 Thread kevin
hi,all : I try to run example org.apache.spark.examples.streaming.KafkaWordCount , I got error : Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$ at org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCoun

Re: Spark streaming Kafka Direct API + Multiple consumers

2016-07-07 Thread Rabin Banerjee
mapping between Kafka and RDD partitions, which is easier to understand and tune. On Jul 7, 2016 3:04 PM, "SamyaMaiti" <samya.maiti2...@gmail.com> wrote: > Hi Team, > > Is there a way we can consume from Kafka using spark Streaming direct API > using multiple consumers

Spark streaming Kafka Direct API + Multiple consumers

2016-07-07 Thread SamyaMaiti
Hi Team, Is there a way we can consume from Kafka using spark Streaming direct API using multiple consumers (belonging to same consumer group) Regards, Sam

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
you want to do this kind of thing, you will need to maintain your > >> own index from time to offset. > >> > >> On Wed, May 25, 2016 at 8:15 AM, trung kien <kient...@gmail.com> wrote: > >> > Hi all, > >> > > >> > Is there

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread Cody Koeninger
ent proposal for it but it has gotten pushed back to at least >> 0.10.1 >> >> If you want to do this kind of thing, you will need to maintain your >> own index from time to offset. >> >> On Wed, May 25, 2016 at 8:15 AM, trung kien <kient...@gmail.com> wrote: &

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
m> wrote: > > Hi all, > > > > Is there any way to re-compute using Spark Streaming - Kafka Direct > Approach > > from specific time? > > > > In some cases, I want to re-compute again from specific time (e.g > beginning > > of day)? is that possible? > > > > > > > > -- > > Thanks > > Kien >

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread Cody Koeninger
ent...@gmail.com> wrote: > Hi all, > > Is there any way to re-compute using Spark Streaming - Kafka Direct Approach > from specific time? > > In some cases, I want to re-compute again from specific time (e.g beginning > of day)? is that possible?

Re: Spark Streaming - Kafka - java.nio.BufferUnderflowException

2016-05-25 Thread Cody Koeninger
low error while trying to consume message from Kafka > through Spark streaming (Kafka direct API). This used to work OK when using > Spark standalone cluster manager. We're just switching to using Cloudera 5.7 > using Yarn to manage Spark cluster and started to see the below error. > >

Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
Hi all, Is there any way to re-compute using Spark Streaming - Kafka Direct Approach from specific time? In some cases, I want to re-compute again from specific time (e.g beginning of day)? is that possible? -- Thanks Kien
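As the replies note, Kafka at the time had no time-to-offset lookup, so replaying from "beginning of day" means supplying explicit starting offsets from your own index. A sketch using the 0-10 API for illustration (topic, partition, offset, and the surrounding ssc/kafkaParams are placeholders):

```scala
import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.kafka010._

// Offset looked up from your own time -> offset index.
val fromOffsets = Map(new TopicPartition("events", 0) -> 1234567L)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Assign[String, String](fromOffsets.keys.toSeq, kafkaParams, fromOffsets)
)
```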

Spark Streaming - Kafka - java.nio.BufferUnderflowException

2016-05-25 Thread Scott W
I'm running into below error while trying to consume message from Kafka through Spark streaming (Kafka direct API). This used to work OK when using Spark standalone cluster manager. We're just switching to using Cloudera 5.7 using Yarn to manage Spark cluster and started to see the below error

Spark 1.4.1 + Kafka 0.8.2 with Kerberos

2016-05-13 Thread Mail.com
Hi All, I am trying to get spark 1.4.1 (Java) work with Kafka 0.8.2 in Kerberos enabled cluster. HDP 2.3.2 Is there any document I can refer to. Thanks, Pradeep

Re: Spark and Kafka direct approach problem

2016-05-04 Thread Mich Talebzadeh
he version of Kafka. >> >> On 4 May 2016 at 16:18, Luca Ferrari <ferrari.l...@live.it> wrote: >> >>> Hi, >>> >>> I’m new on Apache Spark and I’m trying to run the Spark Streaming + >>> Kafka Integration Direct Approach example (JavaDirectKa

Re: Spark and Kafka direct approach problem

2016-05-04 Thread Shixiong(Ryan) Zhu
Check the version of the libraries you downloaded, the version of spark, > the version of Kafka. > > On 4 May 2016 at 16:18, Luca Ferrari <ferrari.l...@live.it> wrote: > >> Hi, >> >> I’m new on Apache Spark and I’m trying to run the Spark Streaming + >> Kafka Integr

Re: Spark and Kafka direct approach problem

2016-05-04 Thread أنس الليثي
and I’m trying to run the Spark Streaming + Kafka > Integration Direct Approach example (JavaDirectKafkaWordCount.java). > > I’ve downloaded all the libraries but when I try to run I get this error > > Exception in thread "main" java.lang.NoSuchMethodError: > scala.Prede

Spark and Kafka direct approach problem

2016-05-04 Thread Luca Ferrari
Hi, I’m new on Apache Spark and I’m trying to run the Spark Streaming + Kafka Integration Direct Approach example (JavaDirectKafkaWordCount.java). I’ve downloaded all the libraries but when I try to run I get this error Exception in thread "main" java.lang.NoSuchMethodError: sc

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Jason Nerothin
in a new data >> center isn't really different from starting up a post-crash job in the >> original data center. >> >> On Tue, Apr 19, 2016 at 3:32 AM, Erwan ALLAIN <eallain.po...@gmail.com >> <mailto:eallain.po...@gmail.com>> wrote: >> Thanks

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Jason Nerothin
;> Thanks Jason and Cody. I'll try to explain a bit better the Multi DC case. >> >> As I mentionned before, I'm planning to use one kafka cluster and 2 or more >> spark cluster distinct. >> >> Let's say we have the following DCs configuration in a nominal case.

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Erwan ALLAIN
ata >>> center isn't really different from starting up a post-crash job in the >>> original data center. >>> >>> On Tue, Apr 19, 2016 at 3:32 AM, Erwan ALLAIN <eallain.po...@gmail.com> >>> wrote: >>> >>>> Thanks Jason and

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Cody Koeninger
t;> On Tue, Apr 19, 2016 at 3:32 AM, Erwan ALLAIN <eallain.po...@gmail.com> >> wrote: >> >>> Thanks Jason and Cody. I'll try to explain a bit better the Multi DC >>> case. >>> >>> As I mentionned before, I'm planning to use one kafka cluster and

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Erwan ALLAIN
spark cluster distinct. >> >> Let's say we have the following DCs configuration in a nominal case. >> Kafka partitions are consumed uniformly by the 2 datacenters. >> >> DataCenter Spark Kafka Consumer Group Kafka partition (P1 to P4) >> DC 1 Master 1.1 >

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Jason Nerothin
n and Cody. I'll try to explain a bit better the Multi DC case. > > As I mentionned before, I'm planning to use one kafka cluster and 2 or more > spark cluster distinct. > > Let's say we have the following DCs configuration in a nominal case. > Kafka partitions are consumed u
