RE: Spark and Kafka integration

2017-01-12 Thread Phadnis, Varun
Cool! Thanks for your inputs, Jacek and Mark!





Re: Spark and Kafka integration

2017-01-12 Thread Mark Hamstra
See "API compatibility" in http://spark.apache.org/versioning-policy.html

While code that is annotated as Experimental is still a good-faith effort
to provide a stable and useful API, the fact is that we're not yet
confident enough that we've got the public API in exactly the form that we
want to commit to maintaining until at least the next major release. That
means that the API may change in the next minor/feature-level release (but
it shouldn't in a patch/bugfix-level release), which would require that
your source code be rewritten to use the new API. In the most extreme
case, we may decide that the experimental code didn't work out the way we
wanted, so it could be withdrawn entirely. Complete withdrawal of the
Kafka code is unlikely, but it may well change in incompatible ways in
future releases, even before Spark 3.0.0.
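
For concreteness, the experimental surface in question is the new direct
stream entry point in spark-streaming-kafka-0-10. A minimal Java sketch in
the shape of the 0-10 integration guide (broker address, group id, and topic
name are illustrative, and jssc is assumed to be an existing
JavaStreamingContext):

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

// Consumer configuration; values here are illustrative.
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "localhost:9092");
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "example-group");
kafkaParams.put("auto.offset.reset", "latest");
kafkaParams.put("enable.auto.commit", false);

Collection<String> topics = Arrays.asList("topicA");

// createDirectStream is among the methods annotated @Experimental in 2.0.0.
JavaInputDStream<ConsumerRecord<String, String>> stream =
    KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));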



Re: Spark and Kafka integration

2017-01-12 Thread Jacek Laskowski
Hi Phadnis,

I found this in
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html:

> This version of the integration is marked as experimental, so the API is 
> potentially subject to change.

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski





Spark and Kafka integration

2017-01-12 Thread Phadnis, Varun
Hello,

We are using Spark 2.0 with Kafka 0.10.

As I understand it, much of the API packaged in the following dependency that
we are targeting is marked as "@Experimental":


<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.0.0</version>
</dependency>


What are the implications of this being marked as experimental? Are these
APIs stable enough for production?

Thanks,
Varun



Re: Spark and Kafka direct approach problem

2016-05-04 Thread Mich Talebzadeh
This works

Spark 1.6.1, using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM,
Java 1.8.0_77)

Kafka version 0.9.0.1, using scala-library-2.11.7.jar

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com





Re: Spark and Kafka direct approach problem

2016-05-04 Thread Shixiong(Ryan) Zhu
It's because the Scala version of Spark and the Scala version of Kafka
don't match. Please check them.
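
Concretely, the usual fix is to make every Spark artifact and the Kafka
integration artifact carry the same Scala suffix. A hedged Maven sketch,
not taken from the thread (versions are illustrative; _2.10 matches a Spark
build on Scala 2.10, as in Mich's working setup above):

<!-- Both artifacts share one Scala suffix (_2.10 here); mixing _2.10 and
     _2.11 jars on the classpath can produce this kind of NoSuchMethodError. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.6.1</version>
</dependency>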



Re: Spark and Kafka direct approach problem

2016-05-04 Thread أنس الليثي
NoSuchMethodError usually appears because of a difference in library
versions.

Check the versions of the libraries you downloaded, the version of Spark,
and the version of Kafka.




-- 
Anas Rabei
Senior Software Developer
Mubasher.info
anas.ra...@mubasher.info


Spark and Kafka direct approach problem

2016-05-04 Thread Luca Ferrari
Hi,

I’m new to Apache Spark and I’m trying to run the Spark Streaming + Kafka 
Integration Direct Approach example (JavaDirectKafkaWordCount.java).

I’ve downloaded all the libraries, but when I try to run it I get this error:


Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
    at kafka.api.RequestKeys$.<init>(RequestKeys.scala:48)
    at kafka.api.RequestKeys$.<clinit>(RequestKeys.scala)
    at kafka.api.TopicMetadataRequest.<init>(TopicMetadataRequest.scala:55)
    at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:122)
    at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:112)
    at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:211)
    at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
    at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:607)
    at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
    at it.unimi.di.luca.SimpleApp.main(SimpleApp.java:53)

Any suggestions?

Cheers
Luca
 



RE: Spark and Kafka Integration

2015-12-07 Thread Singh, Abhijeet
For Q2: the order of the logs within each Kafka partition is guaranteed, but
there is no such thing as a global order across partitions.



Spark and Kafka Integration

2015-12-07 Thread Prashant Bhardwaj
Hi

Some background:
We have a Kafka cluster with ~45 topics. Some of the topics contain logs in
JSON format and some in PSV (pipe-separated value) format. I want to consume
these logs using Spark Streaming and store them in Parquet format in HDFS.

Now my questions are:
1. Can we create an InputDStream per topic in the same application?

Since the schema of the logs might differ for each topic, I want to process
some topics differently and store the logs in different output directories
based on the topic name.

2. Also, how can we partition the logs based on timestamp?

-- 
Regards
Prashant
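
To make the two questions concrete, here is a hedged sketch (Spark 1.6-era
APIs with the Kafka 0.8 direct stream; the topic names, broker address, and
"ts" timestamp field are assumptions, not from the thread). It builds one
direct stream per topic, so each topic's schema can be handled its own way,
and partitions the Parquet output by a date column derived from the
timestamp:

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.to_date;

public class PerTopicToParquet {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("per-topic-to-parquet");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

    Map<String, String> kafkaParams = new HashMap<>();
    kafkaParams.put("metadata.broker.list", "broker1:9092");  // illustrative

    // One InputDStream per topic: each iteration could plug in its own
    // parser (JSON vs PSV) before writing.
    for (final String topic : Arrays.asList("orders_json", "clicks_json")) {
      JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
          jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
          kafkaParams, Collections.singleton(topic));

      stream.foreachRDD(rdd -> {
        SQLContext sqlContext = SQLContext.getOrCreate(rdd.context());
        // JSON topics can be read directly; a PSV topic would need a custom
        // line parser before building the DataFrame.
        DataFrame df = sqlContext.read().json(rdd.values());
        df.withColumn("dt", to_date(col("ts")))       // date column from the assumed "ts" field
          .write().mode("append").partitionBy("dt")   // timestamp-based partitioning
          .parquet("hdfs:///logs/" + topic);          // per-topic output directory
      });
    }

    jssc.start();
    jssc.awaitTermination();
  }
}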


Re: why spark and kafka always crash

2015-09-14 Thread Akhil Das
Can you be more precise?

Thanks
Best Regards



why spark and kafka always crash

2015-09-14 Thread Joanne Contact
How to prevent it?




Re: Spark and Kafka

2014-11-06 Thread Eduardo Costa Alfaia
This is my window:

reduceByKeyAndWindow(
    // forward reduce: add counts as new batches enter the window
    new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer i1, Integer i2) { return i1 + i2; }
    },
    // inverse reduce: subtract counts as old batches leave the window
    new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer i1, Integer i2) { return i1 - i2; }
    },
    new Duration(60 * 5 * 1000),  // window length: 5 minutes
    new Duration(1 * 1000)        // slide interval: 1 second
);
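
One caveat worth appending here (from the Spark Streaming programming guide,
not from this thread): the inverse-function form of reduceByKeyAndWindow
shown above requires checkpointing to be enabled, along the lines of:

// Required for the inverse-reduce window form; the directory is illustrative.
jssc.checkpoint("hdfs:///tmp/spark-checkpoint");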

> On Nov 6, 2014, at 18:37, Gwen Shapira  wrote:
> 
> What's the window size? If the window is around 10 seconds and you are
> sending data at a very stable rate, this is expected.
> 
> 
> 




Spark and Kafka

2014-11-06 Thread Eduardo Costa Alfaia
Hi Guys,

I am doing some tests with Spark Streaming and Kafka, but I have seen
something strange. I have modified JavaKafkaWordCount to use
reduceByKeyAndWindow and to print the accumulated word counts to the screen.
In the beginning Spark works very well: in each iteration the word counts
increase, but after 12 to 13 seconds the results repeat continually.

My producer program keeps sending words to Kafka.

Does anyone have any idea about this?


---
Time: 1415272266000 ms
---
(accompanied them,6)
(merrier,5)
(it possessed,5)
(the treacherous,5)
(Quite,12)
(offer,273)
(rabble,58)
(exchanging,16)
(Genoa,18)
(merchant,41)
...
---
Time: 1415272267000 ms
---
(accompanied them,12)
(merrier,12)
(it possessed,12)
(the treacherous,11)
(Quite,24)
(offer,602)
(rabble,132)
(exchanging,35)
(Genoa,36)
(merchant,84)
...
---
Time: 1415272268000 ms
---
(accompanied them,17)
(merrier,18)
(it possessed,17)
(the treacherous,17)
(Quite,35)
(offer,889)
(rabble,192)
(the bed,1)
(exchanging,51)
(Genoa,54)
...
---
Time: 1415272269000 ms
---
(accompanied them,17)
(merrier,18)
(it possessed,17)
(the treacherous,17)
(Quite,35)
(offer,889)
(rabble,192)
(the bed,1)
(exchanging,51)
(Genoa,54)
...

---
Time: 141527227 ms
---
(accompanied them,17)
(merrier,18)
(it possessed,17)
(the treacherous,17)
(Quite,35)
(offer,889)
(rabble,192)
(the bed,1)
(exchanging,51)
(Genoa,54)
...

