Dear all,
I would appreciate it if anyone could explain when mapWithState terminates,
i.e. applies subsequent transformations such as writing the state to an
external sink?
Given a KafkaConsumer instance pulling messages from a Kafka topic, and a
mapWithState transformation updating the state
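For reference, a minimal sketch of the setup (a socket source stands in for the
Kafka consumer, and the checkpoint path is hypothetical). As far as I can tell,
the stream returned by mapWithState never "terminates" as such: it is
materialized once per batch interval, so subsequent transformations and output
operations run on every batch.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setAppName("mapWithState-sketch"), Seconds(1))
ssc.checkpoint("hdfs:///tmp/checkpoints") // mapWithState requires checkpointing

// stand-in for the Kafka direct stream: a DStream of (key, count) pairs
val pairs = ssc.socketTextStream("localhost", 9999).map(word => (word, 1L))

val spec = StateSpec.function(
  (word: String, count: Option[Long], state: State[Long]) => {
    val sum = state.getOption.getOrElse(0L) + count.getOrElse(0L)
    state.update(sum)
    (word, sum) // emitted into the mapped stream on every batch
  })

// the output operation below therefore runs once per batch interval,
// writing whatever the state mapping emitted for that batch
pairs.mapWithState(spec).foreachRDD { rdd =>
  rdd.foreach(println) // replace with the write to the external sink
}

ssc.start()
ssc.awaitTermination()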
Hi,
currently I am exploring Spark's fault tolerance capabilities in terms of fault
recovery. Namely, I run a Spark 2.1 standalone cluster with a master and four
worker nodes. The application pulls data using the Kafka direct stream API from
a Kafka topic over a (sliding) window of time, and
Hi all,
I'm running a cluster consisting of a master and four slaves. The cluster runs a
Spark application that reads data from a Kafka topic over a window of time, and
writes the data back to Kafka. Checkpointing is enabled by using HDFS. However,
although Spark periodically commits checkpoints
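To make the setup concrete, a minimal sketch of the checkpointing
configuration, assuming a hypothetical HDFS path; StreamingContext.getOrCreate
is what turns the periodically committed checkpoints into an actual recovery
path on driver restart.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///user/spark/checkpoints" // hypothetical path

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(
    new SparkConf().setAppName("checkpointed-app"), Seconds(1))
  ssc.checkpoint(checkpointDir)
  // define the Kafka input stream and all transformations here, so that
  // they are captured in the checkpoint and restored after a failure
  ssc
}

// a clean start invokes createContext(); after a driver failure the
// context, including pending batches, is rebuilt from the checkpoint
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()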
>
> On Thu, Apr 27, 2017 at 11:33 AM, Dominik Safaric
> <dominiksafa...@gmail.com> wrote:
>> Indeed I have. But, even when storing the offsets in Spark and committing
>> offsets upon completion of an output operation within the foreachRDD call
>> (as pointed
https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#kafka-itself
>
> On Wed, Apr 26, 2017 at 1:17 PM, Dominik Safaric
> <dominiksafa...@gmail.com> wrote:
>> The reason why I want to obtain this information, i.e. <partition, offset,
>> timestamp> tuples is to rela
https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#obtaining-offsets
>
> Timestamp isn't really a meaningful idea for a range of offsets.
>
>
> On Tue, Apr 25, 2017 at 2:43 PM, Dominik Safaric
> <dominiksafa...@gmail.com> wrote:
>> Hi all,
>>
>> Because the Spark Streaming direct Kafka consumer maps
Hi all,
Because the Spark Streaming direct Kafka consumer maps offsets for a given
Kafka topic and partition internally while enable.auto.commit is set to
false, how can I retrieve the offset of each of the consumer's poll calls
using the offset ranges of an RDD? More precisely, the
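A hedged sketch of the pattern the linked documentation describes (the broker
address, group id and topic name are hypothetical): the per-partition offsets
of every batch are exposed via HasOffsetRanges, and can be committed back to
Kafka via CanCommitOffsets once the output operation has completed. Per-record
timestamps, where needed, live on the individual ConsumerRecords rather than
on the offset ranges.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setAppName("offset-ranges-sketch"), Seconds(1))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "offset-ranges-sketch",
  "enable.auto.commit" -> (false: java.lang.Boolean))

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

stream.foreachRDD { rdd =>
  // one OffsetRange per Kafka partition consumed in this batch
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { o =>
    println(s"${o.topic} ${o.partition}: ${o.fromOffset} to ${o.untilOffset}")
  }
  // ... output operation on rdd ...
  // commit the consumed ranges only after the output has succeeded
  stream.asInstanceOf[CanCommitOffsets].commitAsync(ranges)
}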
> 1489765610
> 1489765611
> 1489765612
> 1489765613
> Window:
> 1489765610
> 1489765611
> 1489765612
> 1489765613
> 1489765614
> Window:
> 1489765611
> 1489765612
> 1489765613
> 1489765614
> 1489765615
>
> On Thu, Mar 16, 2017 at 2:34 PM, Dominik Safaric <dominiksafa...@gmail.com> wrote:
> Window:
> 1489765608
> 1489765609
> 1489765610
> 1489765611
> 1489765612
> Window:
> 1489765609
> 1489765610
> 1489765611
> 1489765612
> 1489765613
> Window:
> 1489765610
> 1489765611
> 1489765612
> 1489765613
> 1489765614
> Window:
>
Hi all,
As I've implemented a streaming application pulling data from Kafka every 1
second (the batch interval), I am observing some quite strange behaviour (I
haven't used Spark extensively in the past, having worked with continuous
operator based engines instead).
Namely, the dstream.window(Seconds(60))
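A minimal sketch of the behaviour in question, assuming a 1-second batch
interval and a socket source standing in for Kafka: when no slide duration is
given, window(Seconds(60)) slides by the batch interval, so each batch emits a
window overlapping the previous one by 59 seconds, consistent with the
overlapping per-window listings above.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setAppName("window-sketch"), Seconds(1)) // 1 s batches

val events = ssc.socketTextStream("localhost", 9999) // stand-in source

// 60 s of data, re-emitted every batch interval (the default slide)
events.window(Seconds(60)).foreachRDD { rdd =>
  println(s"Window: ${rdd.count()} records")
}

ssc.start()
ssc.awaitTermination()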
> Spark classes that interfere.
>
>
> On Wed, Mar 1, 2017, 14:20 Dominik Safaric <dominiksafa...@gmail.com> wrote:
> I've been trying to submit a Spark Streaming application using spark-submit
> to a cluster of mine consisting of a
I've been trying to submit a Spark Streaming application using spark-submit to
a cluster of mine consisting of a master and two worker nodes. The application
has been written in Scala, and built using Maven. Importantly, the Maven build
is configured to produce a fat JAR containing all
Hi,
As I am investigating, among other things, the fault recovery capabilities of
Spark, I've been curious: what source code artifact initiates the parallel
recovery process? In addition, how is a faulty node detected (from the
driver's point of view)?
Thanks in advance,
Dominik
A few months ago, I started working on an empirical research study of several
stream processing engines, including but not limited to Spark Streaming.
As the benchmark should extend beyond performance metrics such as throughput
and latency, I've focused on fault tolerance
The solution was assigning the broker the host.name it should
advertise to the consumers and producers.
By setting this property, I instantly started receiving Kafka log messages.
Nevertheless, thank you all for your help, I appreciate it!
> On 07 Jun 2016, at 17:44, Dominik Safaric <dominiksafa...@gmail.com> wrote:
you dealing with offsets?
>
> Can you verify the offsets on the broker:
>
> kafka-run-class.sh kafka.tools.GetOffsetShell --topic <topic> --broker-list <brokers> --time -1
>
> -Todd
>
> On Tue, Jun 7, 2016 at 8:17 AM, Dominik Safaric <dominiksafa...@gmail.com> wrote:
> with 0.9.0.1 due to changes in the kafka clients between 0.8.2.2
> and 0.9.0.x. See this for more information:
>
> https://issues.apache.org/jira/browse/SPARK-12177
> <https://issues.apache.org/jira/browse/SPARK-12177>
>
> -Todd
>
> On Tue, Jun 7, 2016 at 7:35 AM, Jacek <ja...@japila.pl> wrote:
>
> Hi,
>
> What's the version of Spark? You're using Kafka 0.9.0.1, ain't you? What's
> the topic name?
>
> Jacek
>
> On 7 Jun 2016 11:06 a.m., "Dominik Safaric" <dominiksafa...@gmail.com> wrote:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
>
> On 7 June 2016 at 11:32, Dominik Safaric <dominiksafa...@gmail.com> wrote:
> // assuming ssc, price : DStream[Double], windowLength and slidingInterval are in scope
> val countByValueAndWindow = price.filter(_ > 95.0)
>   .countByValueAndWindow(Seconds(windowLength), Seconds(slidingInterval))
> countByValueAndWindow.print()
>
> ssc.start()
> ssc.awaitTermination()
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
>
> On 7 June 2016 at 10:06, Dominik Safaric <dominiksafa...@gmail.com> wrote:
As I am trying to integrate Kafka into Spark, the following exception occurs:
org.apache.spark.SparkException: java.nio.channels.ClosedChannelException
org.apache.spark.SparkException: Couldn't find leader offsets for Set([**,0])
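For reference, a sketch of the 0.8 direct stream setup in which this exception
typically surfaces; "Couldn't find leader offsets" indicates the driver could
not look up the partition leaders, e.g. because the broker's advertised host is
unreachable (the broker address and topic name below are hypothetical).

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setAppName("direct-stream-sketch"), Seconds(1))

// the driver contacts these brokers to find the partition leaders; the
// lookup fails if the advertised host.name is not resolvable or reachable
val kafkaParams = Map("metadata.broker.list" -> "broker-1:9092")

val stream = KafkaUtils.createDirectStream[
  String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))

stream.map(_._2).print()
ssc.start()
ssc.awaitTermination()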
Dear all,
Lately, as part of a scientific research project, I've been developing an
application that streams (or at least should stream) data from Travis CI and
GitHub, using their REST APIs. The purpose of this is to gain insight into
the commit-build relationship, in order to further perform numerous
Recently, I've implemented the following Receiver and custom Spark Streaming
InputDStream using Scala:
/**
 * The GitHubUtils object declares an interface consisting of overloaded
 * createStream functions. The createStream function takes as arguments
 * the ctx : StreamingContext passed by the
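A hedged sketch of how that interface might look end to end (the names follow
the scaladoc fragment above; the polling logic and API URL handling are
hypothetical): a custom Receiver pushes records to Spark via store(), and
GitHubUtils.createStream wraps it into a ReceiverInputDStream.

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver

class GitHubReceiver(apiUrl: String)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // poll the REST API on a background thread, pushing each response body
    // to Spark; a real receiver would paginate and handle rate limits
    new Thread("github-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          store(scala.io.Source.fromURL(apiUrl).mkString) // naive poll
          Thread.sleep(1000)
        }
      }
    }.start()
  }

  def onStop(): Unit = () // the polling loop checks isStopped()
}

object GitHubUtils {
  def createStream(ctx: StreamingContext, apiUrl: String): ReceiverInputDStream[String] =
    ctx.receiverStream(new GitHubReceiver(apiUrl))
}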