xit ? It'll get
> retried on another executor, but as long as that one fails the same
> way...
>
> If you can identify the error case at the time you're doing database
> interaction and just prevent data being written then, that's what I
> typically do.
>
> On
Hello guys,
I have a question regarding how Spark handles failures.
I’m using kafka direct stream
Spark 2.0.2
Kafka 0.10.0.1
Here is a snippet of code
val stream = createDirectStream(….)
stream
  .map(…)
  .foreachRDD(doSomething)
stream
  .map(…)
  .foreachRDD(doSomethingElse)
The execution is in
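A minimal sketch of the pattern in the snippet above, assuming the 0.10 direct-stream connector (the question mentions Kafka 0.10.0.1); the topic name, group id, and the two println actions stand in for the elided `doSomething`/`doSomethingElse`:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object TwoActionsSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("two-actions"), Seconds(1))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",          // placeholder broker
      "key.deserializer"  -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"          -> "example-group")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("topic"), kafkaParams))

    // Two separate output operations on the same source stream: each is an
    // independent job per batch, so each can fail and be retried independently,
    // and each re-reads the batch from Kafka unless the stream is cached.
    stream.map(_.value).foreachRDD(rdd => rdd.foreach(r => println(s"A: $r")))
    stream.map(_.value).foreachRDD(rdd => rdd.foreach(r => println(s"B: $r")))

    ssc.start()
    ssc.awaitTermination()
  }
}
```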
Hello,
In my project, I would like to use logback as the logging framework (faster,
smaller memory footprint, etc.).
I have managed to make it work; however, I had to modify the Spark jars folder:
- remove slf4j-log4jxx.jar
- add logback-classic / logback-core.jar
and add logback.xml in the conf folder.
Is it th
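A rough sketch of the jar swap described above; the exact jar file names and versions are assumptions and depend on the Spark and logback releases in use:

```shell
# Swap the slf4j->log4j binding for logback in the Spark distribution.
cd "$SPARK_HOME/jars"
rm slf4j-log4j12-*.jar                 # remove the log4j binding jar
cp /path/to/logback-classic-1.1.x.jar .  # versions are placeholders
cp /path/to/logback-core-1.1.x.jar .
cp /path/to/logback.xml "$SPARK_HOME/conf/"
```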
Hi everyone,
I'd like to know what kind of configuration mechanism is used in general.
Below is what I'm going to implement, but I'd like to know if there is any
"standard way":
1) put the configuration in HDFS
2) specify extraJavaOptions (driver and worker) with the HDFS URL (
hdfs://ip:port/config)
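The two steps above could look like the following submit command; the namenode address, property name `config.url`, and main class are placeholders for illustration:

```shell
# Step 2: pass the HDFS config URL to both the driver and the executors
# via a system property the application would read at startup.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dconfig.url=hdfs://namenode:8020/config/app.conf" \
  --conf "spark.executor.extraJavaOptions=-Dconfig.url=hdfs://namenode:8020/config/app.conf" \
  --class com.example.Main \
  app.jar
```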
ems from the map)
>
>
>
> On Fri, Oct 21, 2016 at 3:32 AM, Erwan ALLAIN
> wrote:
> > Hi,
> >
> > I'm currently implementing an exactly once mechanism based on the
> following
> > example:
> >
> > https://github.com/koeninger/kafka-exactly
Hi,
I'm currently implementing an exactly once mechanism based on the following
example:
https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/TransactionalPerBatch.scala
The pseudo-code is as follows:
dstream.transform (store offset in a variable on driver side )
ds
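A minimal sketch of the per-batch offset capture described in the pseudo-code above, assuming the 0.10 connector types (with the 0.8 connector the classes live in `org.apache.spark.streaming.kafka` instead); the database step is left as a comment because the snippet is cut off:

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}

// `stream` would be the result of KafkaUtils.createDirectStream.
def exactlyOnceSketch(stream: DStream[ConsumerRecord[String, String]]): Unit = {
  // Driver-side variable holding the offsets of the current batch.
  var offsetRanges = Array.empty[OffsetRange]

  stream.transform { rdd =>
    // transform runs on the driver at batch-generation time; the cast only
    // works on the RDD coming directly from the direct stream (before any map).
    offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    rdd
  }.map(_.value).foreachRDD { rdd =>
    val results = rdd.collect()
    // Pseudo: in ONE database transaction, write `results` AND `offsetRanges`;
    // committing both atomically (and seeding fromOffsets from the stored
    // offsets on restart) is what makes the pipeline exactly-once.
  }
}
```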
Hi
I'm working with
- Kafka 0.8.2
- Spark Streaming (2.0) direct input stream.
- Cassandra 3.0
My batch interval is 1s.
When I use map, filter, or even saveToCassandra functions, the processing
time is around 50 ms on empty batches.
=> This is fine.
As soon as I use some reduceByKey, the proces
ut splitbrain problem, etc). I also don't know how
> well ZK will work cross-datacenter.
>
> As far as the spark side of things goes, if it's idempotent, why not just
> run both instances all the time.
>
>
>
> On Tue, Apr 19, 2016 at 10:21 AM, Erwan ALLAIN
> wr
hat happens to Kafka and your downstream
> data store when DC2 crashes.
>
> From a Spark point of view, starting up a post-crash job in a new data
> center isn't really different from starting up a post-crash job in the
> original data center.
>
> On Tue, Apr 19, 2016 at 3:3
you changed
> partitions.
>
> https://issues.apache.org/jira/browse/SPARK-12177 has the ongoing work
> towards using the Kafka 0.10 consumer, which would allow for dynamic
> topic partitions
>
> Regarding your multi-DC questions, I'm not really clear on what you're
&g
Hello,
I'm currently designing a solution where 2 distinct Spark clusters (2
datacenters) share the same Kafka (Kafka rack awareness or manual broker
repartition).
The aims are
- preventing DC crash: using kafka resiliency and consumer group mechanism
(or else ?)
- keeping consistent offset among repl
You may need to persist r1 after the partitionBy call; the second join will be
more efficient.
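A sketch of that advice, where `r1` is the RDD named above and `r2`, `r3`, the key/value types, and the partition count are placeholders: partition once, persist the co-partitioned RDD, and both joins reuse it, so only the other side of each join is shuffled.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

def twoJoins(r1: RDD[(Int, String)],
             r2: RDD[(Int, String)],
             r3: RDD[(Int, String)]): Unit = {
  // Shuffle r1 by key once and keep the result in memory.
  val partitioned = r1.partitionBy(new HashPartitioner(8))
    .persist(StorageLevel.MEMORY_ONLY)

  val j1 = partitioned.join(r2) // shuffles only r2 to r1's partitioning
  val j2 = partitioned.join(r3) // reuses the cached, already-partitioned r1

  j1.count(); j2.count()        // actions that materialize both joins
  partitioned.unpersist()
}
```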
On Mon, Nov 16, 2015 at 2:48 PM, Rishi Mishra wrote:
> AFAIK and can see in the code both of them should behave same.
>
> On Sat, Nov 14, 2015 at 2:10 AM, Alexander Pivovarov wrote:
>
>> Hi Everyone
>>
>> I
Have a look at this: https://github.com/koeninger/kafka-exactly-once
especially:
https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/TransactionalPerBatch.scala
https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/TransactionalPerPartiti
ady looked into it and also at 'Try' from which I
> got inspired. Thanks for pointing it out anyway!
>
> #A.M.
>
> Il giorno 15 ott 2015, alle ore 16:19, Erwan ALLAIN <
> eallain.po...@gmail.com> ha scritto:
>
> What about http://www.scala-lang.org/api/2
What about http://www.scala-lang.org/api/2.9.3/scala/Either.html ?
On Thu, Oct 15, 2015 at 2:57 PM, Roberto Congiu
wrote:
> I came to a similar solution to a similar problem. I deal with a lot of
> CSV files from many different sources, and they are often malformed.
> However, I just have succes
ackwards
>>> > compatibility, but if enough people ask for it... ?
>>> >
>>> > On Tue, Oct 6, 2015 at 6:51 AM, Jonathan Coveney
>>> wrote:
>>> >>
>>> >> You can put a class in the org.apache.spark namespace to access
>>
have to do is almost the same as the KafkaCluster
which is private.
Is it possible to:
- add another signature in KafkaUtils ?
- make KafkaCluster public ?
or do you have any other smart solution where I don't need to copy/paste
KafkaCluster ?
Thanks.
Regards,
Erwan ALLAIN