and process the
same DStream at different speeds (slow vs. fast processing)?
Is it possible to easily share values (a map, for example) between pipelines
without using an external database? I think accumulators/broadcast variables
could work, but I'm not sure about sharing them between two pipelines.
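For what it's worth, broadcast variables are read-only once created, so they can distribute a shared map to several pipelines but cannot propagate updates between them. A minimal sketch, assuming one SparkContext shared by both pipelines (the context and the map contents are placeholders, not from this thread):

```scala
import org.apache.spark.SparkContext

val sc: SparkContext = ???  // assumed: one shared context for both pipelines

// Broadcast a lookup map once; every job on this context can read it.
val lookup = sc.broadcast(Map("user-1" -> 42L))

// Any pipeline's transformations can read lookup.value on the executors,
// but no pipeline can mutate it: an update means re-broadcasting a new value.
```

Accumulators go the other way: executors can only add to them and only the driver can read them, so neither primitive gives true two-way sharing without an external store.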
Regards,
Julien Naour
keys corresponding to some kind of user id. I want
to process only the last event for each user id, i.e. skip intermediate events
per user id.
I have only one Kafka topic with all these events.
Regards,
Julien Naour
On Wed, Jan 6, 2016 at 16:13, Cody Koeninger <c...@koeninger.org> wrote:
>
> s, you can't just magically ignore some time
> range of RDDs, because they may contain events you care about.
>
> On Wed, Jan 6, 2016 at 10:55 AM, Julien Naour <julna...@gmail.com> wrote:
>
>> The following lines are my understanding of Spark Streaming AFAIK, I
>>
> Then you can do foreachPartition with a local map to store just a single
> event per user, e.g.
>
> foreachPartition { p =>
>   val m = new HashMap
>   p.foreach { event =>
>     m.put(event.user, event)
>   }
>   m.foreach {
>     // ... do your computation
>   }
> }
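A hedged sketch of how the quoted snippet could be wired into a DStream; the `Event` case class, its `user` field, and the stream itself are assumptions for illustration, not part of the thread:

```scala
import org.apache.spark.streaming.dstream.DStream
import scala.collection.mutable

case class Event(user: String, payload: String) // assumed event shape

def processLastPerUser(stream: DStream[Event]): Unit =
  stream.foreachRDD { rdd =>
    rdd.foreachPartition { p =>
      val m = mutable.HashMap.empty[String, Event]
      // Later events overwrite earlier ones, so only the last event
      // per user (within this partition) survives.
      p.foreach(e => m.put(e.user, e))
      m.values.foreach { e =>
        // ... do your computation on the last event per user
      }
    }
  }
```

Note the map is local to each partition, so "last per user" holds per partition per batch; if the same user's events can land in different partitions, a repartition by user id would be needed first.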
/1427
And the current k-means implementation in MLlib already benefits from sparse
vector computation.
http://spark-summit.org/2014/talk/sparse-data-support-in-mllib-2
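For reference, a small sketch of how sparse vectors are built for MLlib; the sizes, indices, and parameter values here are made up for illustration:

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// A length-10 vector with non-zeros at indices 1 and 4.
val sv: Vector = Vectors.sparse(10, Seq((1, 2.0), (4, 5.0)))

// KMeans.train takes an RDD[Vector], so dense and sparse vectors mix freely:
// val model = org.apache.spark.mllib.clustering.KMeans.train(data, 3, 20)
```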
2014-08-21 15:40 GMT+08:00 Julien Naour julna...@gmail.com:
My arrays are in fact Array[Array[Long]] and around 17x15 (17
instead of
simple variable?
Cheers,
Julien Naour
In the following presentation you can find a simple example of a clustering
model used to classify new incoming tweets:
https://www.youtube.com/watch?v=sPhyePwo7FA
Regards,
Julien
2014-08-05 7:08 GMT+02:00 Xiangrui Meng men...@gmail.com:
Some extra work is needed to close the loop. One related