Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-23 Thread Guillaume Pitel
Hi, So I've built this node-centered accumulator, and I've written a small piece about it: http://blog.guillaume-pitel.fr/2015/06/spark-trick-shot-node-centered-aggregator/ Hope it can help someone. Guillaume 2015-06-18 15:17 GMT+02:00 Guillaume Pitel guillaume.pi...@exensa.com

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Guillaume Pitel
Guillaume Pitel guillaume.pi...@exensa.com: Hi, I'm trying to figure out the smartest way to implement a global count-min-sketch on accumulators. For now, we are doing that with RDDs. It works well, but with one sketch per partition, merging takes
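To make the "one sketch per partition, then merge" pattern in this snippet concrete, here is a minimal sketch in plain Python. It is not Spark code and not the author's implementation: the names CountMinSketch, sketch_partition, width and depth are hypothetical, and the partitions are just in-memory lists standing in for RDD partitions.

```python
import hashlib


class CountMinSketch:
    """Hypothetical minimal count-min sketch (illustrative, not a Spark API)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash function per row, obtained by salting with the row number.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # The minimum over rows never underestimates the true count.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )

    def merge(self, other):
        # Sketches with identical shape and hashes merge by element-wise addition.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must share width and depth")
        for row in range(self.depth):
            mine, theirs = self.table[row], other.table[row]
            for col in range(self.width):
                mine[col] += theirs[col]
        return self


def sketch_partition(items, width=1024, depth=4):
    """Build one sketch for one partition (assumed to be a plain iterable)."""
    sketch = CountMinSketch(width, depth)
    for item in items:
        sketch.add(item)
    return sketch


# Stand-in for three partitions; each gets its own sketch, merged afterwards.
partitions = [["a", "b", "a"], ["b", "c"], ["a"]]
merged = CountMinSketch()
for part in partitions:
    merged.merge(sketch_partition(part))
```

Each merge touches width × depth counters, so the total merge cost grows with the number of sketches. That is the motivation for keeping fewer sketches (for example one per node rather than one per partition), as discussed in this thread.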

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Guillaume Pitel
, then that doesn't work anymore. You could probably detect it and recompute the sketches, but it would become overcomplicated. 2015-06-18 14:27 GMT+02:00 Guillaume Pitel guillaume.pi...@exensa.com: Hi, Thank you for this confirmation. Coalescing

Re: Best way to randomly distribute elements

2015-06-18 Thread Guillaume Pitel
. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- eXenSa *Guillaume PITEL, Président* +33(0)626 222 431 eXenSa S.A.S. http://www.exensa.com/ 41, rue Périer - 92120 Montrouge - FRANCE Tel +33(0)184 163

Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Guillaume Pitel
is initialized locally, updated, then sent back to the driver for merging? So I guess accumulators may not be the way to go after all. Any advice? Guillaume

Re: Random pairs / RDD order

2015-04-16 Thread Guillaume Pitel
don't think it should hurt too much. Guillaume Hi everyone, However, I am not happy with this solution because each element is most likely to be paired with elements that are close by in the partition. This is because sample returns an ordered Iterator.

Re: Is the disk space in SPARK_LOCAL_DIRS cleanned up?

2015-04-14 Thread Guillaume Pitel
comments: // SPARK_LOCAL_DIRS environment variable, and deleted by the Worker when the // application finishes. On 13.04.2015, at 11:26, Guillaume Pitel guillaume.pi...@exensa.com wrote: Does it also clean up the Spark local dirs? I thought it was only cleaning

Re: Spark Cluster: RECEIVED SIGNAL 15: SIGTERM

2015-04-13 Thread Guillaume Pitel
a SIGTERM signal, so perhaps the daemon was terminated by someone or a parent process. Just my guess. Tim On Mon, Apr 13, 2015 at 2:28 AM, Guillaume Pitel guillaume.pi...@exensa.com wrote: Very likely to be this: http://www.linuxdevcenter.com/pub/a/linux

Re: Spark Cluster: RECEIVED SIGNAL 15: SIGTERM

2015-04-13 Thread Guillaume Pitel
Worker: RECEIVED SIGNAL 15: SIGTERM*

Re: Is the disk space in SPARK_LOCAL_DIRS cleanned up?

2015-04-13 Thread Guillaume Pitel
the disk space in this folder once the shuffle operation is done? If not, I need to write a job to clean it up myself. But how do I know which subfolders there can be removed? Ningjun

Re: Is the disk space in SPARK_LOCAL_DIRS cleanned up?

2015-04-11 Thread Guillaume Pitel

Re: Join on Spark too slow.

2015-04-09 Thread Guillaume Pitel

Re: Pairwise computations within partition

2015-04-09 Thread Guillaume Pitel

Re: Incremently load big RDD file into Memory

2015-04-08 Thread Guillaume Pitel
here. I would be really grateful if you could reply to it. Thanks, On Wed, Apr 8, 2015 at 1:23 PM, Guillaume Pitel guillaume.pi...@exensa.com wrote: This kind of operation is not scalable, no matter what you do, at least if you _really_ want to do

Re: Mllib native netlib-java/OpenBLAS

2014-12-10 Thread Guillaume Pitel
the source, by downloading it and running: mvn -Pnetlib-lgpl -DskipTests clean package

Maven profile in MLLib netlib-lgpl not working (1.1.1)

2014-12-10 Thread Guillaume Pitel

Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit

2014-10-20 Thread Guillaume Pitel

Re: What does KryoException: java.lang.NegativeArraySizeException mean?

2014-10-20 Thread Guillaume Pitel
sure that your combineByKey has enough different keys, and see what happens. Guillaume Thank you, Guillaume, my dataset is not that large, it's ~2GB in total 2014-10-20 16:58 GMT+08:00 Guillaume Pitel guillaume.pi...@exensa.com: Hi, It happened to me

Re: Delayed hotspot optimizations in Spark

2014-10-10 Thread Guillaume Pitel
Hi, Could it be due to GC? I read it may happen if your program starts with a small heap. What are your -Xms and -Xmx values? Print GC stats with -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps Guillaume Hello Spark users and developers! I am using HDFS + Spark SQL + Hive schema +

Problem with very slow behaviour of TorrentBroadcast vs. HttpBroadcast

2014-10-01 Thread Guillaume Pitel
a configuration error on our side, but are unable to pin it down. Does anyone have any idea of the origin of the problem? For now we're sticking with the HttpBroadcast workaround. Guillaume

Re: Kyro deserialisation error

2014-07-24 Thread Guillaume Pitel
 _o4lbʂԛ4각 4^x4ڻ Clearly a stream corruption problem. We've been running fine (afaik) on 1.0.0 for two weeks, switched to 1.0.1 this Monday, and since then this kind of problem has occurred randomly. Guillaume Pitel Not sure if this helps, but it does seem to be part of a name

Re: Huge matrix

2014-04-14 Thread Guillaume Pitel
-- Guillaume PITEL, Président +33(0)6 25 48 86 80 eXenSa S.A.S

Re: Huge matrix

2014-04-12 Thread Guillaume Pitel
. -- Guillaume PITEL, Président +33(0)6 25 48 86 80 / +33(0)9 70 44 67 53 eXenSa S.A.S

Re: K-means faster on Mahout then on Spark

2014-03-25 Thread Guillaume Pitel (eXenSa)
Maybe with MEMORY_ONLY, Spark has to recompute the RDDs several times because they don't fit in memory, which makes things run slower. As a safe general rule, use MEMORY_AND_DISK_SER. Guillaume Pitel - Président d'eXenSa Prashant Sharma scrapco...@gmail.com wrote: I think Mahout uses

Re: Spark temp dir (spark.local.dir)

2014-03-13 Thread Guillaume Pitel
! -- Guillaume PITEL, Président +33(0)6 25 48 86 80 eXenSa

Re: Spark temp dir (spark.local.dir)

2014-03-13 Thread Guillaume Pitel
? in SPARK_JAVA_OPTS during SparkContext creation? It should probably be set in spark-env.sh, because it can differ on each node. Guillaume On 13 Mar, 2014, at 5:33 pm, Guillaume Pitel guillaume.pi...@exensa.com wrote