Re: Slowness in Kmeans calculating fastSquaredDistance

2016-02-09 Thread Li Ming Tsai
Hi,


It looks like the k-means++ initialisation (SPARK-3424 <https://issues.apache.org/jira/browse/SPARK-3424>) is the slow part: it runs locally on the driver and uses only 1 core.


If I use random initialisation instead, the job completes in about 1.5 minutes compared to more than an hour.
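
For reference, this is roughly how I switched the initialisation mode for that comparison (a sketch only; parsedData is the same RDD[Vector] used in the original run below):

import org.apache.spark.mllib.clustering.KMeans

// Same data and k as before, but with random initialisation instead of
// the default k-means|| (parsedData is assumed to be an RDD[Vector]).
val model = new KMeans()
  .setK(1000)
  .setMaxIterations(1)
  .setInitializationMode(KMeans.RANDOM)
  .run(parsedData)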


Should I move this to the dev list?


Regards,

Liming







Re: Slowness in Kmeans calculating fastSquaredDistance

2016-02-06 Thread Li Ming Tsai
Hi,


I did more investigation and found that BLAS.scala calls the pure-Java reference implementation (f2jBLAS) for all level 1 routines.


I even patched it to use nativeBLAS.ddot, but it had no material impact.


https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala#L126


private def dot(x: DenseVector, y: DenseVector): Double = {
  val n = x.size
  // Level 1 routine: always dispatched to the pure-Java f2jBLAS backend.
  f2jBLAS.ddot(n, x.values, 1, y.values, 1)
}
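
For what it's worth, the patch I tried was roughly this (a sketch of my local change, using the nativeBLAS field that BLAS.scala already has for its level 2/3 routines):

private def dot(x: DenseVector, y: DenseVector): Double = {
  val n = x.size
  // Route ddot through the native (netlib/MKL) backend instead of f2jBLAS.
  nativeBLAS.ddot(n, x.values, 1, y.values, 1)
}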


Maybe Xiangrui can comment on this?








Slowness in Kmeans calculating fastSquaredDistance

2016-02-04 Thread Li Ming Tsai
Hi,


I'm using Intel MKL with Spark 1.6.0, which I built myself with the -Pnetlib-lgpl 
flag.


I am using Spark in local[4] mode and I run it like this:
# export LD_LIBRARY_PATH=/opt/intel/lib/intel64:/opt/intel/mkl/lib/intel64
# bin/spark-shell ...

I have also added the following symlinks in /opt/intel/mkl/lib/intel64:
lrwxrwxrwx 1 root root 12 Feb  1 09:18 libblas.so -> libmkl_rt.so
lrwxrwxrwx 1 root root 12 Feb  1 09:18 libblas.so.3 -> libmkl_rt.so
lrwxrwxrwx 1 root root 12 Feb  1 09:18 liblapack.so -> libmkl_rt.so
lrwxrwxrwx 1 root root 12 Feb  1 09:18 liblapack.so.3 -> libmkl_rt.so


I believe (though I have not verified it directly) that I'm using Intel MKL, because these warnings went away:

16/02/01 07:49:38 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
16/02/01 07:49:38 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
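
As an extra sanity check, this can be run from the spark-shell to see which implementation netlib-java actually resolved (with MKL on the library path it should report NativeSystemBLAS):

// Prints the concrete BLAS class that netlib-java loaded at runtime.
println(com.github.fommil.netlib.BLAS.getInstance().getClass.getName)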

After collectAsMap, there is no further progress, but I can see that only 1 CPU is 
being utilised, with the following stack trace:

"ForkJoinPool-3-worker-7" #130 daemon prio=5 os_prio=0 tid=0x7fbf30ab6000 
nid=0xbdc runnable [0x7fbf12205000]

   java.lang.Thread.State: RUNNABLE

at com.github.fommil.netlib.F2jBLAS.ddot(F2jBLAS.java:71)

at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:128)

at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:111)

at 
org.apache.spark.mllib.util.MLUtils$.fastSquaredDistance(MLUtils.scala:349)

at 
org.apache.spark.mllib.clustering.KMeans$.fastSquaredDistance(KMeans.scala:587)

at 
org.apache.spark.mllib.clustering.KMeans$$anonfun$findClosest$1.apply(KMeans.scala:561)

at 
org.apache.spark.mllib.clustering.KMeans$$anonfun$findClosest$1.apply(KMeans.scala:555)


These last few steps take more than half of the total run time for a 1M x 100 dataset.


The code is just:

val clusters = KMeans.train(parsedData, 1000, 1)
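
For completeness, parsedData is just a cached RDD[Vector]; it is built along these lines (illustrative sketch only; the real file name and parsing differ):

import org.apache.spark.mllib.linalg.Vectors

// Illustrative only: the real input is a dense 1M x 100 dataset; the file
// name and delimiter here are placeholders.
val parsedData = sc.textFile("points.txt")
  .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
  .cache()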


Shouldn't it be utilising all the cores for the dot products? Is this a 
misconfiguration?


Thanks!