Re: functools.partial as UserDefinedFunction

2015-03-26 Thread Karlson

Hi,

I've filed a JIRA (https://issues.apache.org/jira/browse/SPARK-6553) and 
suggested a fix (https://github.com/apache/spark/pull/5206).
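
For context, a rough sketch of the kind of change involved (hypothetical code only, not necessarily what the PR does): fall back gracefully when the wrapped callable has no __name__.

    # Hypothetical sketch only -- not the actual patch in PR #5206.
    # Use the callable's __name__ when it exists; otherwise fall back to
    # the type name (functools.partial objects have no __name__).
    def _udf_name(f):
        return getattr(f, "__name__", type(f).__name__)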



On 2015-03-25 19:49, Davies Liu wrote:

It would be good to support functools.partial. Could you file a JIRA for it?


On Wednesday, March 25, 2015 at 5:42 AM, Karlson wrote:



Hi all,

passing a functools.partial function as a UserDefinedFunction to
DataFrame.select raises an AttributeError, because functools.partial
objects do not have a __name__ attribute. Is there any alternative to
relying on __name__ in pyspark/sql/functions.py:126?
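
A minimal user-side workaround in the meantime (a sketch, assuming an existing DataFrame df with an integer column named "value"): functools.partial objects can carry attributes, so you can attach a __name__ yourself before wrapping the partial in a UDF.

    # Sketch of a workaround: give the partial a __name__ before udf() reads it.
    import functools
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    def add(a, b):
        return a + b

    add_one = functools.partial(add, b=1)
    add_one.__name__ = "add_one"    # partial objects accept attribute assignment

    add_one_udf = udf(add_one, IntegerType())
    df.select(add_one_udf(df["value"]))     # df and its "value" column are assumed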







-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Using CUDA within Spark / boosting linear algebra

2015-03-26 Thread John Canny
I mentioned this earlier in the thread, but I'll put it out again. Dense 
BLAS are not very important for most machine learning workloads: at 
least for non-image workloads in industry (and for image processing you 
would probably want a deep learning/SGD solution with convolution 
kernels). For example, it was only relevant for 1/7 of our recent 
benchmarks, which should be a reasonable sample. What really matters is 
sparse BLAS performance, and BIDMat is still an order of magnitude faster 
there. Those kernels are only in BIDMat, since NVIDIA's sparse BLAS does 
not perform well on power-law data.


It's also the case that the overall performance of an algorithm is 
determined by the slowest kernel, not the fastest. If the goal is to get 
closer to BIDMach's performance on typical problems, you need to make 
sure that every kernel runs at a comparable speed. So the real question is 
how much faster MLlib routines are on a complete problem with and without 
GPU acceleration. For BIDMach, it's close to a factor of 10. But that 
required running entirely on the GPU, and making sure every kernel is 
close to its limit.
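
To make the point concrete, a toy back-of-the-envelope calculation (all numbers made up for illustration):

    # Toy numbers only: speeding up a single kernel by 10x barely helps if
    # the other kernels stay on the CPU; a ~10x end-to-end win needs every
    # kernel to be accelerated.
    dense_gemm, sparse_ops, other = 4.0, 5.0, 1.0      # seconds per iteration
    cpu_total = dense_gemm + sparse_ops + other         # 10.0 s

    only_dense_on_gpu = dense_gemm / 10 + sparse_ops + other       # 6.4 s
    everything_on_gpu = (dense_gemm + sparse_ops + other) / 10     # 1.0 s

    print(cpu_total / only_dense_on_gpu)    # ~1.6x overall speedup
    print(cpu_total / everything_on_gpu)    # 10x overall speedup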


-John

If you think nvblas would be helpful, you should try it in some 
end-to-end benchmarks.

On 3/25/15, 6:23 PM, Evan R. Sparks wrote:
Yeah, much more reasonable - nice to know that we can get full GPU 
performance from breeze/netlib-java - meaning there's no compelling 
performance reason to switch out our current linear algebra library 
(at least as far as this benchmark is concerned).


Instead, it looks like a user guide for configuring Spark/MLlib to use 
the right BLAS library will get us most of the way there. Or, would it 
make sense to finally ship openblas compiled for some common platforms 
(64-bit linux, windows, mac) directly with Spark - hopefully 
eliminating the jblas warnings once and for all for most users? 
(Licensing is BSD) Or am I missing something?


On Wed, Mar 25, 2015 at 6:03 PM, Ulanov, Alexander 
alexander.ula...@hp.com wrote:


As everyone suggested, the results were too good to be true, so I
double-checked them. It turns out that nvblas did not perform the
multiplication (because of the NVBLAS_TILE_DIM parameter in
nvblas.conf) and returned a zero matrix. My previously posted results
with nvblas therefore reflect only the matrix copying. The default
NVBLAS_TILE_DIM==2048 is too big for my graphics card/matrix size. I
handpicked other values that worked. As a result, netlib+nvblas is on
par with BIDMat-cuda. As promised, I am going to post a how-to for
nvblas configuration.


https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing
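
Until that how-to is posted, here is a minimal nvblas.conf sketch; the values and the CPU BLAS path are illustrative assumptions only, and NVBLAS_TILE_DIM has to be tuned to the GPU and matrix sizes as described above:

    # Illustrative nvblas.conf sketch (values are assumptions, not recommendations)
    # CPU BLAS library that nvblas falls back to
    NVBLAS_CPU_BLAS_LIB /usr/lib64/libopenblas.so
    # GPUs nvblas is allowed to use
    NVBLAS_GPU_LIST ALL
    # Tile size used to split matrices; the 2048 default was too large in this case
    NVBLAS_TILE_DIM 1024
    # Optional log file
    NVBLAS_LOGFILE nvblas.log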



-Original Message-
From: Ulanov, Alexander
Sent: Wednesday, March 25, 2015 2:31 PM
To: Sam Halliday
Cc: dev@spark.apache.org; Xiangrui
Meng; Joseph Bradley; Evan R. Sparks; jfcanny
Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi again,

I finally managed to use nvblas within Spark+netlib-java. It has
exceptional performance for big matrices with Double, faster than
BIDMat-cuda with Float. But for smaller matrices, if you copy them
to/from the GPU, OpenBLAS or MKL might be a better choice. This
correlates with the original nvblas presentation at the GPU Technology
Conference 2013 (slide 21):

http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3108-New-Features-CUDA%206%20-GPU-Acceleration.pdf

My results:

https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing

Just in case: these tests are not meant to generalize the performance
of the different libraries. I just want to pick the library that
performs dense matrix multiplication best for my task.

P.S. My previous issue with nvblas was the following: it exposes
Fortran BLAS functions, while netlib-java uses the C CBLAS functions.
So one needs a CBLAS shared library to use nvblas through netlib-java.
Fedora does not ship CBLAS (Debian and Ubuntu do), so I had to compile
it. I could not use the CBLAS from ATLAS or OpenBLAS because they link
to their own implementations and not to Fortran BLAS.

Best regards, Alexander

-Original Message-
From: Ulanov, Alexander
Sent: Tuesday, March 24, 2015 6:57 PM
To: Sam Halliday
Cc: dev@spark.apache.org; Xiangrui
Meng; Joseph Bradley; Evan R. Sparks
Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi,

I am trying to use nvblas with netlib-java from Spark. The nvblas
functions should replace the existing BLAS function calls via
LD_PRELOAD, as suggested in
http://docs.nvidia.com/cuda/nvblas/#Usage, without any changes to
netlib-java. It seems to work for a simple Java example, but I
cannot make it work with Spark. I run the following:
export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64

Re: Storing large data for MLlib machine learning

2015-03-26 Thread Evan R. Sparks
On binary file formats - I looked at HDF5+Spark a couple of years ago and
found it barely JVM-friendly and very Hadoop-unfriendly (e.g. the APIs
needed filenames as input; you couldn't pass in anything like an
InputStream). I don't know if it has gotten any better.

Parquet plays much more nicely, and there are lots of Spark-related projects
using it already. Keep in mind that it's column-oriented, which might impact
performance - but basically you're going to want your features in a byte
array, and deserialization should be pretty straightforward.
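
As a rough sketch of that in PySpark (paths and column names are made up; the DataFrame reader/writer shown here is the newer API, the 1.3-era equivalents being saveAsParquetFile/parquetFile), with the features kept as a plain array of doubles next to a label column:

    # Rough sketch only: dense features + label stored as Parquet.
    from pyspark.sql import SQLContext, Row

    sqlContext = SQLContext(sc)     # assumes an existing SparkContext `sc`

    rows = sc.parallelize([
        Row(label=1.0, features=[0.1, 0.2, 0.3]),
        Row(label=0.0, features=[0.4, 0.5, 0.6]),
    ])
    df = sqlContext.createDataFrame(rows)

    df.write.parquet("/tmp/features.parquet")               # binary, columnar, splittable
    loaded = sqlContext.read.parquet("/tmp/features.parquet")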

On Thu, Mar 26, 2015 at 2:26 PM, Stephen Boesch java...@gmail.com wrote:

 There are some convenience methods you might consider including:

MLUtils.loadLibSVMFile

 and   MLUtils.loadLabeledPoint
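
For instance, a minimal PySpark sketch of the LibSVM loader/saver (paths made up; note that LibSVM is a text format, so it may not satisfy the binary/small-footprint requirement quoted below):

    # Minimal sketch: labeled dense vectors via the LibSVM helpers in MLUtils.
    from pyspark.mllib.util import MLUtils
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.linalg import Vectors

    points = sc.parallelize([           # assumes an existing SparkContext `sc`
        LabeledPoint(1.0, Vectors.dense([0.1, 0.2, 0.3])),
        LabeledPoint(0.0, Vectors.dense([0.4, 0.5, 0.6])),
    ])
    MLUtils.saveAsLibSVMFile(points, "/tmp/points-libsvm")
    loaded = MLUtils.loadLibSVMFile(sc, "/tmp/points-libsvm")  # RDD of LabeledPoint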

 2015-03-26 14:16 GMT-07:00 Ulanov, Alexander alexander.ula...@hp.com:

  Hi,
 
  Could you suggest what would be a reasonable file format to store
  feature vector data for machine learning in Spark MLlib? Are there any
  best practices for Spark?

  My data is dense feature vectors with labels. Some of the requirements
  are that the format should be easily loaded/serialized, randomly
  accessible, and have a small footprint (binary). I am considering Parquet,
  HDF5, and protocol buffers (protobuf), but I have little to no experience
  with them, so any suggestions would be really appreciated.
 
  Best regards, Alexander
 



Re: Using CUDA within Spark / boosting linear algebra

2015-03-26 Thread Sam Halliday
I'm not at all surprised ;-) I fully expect the GPU performance to get
better automatically as the hardware improves.

Netlib natives still need to be shipped separately. I'd also oppose any
move to make OpenBLAS the default - it's not always better, and I think
natives really need DevOps buy-in. It's not the right solution for
everybody.
On 26 Mar 2015 01:23, Evan R. Sparks evan.spa...@gmail.com wrote:

 Yeah, much more reasonable - nice to know that we can get full GPU
 performance from breeze/netlib-java - meaning there's no compelling
 performance reason to switch out our current linear algebra library (at
 least as far as this benchmark is concerned).

 Instead, it looks like a user guide for configuring Spark/MLlib to use the
 right BLAS library will get us most of the way there. Or, would it make
 sense to finally ship openblas compiled for some common platforms (64-bit
 linux, windows, mac) directly with Spark - hopefully eliminating the jblas
 warnings once and for all for most users? (Licensing is BSD) Or am I
 missing something?

 On Wed, Mar 25, 2015 at 6:03 PM, Ulanov, Alexander 
 alexander.ula...@hp.com wrote:

 As everyone suggested, the results were too good to be true, so I
 double-checked them. It turns out that nvblas did not perform the
 multiplication (because of the NVBLAS_TILE_DIM parameter in nvblas.conf)
 and returned a zero matrix. My previously posted results with nvblas
 therefore reflect only the matrix copying. The default NVBLAS_TILE_DIM==2048
 is too big for my graphics card/matrix size. I handpicked other values that
 worked. As a result, netlib+nvblas is on par with BIDMat-cuda. As promised,
 I am going to post a how-to for nvblas configuration.


 https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing



 -Original Message-
 From: Ulanov, Alexander
 Sent: Wednesday, March 25, 2015 2:31 PM
 To: Sam Halliday
 Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks;
 jfcanny
 Subject: RE: Using CUDA within Spark / boosting linear algebra

 Hi again,

 I finally managed to use nvblas within Spark+netlib-java. It has
 exceptional performance for big matrices with Double, faster than
 BIDMat-cuda with Float. But for smaller matrices, if you copy them
 to/from the GPU, OpenBLAS or MKL might be a better choice. This correlates
 with the original nvblas presentation at the GPU Technology Conference 2013
 (slide 21):
 http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3108-New-Features-CUDA%206%20-GPU-Acceleration.pdf

 My results:

 https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing

 Just in case: these tests are not meant to generalize the performance of
 the different libraries. I just want to pick the library that performs
 dense matrix multiplication best for my task.

 P.S. My previous issue with nvblas was the following: it exposes Fortran
 BLAS functions, while netlib-java uses the C CBLAS functions. So one needs
 a CBLAS shared library to use nvblas through netlib-java. Fedora does not
 ship CBLAS (Debian and Ubuntu do), so I had to compile it. I could not use
 the CBLAS from ATLAS or OpenBLAS because they link to their own
 implementations and not to Fortran BLAS.

 Best regards, Alexander

 -Original Message-
 From: Ulanov, Alexander
 Sent: Tuesday, March 24, 2015 6:57 PM
 To: Sam Halliday
 Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks
 Subject: RE: Using CUDA within Spark / boosting linear algebra

 Hi,

 I am trying to use nvblas with netlib-java from Spark. The nvblas functions
 should replace the existing BLAS function calls via LD_PRELOAD, as suggested
 in http://docs.nvidia.com/cuda/nvblas/#Usage, without any changes to
 netlib-java. It seems to work for a simple Java example, but I cannot make
 it work with Spark. I run the following:
 export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64
 env LD_PRELOAD=/usr/local/cuda-6.5/lib64/libnvblas.so ./spark-shell
 --driver-memory 4G

 In nvidia-smi I observe that Java is using the GPU:

 +------------------------------------------------------------------------+
 | Processes:                                                  GPU Memory |
 |  GPU       PID   Type   Process name                        Usage      |
 |========================================================================|
 |    0      8873      C   bash                                39MiB      |
 |    0      8910      C   /usr/lib/jvm/java-1.7.0/bin/java    39MiB      |
 +------------------------------------------------------------------------+

 In the Spark shell I do a matrix multiplication and see the following:
 15/03/25 06:48:01 INFO JniLoader: successfully loaded
 /tmp/jniloader8192964377009965483netlib-native_system-linux-x86_64.so
 So I am sure that netlib-native is loaded and CBLAS is supposedly used.
 However, the matrix multiplication still executes on the CPU, since I see
 16% CPU usage and 0% GPU usage. I also checked different matrix sizes, from
 100x100 to 12000x12000.

 Could you 

Re: Using CUDA within Spark / boosting linear algebra

2015-03-26 Thread Sam Halliday
Btw, OpenBLAS requires GPL runtime binaries which are typically considered
system libraries (and these fall under something similar to the Java
classpath exception rule)... so it's basically impossible to distribute
OpenBLAS the way you're suggesting, sorry. Indeed, there is work ongoing in
Spark right now to clear up something of this nature.

On a more technical level, I'd recommend watching my talk at ScalaX, which
explains in detail why high performance only comes from machine-optimised
binaries, which require DevOps buy-in (and I'd recommend using MKL anyway
on the CPU, not OpenBLAS).

On an even deeper level, using natives has consequences for JIT and GC which
aren't suitable for everybody, and we'd really like people to go into that
with their eyes wide open.
On 26 Mar 2015 07:43, Sam Halliday sam.halli...@gmail.com wrote:

 I'm not at all surprised ;-) I fully expect the GPU performance to get
 better automatically as the hardware improves.

 Netlib natives still need to be shipped separately. I'd also oppose any
 move to make OpenBLAS the default - it's not always better, and I think
 natives really need DevOps buy-in. It's not the right solution for
 everybody.
 On 26 Mar 2015 01:23, Evan R. Sparks evan.spa...@gmail.com wrote:

 Yeah, much more reasonable - nice to know that we can get full GPU
 performance from breeze/netlib-java - meaning there's no compelling
 performance reason to switch out our current linear algebra library (at
 least as far as this benchmark is concerned).

 Instead, it looks like a user guide for configuring Spark/MLlib to use
 the right BLAS library will get us most of the way there. Or, would it make
 sense to finally ship openblas compiled for some common platforms (64-bit
 linux, windows, mac) directly with Spark - hopefully eliminating the jblas
 warnings once and for all for most users? (Licensing is BSD) Or am I
 missing something?

 On Wed, Mar 25, 2015 at 6:03 PM, Ulanov, Alexander 
 alexander.ula...@hp.com wrote:

 As everyone suggested, the results were too good to be true, so I
 double-checked them. It turns out that nvblas did not perform the
 multiplication (because of the NVBLAS_TILE_DIM parameter in nvblas.conf)
 and returned a zero matrix. My previously posted results with nvblas
 therefore reflect only the matrix copying. The default NVBLAS_TILE_DIM==2048
 is too big for my graphics card/matrix size. I handpicked other values that
 worked. As a result, netlib+nvblas is on par with BIDMat-cuda. As promised,
 I am going to post a how-to for nvblas configuration.


 https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing



 -Original Message-
 From: Ulanov, Alexander
 Sent: Wednesday, March 25, 2015 2:31 PM
 To: Sam Halliday
 Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R.
 Sparks; jfcanny
 Subject: RE: Using CUDA within Spark / boosting linear algebra

 Hi again,

 I finally managed to use nvblas within Spark+netlib-java. It has
 exceptional performance for big matrices with Double, faster than
 BIDMat-cuda with Float. But for smaller matrices, if you copy them
 to/from the GPU, OpenBLAS or MKL might be a better choice. This correlates
 with the original nvblas presentation at the GPU Technology Conference 2013
 (slide 21):
 http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3108-New-Features-CUDA%206%20-GPU-Acceleration.pdf

 My results:

 https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing

 Just in case: these tests are not meant to generalize the performance of
 the different libraries. I just want to pick the library that performs
 dense matrix multiplication best for my task.

 P.S. My previous issue with nvblas was the following: it exposes Fortran
 BLAS functions, while netlib-java uses the C CBLAS functions. So one needs
 a CBLAS shared library to use nvblas through netlib-java. Fedora does not
 ship CBLAS (Debian and Ubuntu do), so I had to compile it. I could not use
 the CBLAS from ATLAS or OpenBLAS because they link to their own
 implementations and not to Fortran BLAS.

 Best regards, Alexander

 -Original Message-
 From: Ulanov, Alexander
 Sent: Tuesday, March 24, 2015 6:57 PM
 To: Sam Halliday
 Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks
 Subject: RE: Using CUDA within Spark / boosting linear algebra

 Hi,

 I am trying to use nvblas with netlib-java from Spark. The nvblas functions
 should replace the existing BLAS function calls via LD_PRELOAD, as suggested
 in http://docs.nvidia.com/cuda/nvblas/#Usage, without any changes to
 netlib-java. It seems to work for a simple Java example, but I cannot make
 it work with Spark. I run the following:
 export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64
 env LD_PRELOAD=/usr/local/cuda-6.5/lib64/libnvblas.so ./spark-shell
 --driver-memory 4G

 In nvidia-smi I observe that Java is using the GPU:

 +-+
 | Processes: