Re: functools.partial as UserDefinedFunction
Hi, I've filed a JIRA (https://issues.apache.org/jira/browse/SPARK-6553) and suggested a fix (https://github.com/apache/spark/pull/5206).

On 2015-03-25 19:49, Davies Liu wrote: It would be good to support functools.partial; could you file a JIRA for it?

On Wednesday, March 25, 2015 at 5:42 AM, Karlson wrote: Hi all, passing a functools.partial function as a UserDefinedFunction to DataFrame.select raises an AttributeError, because functools.partial does not have the attribute __name__. Is there any alternative to relying on __name__ in pyspark/sql/functions.py:126?
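For reference, a minimal sketch of the failure and two user-side workarounds. The function names below are made up for illustration, and the getattr fallback at the end is only one possible shape of a fix, not necessarily what the linked PR does:

    import functools

    def add(a, b):
        return a + b

    add_one = functools.partial(add, 1)

    # functools.partial objects carry no __name__, so code that reads
    # f.__name__ (the attribute pyspark/sql/functions.py:126 relied on) fails:
    #   AttributeError: 'functools.partial' object has no attribute '__name__'
    print(hasattr(add_one, "__name__"))  # False

    # Workaround A: partial objects accept attribute assignment, so a name can
    # be attached before passing the partial to pyspark.sql.functions.udf(...).
    add_one.__name__ = "add_one"

    # Workaround B: wrap the partial in a lambda or a plain def, both of which
    # always carry a __name__ of their own.
    add_one_wrapped = lambda x: add_one(x)

    # A library-side fix could fall back when __name__ is missing, e.g.
    #   name = getattr(f, "__name__", "<lambda>")
    print(add_one(41))  # -> 42; the partial itself keeps working either way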
Re: Using CUDA within Spark / boosting linear algebra
I mentioned this earlier in the thread, but I'll put it out again. Dense BLAS are not very important for most machine learning workloads: at least for non-image workloads in industry (and for image processing you would probably want a deep learning/SGD solution with convolution kernels). E.g., it was only relevant for 1/7 of our recent benchmarks, which should be a reasonable sample. What really matters is sparse BLAS performance. BIDMat is still an order of magnitude faster there. Those kernels are only in BIDMat, since NVIDIA's sparse BLAS don't perform well on power-law data.

It's also the case that the overall performance of an algorithm is determined by the slowest kernel, not the fastest. If the goal is to get closer to BIDMach's performance on typical problems, you need to make sure that every kernel runs at comparable speed. So the real question is how much faster MLlib routines are on a complete problem with/without GPU acceleration. For BIDMach, it's close to a factor of 10. But that required running entirely on the GPU and making sure every kernel is close to its limit.

-John

If you think nvblas would be helpful, you should try it in some end-to-end benchmarks.

On 3/25/15, 6:23 PM, Evan R. Sparks wrote: Yeah, much more reasonable - nice to know that we can get full GPU performance from breeze/netlib-java - meaning there's no compelling performance reason to switch out our current linear algebra library (at least as far as this benchmark is concerned). Instead, it looks like a user guide for configuring Spark/MLlib to use the right BLAS library will get us most of the way there. Or, would it make sense to finally ship OpenBLAS compiled for some common platforms (64-bit Linux, Windows, Mac) directly with Spark - hopefully eliminating the jblas warnings once and for all for most users? (Licensing is BSD.) Or am I missing something?

On Wed, Mar 25, 2015 at 6:03 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: As everyone suggested, the results were too good to be true, so I double-checked them. It turns out that nvblas did not do the multiplication, due to the parameter NVBLAS_TILE_DIM from nvblas.conf, and returned a zero matrix. My previously posted results with nvblas reflect matrix copying only. The default NVBLAS_TILE_DIM==2048 is too big for my graphics card/matrix size. I hand-picked other values that worked. As a result, netlib+nvblas is on par with BIDMat-cuda. As promised, I am going to post a how-to for nvblas configuration. https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing

-----Original Message----- From: Ulanov, Alexander Sent: Wednesday, March 25, 2015 2:31 PM To: Sam Halliday Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks; jfcanny Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi again, I finally managed to use nvblas within Spark+netlib-java. It has exceptional performance for big matrices with Double, faster than BIDMat-cuda with Float. But for smaller matrices, if you have to copy them to/from the GPU, OpenBLAS or MKL might be a better choice. This correlates with the original nvblas presentation at GPU conf 2013 (slide 21): http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3108-New-Features-CUDA%206%20-GPU-Acceleration.pdf My results: https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing Just in case, these tests are not meant as a generalization of the performance of different libraries.
I just want to pick the library that does dense matrix multiplication best for my task.

P.S. My previous issue with nvblas was the following: it provides Fortran BLAS functions, while netlib-java uses C cblas functions. So one needs a cblas shared library to use nvblas through netlib-java. Fedora does not ship cblas (Debian and Ubuntu do), so I had to compile it. I could not use cblas from ATLAS or OpenBLAS because they link to their own implementations and not to the Fortran BLAS. Best regards, Alexander

-----Original Message----- From: Ulanov, Alexander Sent: Tuesday, March 24, 2015 6:57 PM To: Sam Halliday Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi, I am trying to use nvblas with netlib-java from Spark. The nvblas functions should replace the current BLAS function calls via LD_PRELOAD, as suggested in http://docs.nvidia.com/cuda/nvblas/#Usage, without any changes to netlib-java. It seems to work for a simple Java example, but I cannot make it work with Spark. I run the following: export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64
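For anyone trying to reproduce the setup discussed above, a rough sketch of the pieces involved, based on the NVIDIA nvblas documentation: an nvblas.conf pointing at a CPU BLAS fallback, plus the LD_PRELOAD launch quoted in the thread. The library paths and the tile size below are examples only; as noted above, the right NVBLAS_TILE_DIM depends on the GPU and the matrix sizes.

    # nvblas.conf (location can be set via the NVBLAS_CONFIG_FILE env var)
    NVBLAS_LOGFILE       nvblas.log
    # mandatory: the CPU BLAS library nvblas falls back to for routines it
    # does not intercept (path is an example)
    NVBLAS_CPU_BLAS_LIB  /usr/lib64/libopenblas.so
    NVBLAS_GPU_LIST      ALL
    # default is 2048, which was too large for the card/matrix sizes above
    NVBLAS_TILE_DIM      1024

Then launch Spark with nvblas preloaded, mirroring the commands in the thread:

    export NVBLAS_CONFIG_FILE=/path/to/nvblas.conf
    export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64
    env LD_PRELOAD=/usr/local/cuda-6.5/lib64/libnvblas.so ./spark-shell --driver-memory 4G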
Re: Storing large data for MLlib machine learning
On binary file formats - I looked at HDF5+Spark a couple of years ago and found it barely JVM-friendly and very Hadoop-unfriendly (e.g. the APIs needed filenames as input; you couldn't pass them anything like an InputStream). I don't know if it has gotten any better. Parquet plays much more nicely, and there are lots of Spark-related projects using it already. Keep in mind that it's column-oriented, which might impact performance - but basically you're going to want your features in a byte array, and deserialization should be pretty straightforward.

On Thu, Mar 26, 2015 at 2:26 PM, Stephen Boesch java...@gmail.com wrote: There are some convenience methods you might consider, including MLUtils.loadLibSVMFile and MLUtils.loadLabeledPoints.

2015-03-26 14:16 GMT-07:00 Ulanov, Alexander alexander.ula...@hp.com: Hi, could you suggest what would be a reasonable file format to store feature vector data for machine learning in Spark MLlib? Are there any best practices for Spark? My data is dense feature vectors with labels. Some of the requirements are that the format should be easily loaded/serialized, randomly accessible, and have a small footprint (binary). I am considering Parquet, HDF5, and protocol buffers (protobuf), but I have little to no experience with them, so any suggestions would be really appreciated. Best regards, Alexander
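To make the two routes above concrete, a rough PySpark sketch: loading via the MLUtils convenience methods and round-tripping the same labeled vectors through Parquet. The paths are placeholders, and the DataFrame writer/reader API assumes Spark 1.4 or later (earlier releases used saveAsParquetFile/parquetFile instead).

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.mllib.util import MLUtils

    sc = SparkContext(appName="feature-storage-sketch")
    sqlContext = SQLContext(sc)

    # Route 1: text-based LibSVM via the convenience loader -> RDD of LabeledPoint
    points = MLUtils.loadLibSVMFile(sc, "/path/to/features.libsvm")

    # Route 2: a binary, column-oriented format. Convert the labeled vectors to
    # a (label, features) DataFrame and store it as Parquet.
    df = sqlContext.createDataFrame(
        points.map(lambda lp: (lp.label, lp.features)), ["label", "features"])
    df.write.parquet("/path/to/features.parquet")

    # Reading back gives a DataFrame with the same schema.
    loaded = sqlContext.read.parquet("/path/to/features.parquet")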
Re: Using CUDA within Spark / boosting linear algebra
I'm not at all surprised ;-) I fully expect the GPU performance to get better automatically as the hardware improves. Netlib natives still need to be shipped separately. I'd also oppose any move to make OpenBLAS the default - it's not always better, and I think natives really need DevOps buy-in. It's not the right solution for everybody.

On 26 Mar 2015 01:23, Evan R. Sparks evan.spa...@gmail.com wrote: Yeah, much more reasonable - nice to know that we can get full GPU performance from breeze/netlib-java - meaning there's no compelling performance reason to switch out our current linear algebra library (at least as far as this benchmark is concerned). Instead, it looks like a user guide for configuring Spark/MLlib to use the right BLAS library will get us most of the way there. Or, would it make sense to finally ship OpenBLAS compiled for some common platforms (64-bit Linux, Windows, Mac) directly with Spark - hopefully eliminating the jblas warnings once and for all for most users? (Licensing is BSD.) Or am I missing something?

On Wed, Mar 25, 2015 at 6:03 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: As everyone suggested, the results were too good to be true, so I double-checked them. It turns out that nvblas did not do the multiplication, due to the parameter NVBLAS_TILE_DIM from nvblas.conf, and returned a zero matrix. My previously posted results with nvblas reflect matrix copying only. The default NVBLAS_TILE_DIM==2048 is too big for my graphics card/matrix size. I hand-picked other values that worked. As a result, netlib+nvblas is on par with BIDMat-cuda. As promised, I am going to post a how-to for nvblas configuration. https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing

-----Original Message----- From: Ulanov, Alexander Sent: Wednesday, March 25, 2015 2:31 PM To: Sam Halliday Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks; jfcanny Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi again, I finally managed to use nvblas within Spark+netlib-java. It has exceptional performance for big matrices with Double, faster than BIDMat-cuda with Float. But for smaller matrices, if you have to copy them to/from the GPU, OpenBLAS or MKL might be a better choice. This correlates with the original nvblas presentation at GPU conf 2013 (slide 21): http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3108-New-Features-CUDA%206%20-GPU-Acceleration.pdf My results: https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing Just in case, these tests are not meant as a generalization of the performance of different libraries. I just want to pick the library that does dense matrix multiplication best for my task.

P.S. My previous issue with nvblas was the following: it provides Fortran BLAS functions, while netlib-java uses C cblas functions. So one needs a cblas shared library to use nvblas through netlib-java. Fedora does not ship cblas (Debian and Ubuntu do), so I had to compile it. I could not use cblas from ATLAS or OpenBLAS because they link to their own implementations and not to the Fortran BLAS. Best regards, Alexander

-----Original Message----- From: Ulanov, Alexander Sent: Tuesday, March 24, 2015 6:57 PM To: Sam Halliday Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi, I am trying to use nvblas with netlib-java from Spark.
The nvblas functions should replace the current BLAS function calls via LD_PRELOAD, as suggested in http://docs.nvidia.com/cuda/nvblas/#Usage, without any changes to netlib-java. It seems to work for a simple Java example, but I cannot make it work with Spark. I run the following:

export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64
env LD_PRELOAD=/usr/local/cuda-6.5/lib64/libnvblas.so ./spark-shell --driver-memory 4G

In nvidia-smi I observe that Java is set up to use the GPU:

GPU  PID   Type  Process name                      GPU Memory Usage
0    8873  C     bash                              39MiB
0    8910  C     /usr/lib/jvm/java-1.7.0/bin/java  39MiB

In the Spark shell I do a matrix multiplication and see the following:

15/03/25 06:48:01 INFO JniLoader: successfully loaded /tmp/jniloader8192964377009965483netlib-native_system-linux-x86_64.so

So I am sure that netlib-native is loaded and cblas is supposedly used. However, the matrix multiplication executes on the CPU, since I see 16% CPU used and 0% GPU used. I also checked different matrix sizes, from 100x100 to 12000x12000. Could you
Re: Using CUDA within Spark / boosting linear algebra
Btw, OpenBLAS requires GPL runtime binaries, which are typically considered system libraries (and these fall under something similar to the Java classpath exception rule)... so it's basically impossible to distribute OpenBLAS the way you're suggesting, sorry. Indeed, there is work ongoing in Spark right now to clear up something of this nature.

On a more technical level, I'd recommend watching my talk at ScalaX, which explains in detail why high performance only comes from machine-optimised binaries, which requires DevOps buy-in (and I'd recommend using MKL anyway on the CPU, not OpenBLAS). On an even deeper level, using natives has consequences for JIT and GC which aren't suitable for everybody, and we'd really like people to go into that with their eyes wide open.

On 26 Mar 2015 07:43, Sam Halliday sam.halli...@gmail.com wrote: I'm not at all surprised ;-) I fully expect the GPU performance to get better automatically as the hardware improves. Netlib natives still need to be shipped separately. I'd also oppose any move to make OpenBLAS the default - it's not always better, and I think natives really need DevOps buy-in. It's not the right solution for everybody.

On 26 Mar 2015 01:23, Evan R. Sparks evan.spa...@gmail.com wrote: Yeah, much more reasonable - nice to know that we can get full GPU performance from breeze/netlib-java - meaning there's no compelling performance reason to switch out our current linear algebra library (at least as far as this benchmark is concerned). Instead, it looks like a user guide for configuring Spark/MLlib to use the right BLAS library will get us most of the way there. Or, would it make sense to finally ship OpenBLAS compiled for some common platforms (64-bit Linux, Windows, Mac) directly with Spark - hopefully eliminating the jblas warnings once and for all for most users? (Licensing is BSD.) Or am I missing something?

On Wed, Mar 25, 2015 at 6:03 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: As everyone suggested, the results were too good to be true, so I double-checked them. It turns out that nvblas did not do the multiplication, due to the parameter NVBLAS_TILE_DIM from nvblas.conf, and returned a zero matrix. My previously posted results with nvblas reflect matrix copying only. The default NVBLAS_TILE_DIM==2048 is too big for my graphics card/matrix size. I hand-picked other values that worked. As a result, netlib+nvblas is on par with BIDMat-cuda. As promised, I am going to post a how-to for nvblas configuration. https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing

-----Original Message----- From: Ulanov, Alexander Sent: Wednesday, March 25, 2015 2:31 PM To: Sam Halliday Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks; jfcanny Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi again, I finally managed to use nvblas within Spark+netlib-java. It has exceptional performance for big matrices with Double, faster than BIDMat-cuda with Float. But for smaller matrices, if you have to copy them to/from the GPU, OpenBLAS or MKL might be a better choice. This correlates with the original nvblas presentation at GPU conf 2013 (slide 21): http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3108-New-Features-CUDA%206%20-GPU-Acceleration.pdf My results: https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing Just in case, these tests are not meant as a generalization of the performance of different libraries.