Re: functools.partial as UserDefinedFunction
Hi, I've filed a JIRA (https://issues.apache.org/jira/browse/SPARK-6553) and suggested a fix (https://github.com/apache/spark/pull/5206).

On 2015-03-25 19:49, Davies Liu wrote: It would be good to support functools.partial; could you file a JIRA for it?

On Wednesday, March 25, 2015 at 5:42 AM, Karlson wrote: Hi all, passing a functools.partial function as a UserDefinedFunction to DataFrame.select raises an AttributeError, because functools.partial does not have the attribute __name__. Is there any alternative to relying on __name__ in pyspark/sql/functions.py:126?
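For reference, a minimal sketch of the failure and two user-side workarounds. The function names below are made up for illustration, and the getattr fallback at the end is only one possible shape of a fix, not necessarily what the linked PR does:

    import functools

    def add(a, b):
        return a + b

    add_one = functools.partial(add, 1)

    # functools.partial objects carry no __name__, so code that reads
    # f.__name__ (the attribute pyspark/sql/functions.py:126 relied on) fails:
    #   AttributeError: 'functools.partial' object has no attribute '__name__'
    print(hasattr(add_one, "__name__"))  # False

    # Workaround A: partial objects accept attribute assignment, so a name can
    # be attached before passing the partial to pyspark.sql.functions.udf(...).
    add_one.__name__ = "add_one"

    # Workaround B: wrap the partial in a lambda or a plain def, both of which
    # always carry a __name__ of their own.
    add_one_wrapped = lambda x: add_one(x)

    # A library-side fix could fall back when __name__ is missing, e.g.
    #   name = getattr(f, "__name__", "<lambda>")
    print(add_one(41))  # -> 42; the partial itself keeps working either way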
Re: Using CUDA within Spark / boosting linear algebra
I mentioned this earlier in the thread, but I'll put it out again. Dense BLAS are not very important for most machine learning workloads: at least for non-image workloads in industry (and for image processing you would probably want a deep learning/SGD solution with convolution kernels). E.g., it was only relevant for 1/7 of our recent benchmarks, which should be a reasonable sample. What really matters is sparse BLAS performance. BIDMat is still an order of magnitude faster there. Those kernels are only in BIDMat, since NVIDIA's sparse BLAS don't perform well on power-law data.

It's also the case that the overall performance of an algorithm is determined by the slowest kernel, not the fastest. If the goal is to get closer to BIDMach's performance on typical problems, you need to make sure that every kernel runs at comparable speed. So the real question is how much faster MLlib routines are on a complete problem with/without GPU acceleration. For BIDMach, it's close to a factor of 10. But that required running entirely on the GPU and making sure every kernel is close to its limit.

-John

If you think nvblas would be helpful, you should try it in some end-to-end benchmarks.

On 3/25/15, 6:23 PM, Evan R. Sparks wrote: Yeah, much more reasonable - nice to know that we can get full GPU performance from breeze/netlib-java - meaning there's no compelling performance reason to switch out our current linear algebra library (at least as far as this benchmark is concerned). Instead, it looks like a user guide for configuring Spark/MLlib to use the right BLAS library will get us most of the way there. Or, would it make sense to finally ship OpenBLAS compiled for some common platforms (64-bit Linux, Windows, Mac) directly with Spark - hopefully eliminating the jblas warnings once and for all for most users? (Licensing is BSD.) Or am I missing something?

On Wed, Mar 25, 2015 at 6:03 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: As everyone suggested, the results were too good to be true, so I double-checked them. It turns out that nvblas did not do the multiplication, due to the parameter NVBLAS_TILE_DIM from nvblas.conf, and returned a zero matrix. My previously posted results with nvblas reflect matrix copying only. The default NVBLAS_TILE_DIM==2048 is too big for my graphics card/matrix size. I hand-picked other values that worked. As a result, netlib+nvblas is on par with BIDMat-cuda. As promised, I am going to post a how-to for nvblas configuration. https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing

-----Original Message----- From: Ulanov, Alexander Sent: Wednesday, March 25, 2015 2:31 PM To: Sam Halliday Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks; jfcanny Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi again, I finally managed to use nvblas within Spark+netlib-java. It has exceptional performance for big matrices with Double, faster than BIDMat-cuda with Float. But for smaller matrices, if you have to copy them to/from the GPU, OpenBLAS or MKL might be a better choice. This correlates with the original nvblas presentation at GPU conf 2013 (slide 21): http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3108-New-Features-CUDA%206%20-GPU-Acceleration.pdf My results: https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing Just in case, these tests are not meant as a generalization of the performance of different libraries.
I just want to pick the library that does dense matrix multiplication best for my task.

P.S. My previous issue with nvblas was the following: it provides Fortran BLAS functions, while netlib-java uses C cblas functions. So one needs a cblas shared library to use nvblas through netlib-java. Fedora does not ship cblas (Debian and Ubuntu do), so I had to compile it. I could not use cblas from ATLAS or OpenBLAS because they link to their own implementations and not to the Fortran BLAS. Best regards, Alexander

-----Original Message----- From: Ulanov, Alexander Sent: Tuesday, March 24, 2015 6:57 PM To: Sam Halliday Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi, I am trying to use nvblas with netlib-java from Spark. The nvblas functions should replace the current BLAS function calls via LD_PRELOAD, as suggested in http://docs.nvidia.com/cuda/nvblas/#Usage, without any changes to netlib-java. It seems to work for a simple Java example, but I cannot make it work with Spark. I run the following: export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64
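For anyone trying to reproduce the setup discussed above, a rough sketch of the pieces involved, based on the NVIDIA nvblas documentation: an nvblas.conf pointing at a CPU BLAS fallback, plus the LD_PRELOAD launch quoted in the thread. The library paths and the tile size below are examples only; as noted above, the right NVBLAS_TILE_DIM depends on the GPU and the matrix sizes.

    # nvblas.conf (location can be set via the NVBLAS_CONFIG_FILE env var)
    NVBLAS_LOGFILE       nvblas.log
    # mandatory: the CPU BLAS library nvblas falls back to for routines it
    # does not intercept (path is an example)
    NVBLAS_CPU_BLAS_LIB  /usr/lib64/libopenblas.so
    NVBLAS_GPU_LIST      ALL
    # default is 2048, which was too large for the card/matrix sizes above
    NVBLAS_TILE_DIM      1024

Then launch Spark with nvblas preloaded, mirroring the commands in the thread:

    export NVBLAS_CONFIG_FILE=/path/to/nvblas.conf
    export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64
    env LD_PRELOAD=/usr/local/cuda-6.5/lib64/libnvblas.so ./spark-shell --driver-memory 4G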
Re: Storing large data for MLlib machine learning
On binary file formats - I looked at HDF5+Spark a couple of years ago and found it barely JVM-friendly and very Hadoop-unfriendly (e.g. the APIs needed filenames as input; you couldn't pass them anything like an InputStream). I don't know if it has gotten any better. Parquet plays much more nicely, and there are lots of Spark-related projects using it already. Keep in mind that it's column-oriented, which might impact performance - but basically you're going to want your features in a byte array, and deserialization should be pretty straightforward.

On Thu, Mar 26, 2015 at 2:26 PM, Stephen Boesch java...@gmail.com wrote: There are some convenience methods you might consider, including MLUtils.loadLibSVMFile and MLUtils.loadLabeledPoints.

2015-03-26 14:16 GMT-07:00 Ulanov, Alexander alexander.ula...@hp.com: Hi, could you suggest what would be a reasonable file format to store feature vector data for machine learning in Spark MLlib? Are there any best practices for Spark? My data is dense feature vectors with labels. Some of the requirements are that the format should be easily loaded/serialized, randomly accessible, and have a small footprint (binary). I am considering Parquet, HDF5, and protocol buffers (protobuf), but I have little to no experience with them, so any suggestions would be really appreciated. Best regards, Alexander
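To make the two routes above concrete, a rough PySpark sketch: loading via the MLUtils convenience methods and round-tripping the same labeled vectors through Parquet. The paths are placeholders, and the DataFrame writer/reader API assumes Spark 1.4 or later (earlier releases used saveAsParquetFile/parquetFile instead).

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.mllib.util import MLUtils

    sc = SparkContext(appName="feature-storage-sketch")
    sqlContext = SQLContext(sc)

    # Route 1: text-based LibSVM via the convenience loader -> RDD of LabeledPoint
    points = MLUtils.loadLibSVMFile(sc, "/path/to/features.libsvm")

    # Route 2: a binary, column-oriented format. Convert the labeled vectors to
    # a (label, features) DataFrame and store it as Parquet.
    df = sqlContext.createDataFrame(
        points.map(lambda lp: (lp.label, lp.features)), ["label", "features"])
    df.write.parquet("/path/to/features.parquet")

    # Reading back gives a DataFrame with the same schema.
    loaded = sqlContext.read.parquet("/path/to/features.parquet")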
Re: Using CUDA within Spark / boosting linear algebra
I'm not at all surprised ;-) I fully expect the GPU performance to get better automatically as the hardware improves. Netlib natives still need to be shipped separately. I'd also oppose any move to make OpenBLAS the default - it's not always better, and I think natives really need DevOps buy-in. It's not the right solution for everybody.

On 26 Mar 2015 01:23, Evan R. Sparks evan.spa...@gmail.com wrote: Yeah, much more reasonable - nice to know that we can get full GPU performance from breeze/netlib-java - meaning there's no compelling performance reason to switch out our current linear algebra library (at least as far as this benchmark is concerned). Instead, it looks like a user guide for configuring Spark/MLlib to use the right BLAS library will get us most of the way there. Or, would it make sense to finally ship OpenBLAS compiled for some common platforms (64-bit Linux, Windows, Mac) directly with Spark - hopefully eliminating the jblas warnings once and for all for most users? (Licensing is BSD.) Or am I missing something?

On Wed, Mar 25, 2015 at 6:03 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: As everyone suggested, the results were too good to be true, so I double-checked them. It turns out that nvblas did not do the multiplication, due to the parameter NVBLAS_TILE_DIM from nvblas.conf, and returned a zero matrix. My previously posted results with nvblas reflect matrix copying only. The default NVBLAS_TILE_DIM==2048 is too big for my graphics card/matrix size. I hand-picked other values that worked. As a result, netlib+nvblas is on par with BIDMat-cuda. As promised, I am going to post a how-to for nvblas configuration. https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing

-----Original Message----- From: Ulanov, Alexander Sent: Wednesday, March 25, 2015 2:31 PM To: Sam Halliday Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks; jfcanny Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi again, I finally managed to use nvblas within Spark+netlib-java. It has exceptional performance for big matrices with Double, faster than BIDMat-cuda with Float. But for smaller matrices, if you have to copy them to/from the GPU, OpenBLAS or MKL might be a better choice. This correlates with the original nvblas presentation at GPU conf 2013 (slide 21): http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3108-New-Features-CUDA%206%20-GPU-Acceleration.pdf My results: https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing Just in case, these tests are not meant as a generalization of the performance of different libraries. I just want to pick the library that does dense matrix multiplication best for my task.

P.S. My previous issue with nvblas was the following: it provides Fortran BLAS functions, while netlib-java uses C cblas functions. So one needs a cblas shared library to use nvblas through netlib-java. Fedora does not ship cblas (Debian and Ubuntu do), so I had to compile it. I could not use cblas from ATLAS or OpenBLAS because they link to their own implementations and not to the Fortran BLAS. Best regards, Alexander

-----Original Message----- From: Ulanov, Alexander Sent: Tuesday, March 24, 2015 6:57 PM To: Sam Halliday Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi, I am trying to use nvblas with netlib-java from Spark.
The nvblas functions should replace the current BLAS function calls via LD_PRELOAD, as suggested in http://docs.nvidia.com/cuda/nvblas/#Usage, without any changes to netlib-java. It seems to work for a simple Java example, but I cannot make it work with Spark. I run the following:

export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64
env LD_PRELOAD=/usr/local/cuda-6.5/lib64/libnvblas.so ./spark-shell --driver-memory 4G

In nvidia-smi I observe that Java is set up to use the GPU:

GPU  PID   Type  Process name                      GPU Memory Usage
0    8873  C     bash                              39MiB
0    8910  C     /usr/lib/jvm/java-1.7.0/bin/java  39MiB

In the Spark shell I do a matrix multiplication and see the following:

15/03/25 06:48:01 INFO JniLoader: successfully loaded /tmp/jniloader8192964377009965483netlib-native_system-linux-x86_64.so

So I am sure that netlib-native is loaded and cblas is supposedly used. However, the matrix multiplication executes on the CPU, since I see 16% CPU used and 0% GPU used. I also checked different matrix sizes, from 100x100 to 12000x12000. Could you
Re: Using CUDA within Spark / boosting linear algebra
Btw, OpenBLAS requires GPL runtime binaries, which are typically considered system libraries (and these fall under something similar to the Java classpath exception rule)... so it's basically impossible to distribute OpenBLAS the way you're suggesting, sorry. Indeed, there is work ongoing in Spark right now to clear up something of this nature.

On a more technical level, I'd recommend watching my talk at ScalaX, which explains in detail why high performance only comes from machine-optimised binaries, which requires DevOps buy-in (and I'd recommend using MKL anyway on the CPU, not OpenBLAS). On an even deeper level, using natives has consequences for JIT and GC which aren't suitable for everybody, and we'd really like people to go into that with their eyes wide open.

On 26 Mar 2015 07:43, Sam Halliday sam.halli...@gmail.com wrote: I'm not at all surprised ;-) I fully expect the GPU performance to get better automatically as the hardware improves. Netlib natives still need to be shipped separately. I'd also oppose any move to make OpenBLAS the default - it's not always better, and I think natives really need DevOps buy-in. It's not the right solution for everybody.

On 26 Mar 2015 01:23, Evan R. Sparks evan.spa...@gmail.com wrote: Yeah, much more reasonable - nice to know that we can get full GPU performance from breeze/netlib-java - meaning there's no compelling performance reason to switch out our current linear algebra library (at least as far as this benchmark is concerned). Instead, it looks like a user guide for configuring Spark/MLlib to use the right BLAS library will get us most of the way there. Or, would it make sense to finally ship OpenBLAS compiled for some common platforms (64-bit Linux, Windows, Mac) directly with Spark - hopefully eliminating the jblas warnings once and for all for most users? (Licensing is BSD.) Or am I missing something?

On Wed, Mar 25, 2015 at 6:03 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: As everyone suggested, the results were too good to be true, so I double-checked them. It turns out that nvblas did not do the multiplication, due to the parameter NVBLAS_TILE_DIM from nvblas.conf, and returned a zero matrix. My previously posted results with nvblas reflect matrix copying only. The default NVBLAS_TILE_DIM==2048 is too big for my graphics card/matrix size. I hand-picked other values that worked. As a result, netlib+nvblas is on par with BIDMat-cuda. As promised, I am going to post a how-to for nvblas configuration. https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing

-----Original Message----- From: Ulanov, Alexander Sent: Wednesday, March 25, 2015 2:31 PM To: Sam Halliday Cc: dev@spark.apache.org; Xiangrui Meng; Joseph Bradley; Evan R. Sparks; jfcanny Subject: RE: Using CUDA within Spark / boosting linear algebra

Hi again, I finally managed to use nvblas within Spark+netlib-java. It has exceptional performance for big matrices with Double, faster than BIDMat-cuda with Float. But for smaller matrices, if you have to copy them to/from the GPU, OpenBLAS or MKL might be a better choice. This correlates with the original nvblas presentation at GPU conf 2013 (slide 21): http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3108-New-Features-CUDA%206%20-GPU-Acceleration.pdf My results: https://docs.google.com/spreadsheets/d/1lWdVSuSragOobb0A_oeouQgHUMx378T9J5r7kwKSPkY/edit?usp=sharing Just in case, these tests are not meant as a generalization of the performance of different libraries.