Re: functools.partial as UserDefinedFunction

2015-03-26 Thread Karlson
Hi, I've filed a JIRA (https://issues.apache.org/jira/browse/SPARK-6553) and suggested a fix (https://github.com/apache/spark/pull/5206). On 2015-03-25 19:49, Davies Liu wrote: It would be good to support functools.partial; could you file a JIRA for it? On Wednesday, March 25, 2015 at 5:42 AM,
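
For context, a minimal sketch of the usage this fix enables (illustrative only; it assumes the 1.3-era PySpark shell, where a SQLContext is available as sqlContext):

    from functools import partial
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    def add(x, n):
        return x + n

    # Per the JIRA, wrapping a functools.partial previously failed because
    # partial objects lack a __name__ attribute, which UDF registration read.
    add_one = udf(partial(add, n=1), IntegerType())

    df = sqlContext.createDataFrame([(1,), (2,)], ["x"])
    df.select(add_one(df.x)).show()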

Re: Using CUDA within Spark / boosting linear algebra

2015-03-26 Thread John Canny
I mentioned this earlier in the thread, but I'll say it again. Dense BLAS is not very important for most machine learning workloads, at least for non-image workloads in industry (and for image processing you would probably want a deep learning/SGD solution with convolution kernels anyway). e.g.
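
To make the dense-versus-sparse point concrete, a small illustration (not from the thread; NumPy/SciPy are used purely for exposition): a bag-of-words design matrix with a million columns but only ~100 nonzeros per row makes the hot kernel a sparse matrix-vector product, and a dense GEMM never enters the picture.

    import numpy as np
    import scipy.sparse as sp

    rng = np.random.default_rng(0)
    n_rows, n_cols, nnz_per_row = 100_000, 1_000_000, 100

    # Bag-of-words-style features: ~100 nonzeros out of 1M columns per row.
    rows = np.repeat(np.arange(n_rows), nnz_per_row)
    cols = rng.integers(0, n_cols, size=n_rows * nnz_per_row)
    vals = np.ones(n_rows * nnz_per_row)
    X = sp.csr_matrix((vals, (rows, cols)), shape=(n_rows, n_cols))

    w = rng.standard_normal(n_cols)
    # Touches ~1e7 stored entries; storing X densely would take ~800 GB,
    # so a tuned dense BLAS would never be invoked on data like this.
    scores = X @ w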

Re: Storing large data for MLlib machine learning

2015-03-26 Thread Evan R. Sparks
On binary file formats - I looked at HDF5+Spark a couple of years ago and found it barely JVM-friendly and very Hadoop-unfriendly (e.g., the APIs required filenames as input; you couldn't pass them anything like an InputStream). I don't know if it has gotten any better. Parquet plays much more nicely
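
A minimal sketch of the Parquet route (illustrative; this uses the SparkSession API, which postdates this thread, and a hypothetical /tmp/features.parquet path):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-features").getOrCreate()

    # Write a tiny labeled dataset as Parquet, then read it back.
    df = spark.createDataFrame(
        [(0.0, 1.0, 2.0), (1.0, 3.0, 4.0)], ["label", "f1", "f2"])
    df.write.mode("overwrite").parquet("/tmp/features.parquet")

    loaded = spark.read.parquet("/tmp/features.parquet")
    loaded.printSchema()  # the schema travels with the file, unlike raw binary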

Re: Using CUDA within Spark / boosting linear algebra

2015-03-26 Thread Sam Halliday
I'm not at all surprised ;-) I fully expect GPU performance to get better automatically as the hardware improves. Netlib natives still need to be shipped separately. I'd also oppose any move to make OpenBLAS the default: it is not always better, and I think natives really need DevOps buy-in.

Re: Using CUDA within Spark / boosting linear algebra

2015-03-26 Thread Sam Halliday
Btw, OpenBLAS requires GPL runtime binaries that are typically considered system libraries (these fall under something similar to the Java classpath exception rule)... so it's basically impossible to distribute OpenBLAS the way you're suggesting, sorry. Indeed, there is work ongoing in Spark