[ 
https://issues.apache.org/jira/browse/SPARK-15575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450836#comment-15450836
 ] 

Vladimir Feinberg commented on SPARK-15575:
-------------------------------------------

One of the biggest issues I've experienced with Breeze performance is that many 
operations you'd expect to be fast are not, and its pretty syntax and heavy use 
of implicits make it easy to hit the slow paths by accident.

For instance:
1. Mixed dense/sparse operations frequently fall back to a generic 
implementation in Breeze that uses its Scala iterators.
2. Creating vectors, under certain operations, results in unnecessary 
boxing of doubles (and of integers, for sparse vectors).
3. Slice vectors have no support for efficient operations. They are implemented 
in Breeze in a way that makes them no better than Array[Double], which again 
forces us through Scala iterators whenever we want a fast, vectorized operation 
such as a dot product.
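
To make the iterator cost concrete, here is a minimal sketch in plain Scala. It 
does not use Breeze, and `DotSketch`, `iteratorDot`, and `arrayDot` are names 
made up for this example. The iterator version only mimics the kind of generic 
fallback described above; it is not Breeze's actual code.

```scala
object DotSketch {
  // Generic path: every element goes through Iterator[Double].next(), which
  // is not specialized for primitives and may box each value.
  def iteratorDot(x: Iterator[Double], y: Iterator[Double]): Double = {
    var sum = 0.0
    while (x.hasNext && y.hasNext) sum += x.next() * y.next()
    sum
  }

  // Specialized path: a plain while loop over primitive arrays.
  def arrayDot(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < x.length) {
      sum += x(i) * y(i)
      i += 1
    }
    sum
  }
}
```

Both functions return the same value for the same inputs. Only the second 
avoids the per-element iterator calls.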

Usability is tough sometimes. Even though a Vector[Double] interface seems 
flexible, many implementations require explicit knowledge of the vector 
type (sparse/dense); otherwise Breeze silently uses the slow Scala interface. 
Heavy use of implicits is also a problem here, since they're not implemented 
for all permutations of vector types.
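
One way to avoid a silent fallback is to dispatch on the concrete vector type 
explicitly. The following is a rough sketch under stated assumptions: `Vec`, 
`Dense`, `Sparse`, and `dot` are hypothetical types and names for this example, 
not Breeze or Spark classes. Because the hierarchy is sealed, the compiler can 
warn when a dense/sparse pairing is missing from the match.

```scala
object DispatchSketch {
  sealed trait Vec { def size: Int }
  final case class Dense(values: Array[Double]) extends Vec {
    def size: Int = values.length
  }
  // Assumption: indices are sorted, unique, and lie in [0, size).
  final case class Sparse(size: Int, indices: Array[Int], values: Array[Double]) extends Vec

  private def denseDense(a: Dense, b: Dense): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.values.length) { sum += a.values(i) * b.values(i); i += 1 }
    sum
  }

  // Touches only the stored entries of the sparse operand.
  private def sparseDense(a: Sparse, b: Dense): Double = {
    var sum = 0.0
    var k = 0
    while (k < a.indices.length) { sum += a.values(k) * b.values(a.indices(k)); k += 1 }
    sum
  }

  // Merge-style walk over two sorted index arrays.
  private def sparseSparse(a: Sparse, b: Sparse): Double = {
    var sum = 0.0
    var i = 0
    var j = 0
    while (i < a.indices.length && j < b.indices.length) {
      val ia = a.indices(i)
      val ib = b.indices(j)
      if (ia == ib) { sum += a.values(i) * b.values(j); i += 1; j += 1 }
      else if (ia < ib) i += 1
      else j += 1
    }
    sum
  }

  def dot(a: Vec, b: Vec): Double = {
    require(a.size == b.size, "vectors must have the same size")
    (a, b) match {
      case (x: Dense, y: Dense)   => denseDense(x, y)
      case (x: Sparse, y: Dense)  => sparseDense(x, y)
      case (x: Dense, y: Sparse)  => sparseDense(y, x)
      case (x: Sparse, y: Sparse) => sparseSparse(x, y)
    }
  }
}
```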

It's also easy to write, e.g., `vec1 += vec2 * a * b`. This creates two 
intermediate vectors.
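
The same update can be done as a single fused pass with no temporaries, in the 
style of an axpy update (the BLAS name for y += alpha * x). This is only a 
sketch on raw arrays; `AxpySketch` and its methods are names invented for the 
example.

```scala
object AxpySketch {
  // Allocating version: each scale() call builds a new array, mirroring the
  // two temporaries created by `vec2 * a * b`.
  def scale(x: Array[Double], s: Double): Array[Double] = x.map(_ * s)

  def addInPlace(y: Array[Double], x: Array[Double]): Unit = {
    require(x.length == y.length, "vectors must have the same length")
    var i = 0
    while (i < y.length) { y(i) += x(i); i += 1 }
  }

  // Fused version: y += alpha * x in one pass, with no allocation.
  def axpy(alpha: Double, x: Array[Double], y: Array[Double]): Unit = {
    require(x.length == y.length, "vectors must have the same length")
    var i = 0
    while (i < x.length) { y(i) += alpha * x(i); i += 1 }
  }
}
```

With this, `vec1 += vec2 * a * b` becomes `AxpySketch.axpy(a * b, vec2, vec1)`. 
Multiplying the scalars first can change floating-point rounding slightly 
compared with scaling the vector twice.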

I think the biggest issue is that `ml.linalg.Vector` is Breeze-backed. We 
should use our own linear algebra (we do have `BLAS`, though to support slicing 
this interface would have to be expanded) and move around `ArrayView[Double]` 
inside the vector instead.
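
As a rough idea of what such a view could look like, here is a sketch. 
`DoubleArrayView` is a hypothetical, non-generic stand-in for the 
`ArrayView[Double]` mentioned above (kept monomorphic so that nothing is 
boxed); its fields and methods are assumptions for this example, not an 
existing Spark or Breeze API.

```scala
object ViewSketch {
  // A window [offset, offset + length) onto a shared backing array.
  final class DoubleArrayView(val data: Array[Double], val offset: Int, val length: Int) {
    require(offset >= 0 && length >= 0 && offset + length <= data.length,
      "view out of bounds")

    def apply(i: Int): Double = {
      require(i >= 0 && i < length, "index out of bounds")
      data(offset + i)
    }

    // O(1) slice over [from, until): shares the backing array, copies nothing.
    def slice(from: Int, until: Int): DoubleArrayView = {
      require(0 <= from && from <= until && until <= length, "bad slice range")
      new DoubleArrayView(data, offset + from, until - from)
    }

    // Dot product computed directly on the backing arrays, without iterators.
    def dot(that: DoubleArrayView): Double = {
      require(length == that.length, "views must have the same length")
      var sum = 0.0
      var i = 0
      while (i < length) {
        sum += data(offset + i) * that.data(that.offset + i)
        i += 1
      }
      sum
    }
  }
}
```

Slicing a view yields another view, so slices stay on the same fast path as 
whole vectors.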

Breeze as a dependency, as mentioned below, is still very useful for 
optimization. I think we can keep it around for that, as long as it's only for 
that.

> Remove breeze from dependencies?
> --------------------------------
>
>                 Key: SPARK-15575
>                 URL: https://issues.apache.org/jira/browse/SPARK-15575
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> This JIRA is for discussing whether we should remove Breeze from the 
> dependencies of MLlib.  The main issues with Breeze are Scala 2.12 support 
> and performance issues.
> There are a few paths:
> # Keep dependency.  This could be OK, especially if the Scala version issues 
> are fixed within Breeze.
> # Remove dependency
> ## Implement our own linear algebra operators as needed
> ## Design a way to build Spark using custom linalg libraries of the user's 
> choice.  E.g., you could build MLlib using Breeze, or any other library 
> supporting the required operations.  This might require significant work.  
> See [SPARK-6442] for related discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)