Hi Thunder,

Please understand that both MLlib and breeze are in active
development. Before v1.0, we used jblas but in the public APIs we only
exposed Array[Double]. In v1.0, we introduced Vector that supports
both dense and sparse data and switched the backend to
breeze/netlib-java (except ALS). We only used a few breeze methods in
our implementation and we benchmarked them one by one. It was hard to
foresee problems caused by including breeze at that time, for example,
https://issues.apache.org/jira/browse/SPARK-1520. Being conservative
in v1.0 was not a bad choice. We should benchmark breeze v0.8.1 for
v1.1 and consider making toBreeze a developer API. For now, if you are
migrating code from v0.9, you can use `Vector.toArray` to get the
value array. Sorry for the inconvenience!
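
As a rough sketch of what that could look like, here is some vector
math written on top of `Vector.toArray` using only the public API.
The `VectorMath` object and its `dot`/`axpy` helpers are hypothetical
names made up for this example; they are not part of MLlib. Note that
`toArray` on a sparse vector materializes a dense copy, so this is
only reasonable for moderately sized vectors.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical helper object, not part of MLlib.
object VectorMath {

  // Dot product computed over the dense arrays returned by toArray.
  def dot(a: Vector, b: Vector): Double = {
    require(a.size == b.size, "vectors must have the same size")
    val x = a.toArray
    val y = b.toArray
    var sum = 0.0
    var i = 0
    while (i < x.length) {
      sum += x(i) * y(i)
      i += 1
    }
    sum
  }

  // Returns alpha * a + b as a new dense MLlib vector.
  def axpy(alpha: Double, a: Vector, b: Vector): Vector = {
    require(a.size == b.size, "vectors must have the same size")
    val x = a.toArray
    val y = b.toArray
    Vectors.dense(Array.tabulate(x.length)(i => alpha * x(i) + y(i)))
  }
}

object VectorMathExample {
  def main(args: Array[String]): Unit = {
    val dense = Vectors.dense(1.0, 2.0, 3.0)
    val sparse = Vectors.sparse(3, Array(0, 2), Array(4.0, 5.0))
    println(VectorMath.dot(dense, sparse))       // 1*4 + 2*0 + 3*5
    println(VectorMath.axpy(2.0, dense, sparse)) // 2 * dense + sparse
  }
}
```

If you would rather keep using breeze directly, the same array can be
wrapped on your side, e.g. `new breeze.linalg.DenseVector(v.toArray)`,
assuming breeze is on your classpath.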

Best,
Xiangrui

On Wed, Jul 2, 2014 at 2:42 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> in my humble opinion Spark should've supported linalg a-la [1] before it
> even started dumping methodologies into mllib.
>
> [1] http://mahout.apache.org/users/sparkbindings/home.html
>
>
> On Wed, Jul 2, 2014 at 2:16 PM, Thunder Stumpges
> <thunder.stump...@gmail.com> wrote:
>>
>> Thanks. I always hate having to do stuff like this. It seems like they
>> went a bit overboard with all the "private[mllib]" declarations... possibly
>> all in the name of "thou shalt not change your public API". If you don't
>> make your public API usable, we end up having to work around it anyway...
>>
>> Oh well.
>>
>> Thunder
>>
>>
>>
>> On Wed, Jul 2, 2014 at 2:05 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>> i did the second option: re-implemented .toBreeze as .breeze using pimp
>>> classes
>>>
>>>
>>> On Wed, Jul 2, 2014 at 5:00 PM, Thunder Stumpges
>>> <thunder.stump...@gmail.com> wrote:
>>>>
>>>> I am upgrading from Spark 0.9.0 to 1.0 and I had a pretty good amount of
>>>> code working with internals of MLlib. One of the big changes was the move
>>>> from the old jblas.Matrix to the Vector/Matrix classes included in MLlib.
>>>>
>>>> However I don't see how we're supposed to use them for ANYTHING other
>>>> than a container for passing data to the included APIs... how do we do any
>>>> math on them? Looking at the internal code, there are quite a number of
>>>> private[mllib] declarations including access to the Breeze representations
>>>> of the classes.
>>>>
>>>> Was there a good reason this was not exposed? I could see maybe not
>>>> wanting to expose the 'toBreeze' function which would tie it to the breeze
>>>> implementation, however it would be nice to have the various mathematics
>>>> wrapped at least.
>>>>
>>>> Right now I see no way to code any vector/matrix math without moving my
>>>> code namespaces into org.apache.spark.mllib or duplicating the code in
>>>> 'toBreeze' in my own util functions. Not very appealing.
>>>>
>>>> What are others doing?
>>>> thanks,
>>>> Thunder
>>>>
>>>
>>
>
