[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810000#comment-15810000 ]
Seth Hendrickson commented on SPARK-10078:
------------------------------------------
As part of [SPARK-17136|https://issues.apache.org/jira/browse/SPARK-17136], I
am working on a generic optimization interface for Spark that would let users
easily plug in their own optimizers in place of the built-in ones. Because of
this, I have also been looking into how to design that interface so that it
supports optimization over both local and distributed vector types. I have been
prototyping on a branch [here|https://github.com/sethah/spark/tree/spark-vlbfgs].
I was actually able to get Yanbo's VLogisticRegression class working (on a very
small dataset) with the VLBFGS optimizer in my branch, which also works with
local vector types. Could you let me know whether this lines up at all with
what you had in mind?

If we consider this interface without VL-BFGS, we can avoid any code
duplication with Breeze at first, because we can simply plug the Breeze
implementations into our abstraction (in my branch, that is what is done for
LBFGS and OWLQN). Adding VL-BFGS is a bit trickier.
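
As a rough illustration of plugging existing optimizers into an abstraction
that is generic in the vector type, here is a minimal Python sketch. It is not
the code in the linked branch or Breeze's API: `VectorSpace`, `Minimizer`,
`LocalSpace`, and `GradientDescent` are hypothetical names, and fixed-step
gradient descent stands in for LBFGS/OWLQN. Because the optimizer only calls
`dot` and `axpy`, a distributed vector type could be supported by implementing
those two methods.

```python
from abc import ABC, abstractmethod
from typing import Callable, Generic, List, Tuple, TypeVar

V = TypeVar("V")  # parameter vector type: could be local or distributed


class VectorSpace(ABC, Generic[V]):
    """Hypothetical: the only vector operations the optimizer may use."""

    @abstractmethod
    def dot(self, a: V, b: V) -> float: ...

    @abstractmethod
    def axpy(self, alpha: float, x: V, y: V) -> V:
        """Return alpha * x + y as a new vector."""


class Minimizer(ABC, Generic[V]):
    """Hypothetical pluggable optimizer: f(x) returns (value, gradient)."""

    @abstractmethod
    def minimize(self, f: Callable[[V], Tuple[float, V]], x0: V) -> V: ...


class LocalSpace(VectorSpace[List[float]]):
    """Local dense vectors as plain Python lists."""

    def dot(self, a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    def axpy(self, alpha, x, y):
        return [alpha * xi + yi for xi, yi in zip(x, y)]


class GradientDescent(Minimizer[V]):
    """Fixed-step gradient descent, standing in for LBFGS/OWLQN."""

    def __init__(self, space: VectorSpace[V], step: float = 0.1,
                 max_iter: int = 500, tol: float = 1e-10):
        self.space, self.step, self.max_iter, self.tol = space, step, max_iter, tol

    def minimize(self, f, x0):
        x = x0
        for _ in range(self.max_iter):
            _, g = f(x)
            if self.space.dot(g, g) < self.tol:  # squared gradient norm
                break
            x = self.space.axpy(-self.step, g, x)  # x <- x - step * g
        return x


def quadratic(x):
    """f(x) = ||x - c||^2 with c = (1, -2); gradient is 2 (x - c)."""
    r = [x[0] - 1.0, x[1] + 2.0]
    return r[0] ** 2 + r[1] ** 2, [2.0 * r[0], 2.0 * r[1]]


x_min = GradientDescent(LocalSpace()).minimize(quadratic, [0.0, 0.0])
print(x_min)  # approximately [1.0, -2.0]
```

Swapping in another optimizer means writing another `Minimizer` subclass; the
caller and the vector type stay unchanged.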

The main problem I see is that we need an abstraction that lets us persist and
unpersist the parameter vectors as needed during optimization. Adding "persist"
and "unpersist" methods to a vector space, for example, seems like a leaky
abstraction. It might make sense to add this to Breeze itself if we can avoid
leaking RDD details into the interface. However, one benefit of SPARK-17136 is
that we could potentially eliminate our dependence on Breeze in the future, so
I think it might make sense to implement our own VL-BFGS interface, even if
that means some duplication. This is really part of an important discussion
that will happen during the optimization interface design. I hope to post a
detailed design document for that JIRA in the next few days.
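
To make the persist/unpersist problem concrete: L-BFGS keeps only the m most
recent correction pairs, so each iteration adds one (s, y) pair and, once the
window is full, evicts the oldest. In a distributed implementation those are
the points where new vectors would be persisted and old ones unpersisted. The
Python sketch below is hypothetical (`History`, `on_add`, and `on_evict` are not
Spark or Breeze APIs) and shows just one option: the history buffer reports
lifecycle events through optional callbacks, so the vector-space operations
themselves carry no caching methods. Strings stand in for distributed vectors.

```python
from collections import deque
from typing import Callable, Deque, Generic, Optional, Tuple, TypeVar

V = TypeVar("V")


class History(Generic[V]):
    """Hypothetical bounded L-BFGS history of (s, y) correction pairs."""

    def __init__(self, m: int,
                 on_add: Optional[Callable[[V], None]] = None,
                 on_evict: Optional[Callable[[V], None]] = None):
        self.m = m
        self.pairs: Deque[Tuple[V, V]] = deque()
        # No-op defaults: a purely local backend needs no caching at all.
        self.on_add = on_add or (lambda v: None)
        self.on_evict = on_evict or (lambda v: None)

    def push(self, s: V, y: V) -> None:
        # Newest pair enters the window (a Spark backend might persist here).
        self.on_add(s)
        self.on_add(y)
        self.pairs.append((s, y))
        # Oldest pair leaves once the window exceeds m (a Spark backend
        # might unpersist here).
        if len(self.pairs) > self.m:
            old_s, old_y = self.pairs.popleft()
            self.on_evict(old_s)
            self.on_evict(old_y)


cached = set()
h = History(m=2, on_add=cached.add, on_evict=cached.discard)
for k in range(4):
    h.push(f"s{k}", f"y{k}")
print(sorted(cached))  # ['s2', 's3', 'y2', 'y3']
```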

Finally, could you provide more detail on your proposed changes to
DiffFunction? DiffFunction in Breeze is already generic in its parameter type
(DiffFunction[T]).
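
For reference, Breeze's `DiffFunction[T]` is built around a single
`calculate(x: T): (Double, T)` method that returns the value and the gradient,
with the gradient in the same type as the parameters. The Python sketch below
only mirrors that shape; the class names and the block-partitioned list that
stands in for a distributed vector are hypothetical, not Breeze or Spark types.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Tuple, TypeVar

T = TypeVar("T")


class DiffFunction(ABC, Generic[T]):
    """Mirrors the shape of Breeze's DiffFunction[T]: (value, gradient in T)."""

    @abstractmethod
    def calculate(self, x: T) -> Tuple[float, T]: ...


class LocalHalfSquaredNorm(DiffFunction[List[float]]):
    """f(x) = 0.5 * ||x||^2 on a local dense vector; gradient is x."""

    def calculate(self, x):
        return 0.5 * sum(v * v for v in x), list(x)


# Hypothetical stand-in for a distributed vector split into blocks.
Blocks = List[List[float]]


class BlockHalfSquaredNorm(DiffFunction[Blocks]):
    """Same function on a block-partitioned vector; gradient keeps the blocks."""

    def calculate(self, x):
        # Each block's contribution could be computed where the block lives.
        value = sum(0.5 * sum(v * v for v in block) for block in x)
        return value, [list(block) for block in x]


print(LocalHalfSquaredNorm().calculate([3.0, 4.0]))      # (12.5, [3.0, 4.0])
print(BlockHalfSquaredNorm().calculate([[3.0], [4.0]]))  # (12.5, [[3.0], [4.0]])
```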
> Vector-free L-BFGS
> ------------------
>
> Key: SPARK-10078
> URL: https://issues.apache.org/jira/browse/SPARK-10078
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: Xiangrui Meng
> Assignee: Yanbo Liang
>
> This is to implement a scalable version of vector-free L-BFGS
> (http://papers.nips.cc/paper/5333-large-scale-l-bfgs-using-mapreduce.pdf).
> Design document:
> https://docs.google.com/document/d/1VGKxhg-D-6-vZGUAZ93l3ze2f3LBvTjfHRFVpX68kaw/edit?usp=sharing
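
A minimal sketch of the core idea behind vector-free L-BFGS, as described in
the linked paper: the search direction is a linear combination of the 2m+1
basis vectors (the m most recent `s` and `y` corrections plus the gradient),
so the two-loop recursion can run entirely on the (2m+1)x(2m+1) matrix of
their pairwise dot products. Only those dot products and the final linear
combination need to touch the (possibly distributed) vectors. Function names
and the toy data are illustrative, not Spark's; incremental reuse of the
dot-product matrix between iterations is omitted, and the check compares the
result against the standard two-loop recursion.

```python
from typing import List, Sequence

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_loop(s: List[Vec], y: List[Vec], g: Vec) -> Vec:
    """Standard L-BFGS two-loop recursion returning p = -H g (oldest pair first)."""
    m = len(s)
    q = [-v for v in g]
    alpha = [0.0] * m
    for i in reversed(range(m)):
        alpha[i] = dot(s[i], q) / dot(s[i], y[i])
        q = [qj - alpha[i] * yj for qj, yj in zip(q, y[i])]
    gamma = dot(s[-1], y[-1]) / dot(y[-1], y[-1])  # initial Hessian scaling
    r = [gamma * v for v in q]
    for i in range(m):
        beta = dot(y[i], r) / dot(s[i], y[i])
        r = [rj + (alpha[i] - beta) * sj for rj, sj in zip(r, s[i])]
    return r


def vector_free_coeffs(B: List[List[float]], m: int) -> List[float]:
    """Same recursion using only B[i][j] = b_i . b_j for the basis
    b = [s_0..s_{m-1}, y_0..y_{m-1}, g]; returns delta with p = sum delta[l] b[l]."""
    n = 2 * m + 1
    delta = [0.0] * n
    delta[2 * m] = -1.0  # start from p = -g
    alpha = [0.0] * m
    for i in reversed(range(m)):
        # s_i . q  ==  sum_l delta[l] * (b_l . s_i)
        alpha[i] = sum(delta[l] * B[l][i] for l in range(n)) / B[i][m + i]
        delta[m + i] -= alpha[i]  # q <- q - alpha_i * y_i
    gamma = B[m - 1][2 * m - 1] / B[2 * m - 1][2 * m - 1]
    delta = [gamma * d for d in delta]
    for i in range(m):
        # y_i . r  ==  sum_l delta[l] * (b_{m+i} . b_l)
        beta = sum(delta[l] * B[m + i][l] for l in range(n)) / B[i][m + i]
        delta[i] += alpha[i] - beta  # r <- r + (alpha_i - beta) * s_i
    return delta


# Toy data (not from the paper): m = 2 correction pairs in R^3, s_i . y_i > 0.
s = [[1.0, 0.0, 0.5], [0.2, 1.0, 0.0]]
y = [[2.0, 0.1, 0.4], [0.3, 1.5, 0.2]]
g = [0.5, -1.0, 2.0]

basis = s + y + [g]
# In a distributed setting each entry of B would be one distributed dot product.
B = [[dot(bi, bj) for bj in basis] for bi in basis]
delta = vector_free_coeffs(B, m=len(s))
# Only this final linear combination touches the full vectors again.
p = [sum(d * b[j] for d, b in zip(delta, basis)) for j in range(len(g))]
assert all(abs(a - b) < 1e-9 for a, b in zip(p, two_loop(s, y, g)))
```

The recursion itself now costs O(m^2) scalar work on the driver, independent of
the vector dimension.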
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)