If you do want to stick with SVMs:
This is a question I keep coming back to myself, and unfortunately I've
forgotten more (and lost more literature) than I've retained.
I believe the most easily parallelizable sections of libSVM for small
datasets are (for C-SVC/C-SVR with RBF and polynomial kernels):
1. The Kernel projections
2. The hyper-parameter grid search over C and \gamma (I believe this is now
included in libSVM; I haven't looked at it in a while)
3. For multi-class SVC: the concurrent training of the per-pair SVMs used in
the one-against-one voting.
I'm unfamiliar with any easily parallelizable method for the QP itself.
Unfortunately, (2) and (3) involve broadcasting the entire dataset out to
each node of a cluster (or working in a shared-memory environment), so they
may not be practical depending on the size of your dataset. I've only ever
implemented (2), for relatively small datasets, using MPI and with a pure
Java socket implementation.
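For (2), the grid search is embarrassingly parallel, since each (C, \gamma) cell trains an independent SVM on the same data. As one sketch of this, scikit-learn (whose SVC wraps libSVM) exposes exactly this pattern via n_jobs; the dataset and grid values below are just illustrative, not a recommendation.

```python
# Sketch of point (2): each (C, gamma) cell is an independent fit over
# the same data, so the grid can be searched in parallel (n_jobs=-1).
# scikit-learn's SVC wraps libSVM; grid values here are illustrative.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=3, n_jobs=-1)
search.fit(X, y)
best = search.best_params_
```

Note that this is the shared-memory version; the MPI version mentioned above is the same idea, with the dataset broadcast once and the (C, \gamma) cells scattered across ranks.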
Other approaches (further from plain libSVM) that are more applicable to
large datasets (I'm less familiar with these):
4. Divide and conquer the QP/SMO problem itself (as I've said, I'm
unfamiliar with this and don't know of any standard approach).
5. Break the training set into subsets and solve.
For (5) there are several approaches; two that I know of are ensemble
approaches, and those that accumulate support vectors from each partition and
heuristically keep/reject them until the model converges. I've also read
some research on implementing this in a MapReduce style [2].
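The "accumulate support vectors" flavor of (5) can be sketched in its simplest one-pass form: train on each partition independently, pool the resulting support vectors, and retrain on the pooled set. Real schemes iterate until the SV set stabilizes; scikit-learn's SVC (a libSVM wrapper) is used here as a stand-in, and the dataset and parameters are illustrative.

```python
# Sketch of point (5), one-pass form: fit each partition independently
# (these fits are what you would parallelize), pool the support vectors,
# then retrain a final model on the pooled set.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
parts = np.array_split(np.arange(len(X)), 4)   # 4 disjoint partitions

sv_idx = []
for rows in parts:                             # each fit is independent
    clf = SVC(kernel="rbf", C=1.0).fit(X[rows], y[rows])
    sv_idx.extend(rows[clf.support_])          # local SV indices -> global

sv_idx = np.unique(sv_idx)
final = SVC(kernel="rbf", C=1.0).fit(X[sv_idx], y[sv_idx])
```

The iterative versions in the literature feed the pooled SVs back in with fresh partitions and repeat until the kept set stops changing.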
I came across this paper [1] last night, which you may find interesting as
well; it's an interesting comparison of some SVM parallelization strategies.
In particular, it discusses (1) for a shared-memory environment and for
offloading work to GPUs (using OpenMP and CUDA). It also cites several other
nice papers discussing SVM parallelization strategies, especially for (5),
and goes on to discuss a more purely linear-algebra approach to optimizing
SVMs (sec. 5).
Also regarding (5), you may be interested in [2] (something I've only looked
over briefly).
[1] http://arxiv.org/pdf/1404.1066v1.pdf
[2] http://arxiv.org/pdf/1301.0082.pdf
> From: [email protected]
> Date: Tue, 21 Oct 2014 17:32:22 -0700
> Subject: Re: Any idea which approaches to non-linear svm are easily
> parallelizable?
> To: [email protected]
>
> Last I heard, the best methods pre-project and do linear SVM.
>
> Beyond that, I would guess that deep learning techniques would subsume
> non-linear SVM pretty easily. The best parallel implementation I know for
> that is in H2O.
>
>
>
> On Tue, Oct 21, 2014 at 4:12 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> > in particular, from libSVM --
> > http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf ?
> >
> > thanks.
> > -d
> >