If you do want to stick with SVM:

This is a question that I keep coming back to myself, and unfortunately I have 
forgotten more (and lost more literature) than I’ve retained.
I believe that the most easily parallelizable sections of libSVM for small 
datasets are (for C-SVC(R), with RBF and polynomial kernels):

    1.    The Kernel projections 
    2.    The hyper-parameter grid search for C, \gamma (I believe this is now 
included in libSVM; I haven’t looked at it in a while). A rough sketch of this 
follows the list.
    3.    For multi-class SVC:  the concurrent computation of each SVM for each 
one-against-one class vote.
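
To make (2) concrete, here is a minimal sketch of the embarrassingly parallel 
grid search on a single shared-memory machine. It assumes Python with 
scikit-learn (whose SVC is backed by libSVM) and the standard multiprocessing 
module; the toy dataset, grid ranges, and the evaluate_point helper are all 
illustrative choices of mine, not anything prescribed by libSVM.

    # Each (C, gamma) grid cell is an independent train/validate job, so a
    # plain process pool covers it. Helper and parameter names are my own.
    from itertools import product
    from multiprocessing import Pool

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    def evaluate_point(params):
        """Cross-validate one (C, gamma) cell; cells never talk to each other."""
        C, gamma = params
        score = cross_val_score(SVC(kernel="rbf", C=C, gamma=gamma), X, y, cv=5).mean()
        return score, C, gamma

    if __name__ == "__main__":
        grid = list(product(np.logspace(-2, 3, 6), np.logspace(-4, 1, 6)))
        with Pool() as pool:                          # one worker per core
            results = pool.map(evaluate_point, grid)
        score, C, gamma = max(results)                # tuple max: best score wins
        print("best C=%g gamma=%g (CV accuracy %.3f)" % (C, gamma, score))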

I’m unfamiliar with any easily parallelizable method for the QP itself.
Unfortunately, for (2) and (3) this involves broadcasting the entire dataset 
out to each node of a cluster (or working in a shared-memory environment), so 
it may not be practical depending on the size of your dataset. I’ve only ever 
implemented (2) for relatively small datasets, using MPI and with a pure Java 
socket implementation; the broadcast pattern is sketched below.
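
Here is the distributed version of the same search, sketched with mpi4py (my 
choice for illustration; the implementations I mentioned were MPI and Java 
sockets, but any messaging layer has the same shape): rank 0 loads the data, 
every rank receives a full copy, and the grid cells are split round-robin 
across ranks.

    # Broadcast pattern for (2): dataset to everyone, grid cells partitioned.
    # Assumes mpi4py and scikit-learn; run under mpirun/mpiexec.
    from itertools import product

    import numpy as np
    from mpi4py import MPI
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    data = None
    if rank == 0:
        from sklearn.datasets import make_classification
        data = make_classification(n_samples=500, n_features=20, random_state=0)
    X, y = comm.bcast(data, root=0)      # the entire dataset goes to every node

    grid = list(product(np.logspace(-2, 3, 6), np.logspace(-4, 1, 6)))
    my_best = max((cross_val_score(SVC(kernel="rbf", C=C, gamma=g), X, y, cv=5).mean(), C, g)
                  for C, g in grid[rank::size])   # assumes ranks <= grid cells
    results = comm.gather(my_best, root=0)        # collect per-rank winners
    if rank == 0:
        score, C, g = max(results)
        print("best C=%g gamma=%g (CV accuracy %.3f)" % (C, g, score))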

Other approaches (further from plain libSVM) that are more applicable to 
large datasets (I’m less familiar with these):

    4.    Divide and conquer the QP/SMO problem itself (as I’ve said, I’m 
unfamiliar with this and don’t know of any standard approach).

    5.    Break the training set into subsets and solve.

For (5) there are several approaches. Two that I know of are ensemble 
approaches, and those that accumulate support vectors from each partition and 
heuristically keep/reject them until the model converges (a skeleton of the 
latter is sketched below). I’ve also read some research on implementing this 
in a MapReduce style [2].
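
A single-pass skeleton of that support-vector-accumulation idea, again 
assuming scikit-learn; real schemes (cascade-style ones, for instance) iterate 
the pooling step with a keep/reject heuristic until the model converges, which 
I’ve left out here.

    # Train one SVM per partition (these fits are the parallelizable part),
    # pool the support vectors, and retrain once on the pool.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=4000, n_features=20, random_state=0)

    sv_X, sv_y = [], []
    for Xp, yp in zip(np.array_split(X, 8), np.array_split(y, 8)):
        m = SVC(kernel="rbf", C=1.0, gamma="scale").fit(Xp, yp)
        sv_X.append(Xp[m.support_])       # keep only this partition's SVs
        sv_y.append(yp[m.support_])

    final = SVC(kernel="rbf", C=1.0, gamma="scale").fit(np.vstack(sv_X), np.hstack(sv_y))
    print("kept %d of %d training points" % (final.support_vectors_.shape[0], len(X)))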

I came across paper [1] last night, which you may find interesting as well: it 
compares several SVM parallelization strategies. In particular it discusses 
(1) for a shared-memory environment and for offloading work to GPUs (using 
OpenMP and CUDA), cites several other nice papers discussing SVM 
parallelization strategies (especially for (5)), and then goes on to discuss a 
more purely linear-algebra approach to optimizing SVMs (sec. 5).
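
The reason (1) maps so cleanly onto OpenMP threads or a GPU is that the kernel 
(Gram) matrix decomposes into independent row blocks. A toy numpy illustration 
of mine (not code from [1]); in a parallel version, each call to the block 
helper below is one thread’s or one kernel launch’s worth of work:

    # RBF Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2), built in row
    # blocks: each block depends only on its own rows, hence the easy split.
    import numpy as np

    def rbf_gram_block(X_block, X, gamma):
        """Gram rows for one block: a self-contained unit of work."""
        sq = ((X_block ** 2).sum(1)[:, None] + (X ** 2).sum(1)[None, :]
              - 2.0 * X_block @ X.T)
        return np.exp(-gamma * np.maximum(sq, 0.0))   # clamp rounding negatives

    X = np.random.RandomState(0).randn(1000, 20)
    blocks = [rbf_gram_block(Xb, X, gamma=0.1) for Xb in np.array_split(X, 4)]
    K = np.vstack(blocks)   # the list comprehension is the map/parallel step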

Also regarding (5) you may be interested in [2] (something I’ve only looked 
over briefly).

[1] http://arxiv.org/pdf/1404.1066v1.pdf
[2] http://arxiv.org/pdf/1301.0082.pdf


> From: [email protected]
> Date: Tue, 21 Oct 2014 17:32:22 -0700
> Subject: Re: Any idea which approaches to non-linear svm are easily 
> parallelizable?
> To: [email protected]
> 
> Last I heard, the best methods pre-project and do linear SVM.
> 
> Beyond that, I would guess that deep learning techniques would subsume
> non-linear SVM pretty easily.  The best parallel implementation I know for
> that is in H2O.
> 
> 
> 
> On Tue, Oct 21, 2014 at 4:12 PM, Dmitriy Lyubimov <[email protected]> wrote:
> 
> > in particular, from libSVM --
> > http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf ?
> >
> > thanks.
> > -d
> >