I was simply referring to parallelizing the iterative computation of the
Kernel matrix in the LibSVM code. I'd not proposed any decomposition.
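To make that concrete, here is roughly the kind of thing I mean- a quick sketch
(plain Java, parallelizing over rows with a parallel stream; the class and
method names are just for illustration, this isn't LibSVM's actual code) of
filling an RBF kernel matrix concurrently, since each row is independent:

    import java.util.stream.IntStream;

    // Rough sketch: fill the RBF kernel (Gram) matrix in parallel, one row per
    // task. "data" is an n x d dense matrix; names are illustrative only.
    public class RbfKernelMatrix {

        public static double[][] compute(double[][] data, double gamma) {
            int n = data.length;
            double[][] k = new double[n][n];
            // Rows are independent, so they can be computed concurrently.
            IntStream.range(0, n).parallel().forEach(i -> {
                for (int j = 0; j <= i; j++) {
                    double sq = 0.0;
                    for (int d = 0; d < data[i].length; d++) {
                        double diff = data[i][d] - data[j][d];
                        sq += diff * diff;
                    }
                    double val = Math.exp(-gamma * sq);
                    k[i][j] = val;
                    k[j][i] = val; // the kernel matrix is symmetric
                }
            });
            return k;
        }
    }

LibSVM itself computes kernel rows on demand with an LRU cache inside the SMO
loop, so in practice the parallelism would go into that per-row computation
rather than materializing the full n x n matrix up front.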
Ok, I see- you're talking about the approach taken in PSVM [1], correct? I'm not
familiar with this.
I do agree with you and Ted that traditional non-linear SVMs are not really
suitable for large datasets- I've had small datasets (~5000 x ~300) take days
on a single machine. The fact that accuracy (and training time) vary so
greatly w.r.t. the (C, \gamma) parameter pair makes a grid search essential,
so multiply that training time by the granularity of your search and... However
I don't necessarily agree that they tend to overfit- but I guess that this
depends on the nature of the problem. In my case I've often found them highly
accurate in out-of-sample classification.
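For what it's worth, the grid search itself is embarrassingly parallel once the
data is available to every worker- a toy sketch (the crossValidate call below
is a placeholder for whatever evaluation you run, e.g. k-fold CV accuracy; none
of this is LibSVM's actual API) of fanning the (C, \gamma) pairs out over a
local thread pool:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Toy sketch of a parallel (C, gamma) grid search. Each pair trains and
    // evaluates independently against the same (shared) training data.
    public class GridSearch {

        // Placeholder: train an SVM with (c, gamma) and return a CV accuracy.
        static double crossValidate(double c, double gamma) {
            return 0.0;
        }

        // Returns {bestC, bestGamma, bestAccuracy}.
        public static double[] search(double[] cs, double[] gammas) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());
            List<Future<double[]>> results = new ArrayList<>();
            for (double c : cs) {
                for (double gamma : gammas) {
                    results.add(pool.submit(
                            () -> new double[] {c, gamma, crossValidate(c, gamma)}));
                }
            }
            double[] best = {Double.NaN, Double.NaN, -1.0};
            for (Future<double[]> f : results) {
                double[] r = f.get();
                if (r[2] > best[2]) best = r;
            }
            pool.shutdown();
            return best;
        }
    }

The same pattern distributes across machines (MPI, sockets, etc.) as long as
every worker has a copy of the training set, which is exactly the broadcasting
cost mentioned below.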
As you and Ted have mentioned, kernel approximation and projection to a linear
problem trained e.g. via SGD seem to be a good and scalable alternative for
large sets, though I have no experience with this and have no idea how the
accuracy compares.
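For anyone searching the archives later- as I understand it (and again, I
haven't used this myself), the idea is to approximate the RBF kernel with an
explicit feature map, e.g. random Fourier features (Rahimi & Recht), and then
train an ordinary linear model on the projected data via SGD. A very rough
sketch, with made-up names, of just the projection step:

    import java.util.Random;

    // Rough sketch of a random Fourier feature map approximating an RBF kernel
    // k(x, y) = exp(-gamma * ||x - y||^2). Illustrative only; not taken from
    // any particular library.
    public class RandomFourierFeatures {

        private final double[][] w; // d x D random projection directions
        private final double[] b;   // D random phases

        public RandomFourierFeatures(int d, int numFeatures, double gamma, long seed) {
            Random rnd = new Random(seed);
            w = new double[d][numFeatures];
            b = new double[numFeatures];
            double scale = Math.sqrt(2.0 * gamma); // w ~ N(0, 2 * gamma * I)
            for (int i = 0; i < d; i++)
                for (int j = 0; j < numFeatures; j++)
                    w[i][j] = scale * rnd.nextGaussian();
            for (int j = 0; j < numFeatures; j++)
                b[j] = 2.0 * Math.PI * rnd.nextDouble();
        }

        // z(x)_j = sqrt(2/D) * cos(w_j . x + b_j), so that
        // z(x) . z(y) approximates exp(-gamma * ||x - y||^2).
        public double[] project(double[] x) {
            int numFeatures = b.length;
            double[] z = new double[numFeatures];
            for (int j = 0; j < numFeatures; j++) {
                double dot = 0.0;
                for (int i = 0; i < x.length; i++) dot += w[i][j] * x[i];
                z[j] = Math.sqrt(2.0 / numFeatures) * Math.cos(dot + b[j]);
            }
            return z;
        }
    }

The projected vectors then go to any linear learner (e.g. hinge loss trained
with SGD), which scales out much more naturally than the kernelized QP.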
[1] http://idning.googlecode.com/svn-history/r711/trunk/paper/reference/google/NIPS2007_0435.pdf
> Date: Thu, 23 Oct 2014 01:58:32 -0400
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
> Subject: Re: Any idea which approaches to non-liniear svm are easily
> parallelizable?
>
> Yes I am. In fact, my question is just about whether approximation is
> used to make the total workload of computing the matrix sub-quadratic in
> the training set size.
>
> On 10/22/2014 02:21 PM, Andrew Palumbo wrote:
> > Peng, I'm not sure if you were referring to what I wrote:
> >
> >>>> 1. The Kernel projections
> > if so- I was talking about parallelizing the computation of e.g. the RBF
> > kernels
> >
> >> Date: Wed, 22 Oct 2014 13:42:45 -0400
> >> From: [email protected]
> >> To: [email protected]
> >> CC: [email protected]
> >> Subject: Re: Any idea which approaches to non-liniear svm are easily
> >> parallelizable?
> >>
> >> Is the kernel projection referring to online/incremental incomplete
> >> Cholesky decomposition? Sorry I haven't used SVM for a long time and
> >> didn't keep up with SotA.
> >>
> >> If that's true, I haven't found an out-of-the-box implementation, but
> >> this should be easy.
> >>
> >> Yours Peng
> >>
> >> On 10/22/2014 01:32 PM, Dmitriy Lyubimov wrote:
> >>> Andrew, thanks a bunch for the pointers!
> >>>
> >>>
> >>>
> >>> On Wed, Oct 22, 2014 at 10:14 AM, Andrew Palumbo <[email protected]>
> >>> wrote:
> >>>
> >>>> If you do want to stick with SVM-
> >>>>
> >>>> This is a question that I keep coming back to myself and unfortunately
> >>>> have forgotten more (and lost more literature) than I’ve retained.
> >>>> I believe that the most easily parallelizable sections of libSVM for
> >>>> small datasets are (for C-SVC(R), RBF and polynomial kernels):
> >>>>
> >>>>        1. The Kernel projections
> >>>>        2. The Hyper-Parameter grid search for C, \gamma (I believe this
> >>>>        is now included in LibSVM- I haven’t looked at it in a while)
> >>>>        3. For multi-class SVC: the concurrent computation of each SVM
> >>>>        for each one-against-one class vote.
> >>>>
> >>>> I’m unfamiliar with any easily parallelizable method for QP itself.
> >>>> Unfortunately for (2), (3) this involves broadcasting the entire dataset
> >>>> out to each node of a cluster (or working in a shared memory environment)
> >>>> so it may not be practical depending on the size of your data set. I’ve
> >>>> only ever implemented (2) for relatively small datasets, using MPI and
> >>>> with a pure Java socket implementation.
> >>>>
> >>>> Other approaches (further from simple LibSVM), which are more applicable
> >>>> to large datasets (I’m less familiar with these):
> >>>>
> >>>>        4. Divide and conquer the QP/SMO problem and solve (As I’ve
> >>>>        said, I’m unfamiliar with this and I don’t know of any standard)
> >>>>
> >>>> 5. Break the training set into subsets and solve.
> >>>>
> >>>> For (5) there are several approaches; two that I know of are ensemble
> >>>> approaches and those that accumulate Support Vectors from each partition
> >>>> and heuristically keep/reject them until the model converges. As well,
> >>>> I’ve read some research on implementing this in a MapReduce style [2].
> >>>>
> >>>> I came across this paper [1] last night which you may find interesting
> >>>> as well; it is an interesting comparison of some SVM parallelization
> >>>> strategies. In particular it discusses (1) for a shared memory environment
> >>>> and for offloading work to GPUs (using OpenMP and CUDA). It also cites
> >>>> several other nice papers discussing SVM parallelization strategies,
> >>>> especially for (5), and then goes on to discuss a more purely linear
> >>>> algebra approach to optimizing SVMs (sec. 5).
> >>>>
> >>>> Also regarding (5) you may be interested in [2] (something I’ve only
> >>>> looked over briefly).
> >>>>
> >>>> [1] http://arxiv.org/pdf/1404.1066v1.pdf
> >>>> [2] http://arxiv.org/pdf/1301.0082.pdf
> >>>>
> >>>>
> >>>>> From: [email protected]
> >>>>> Date: Tue, 21 Oct 2014 17:32:22 -0700
> >>>>> Subject: Re: Any idea which approaches to non-liniear svm are easily
> >>>> parallelizable?
> >>>>> To: [email protected]
> >>>>>
> >>>>> Last I heard, the best methods pre-project and do linear SVM.
> >>>>>
> >>>>> Beyond that, I would guess that deep learning techniques would subsume
> >>>>> non-linear SVM pretty easily. The best parallel implementation I know
> >>>>> for that is in H2O.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Oct 21, 2014 at 4:12 PM, Dmitriy Lyubimov <[email protected]>
> >>>> wrote:
> >>>>>> in particular, from libSVM --
> >>>>>> http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf ?
> >>>>>>
> >>>>>> thanks.
> >>>>>> -d
> >>>>>>
> >
>