Peng, I'm not sure if you were referring to what I wrote:

> >> 1. The Kernel projections
If so, I was talking about parallelizing the computation of, e.g., the RBF kernels.

> Date: Wed, 22 Oct 2014 13:42:45 -0400
> From: [email protected]
> To: [email protected]
> CC: [email protected]
> Subject: Re: Any idea which approaches to non-liniear svm are easily
> parallelizable?
>
> Is the kernel projection referring to online/incremental incomplete
> Cholesky decomposition? Sorry, I haven't used SVM for a long time and
> didn't keep up with the state of the art.
>
> If that's true, I haven't found an out-of-the-box implementation, but
> this should be easy.
>
> Yours, Peng
>
> On 10/22/2014 01:32 PM, Dmitriy Lyubimov wrote:
> > Andrew, thanks a bunch for the pointers!
> >
> >
> > On Wed, Oct 22, 2014 at 10:14 AM, Andrew Palumbo <[email protected]> wrote:
> >
> >> If you do want to stick with SVM:
> >>
> >> This is a question that I keep coming back to myself and, unfortunately,
> >> I have forgotten more (and lost more literature) than I've retained.
> >> I believe that the most easily parallelizable sections of libSVM for small
> >> datasets are (for C-SVC(R), RBF and polynomial kernels):
> >>
> >> 2. The hyper-parameter grid search for C and \gamma (I believe this
> >> is now included in libSVM; I haven't looked at it in a while).
> >> 3. For multi-class SVC: the concurrent computation of each SVM for
> >> each one-against-one class vote.
> >>
> >> I'm unfamiliar with any easily parallelizable method for the QP itself.
> >> Unfortunately, (2) and (3) involve broadcasting the entire dataset
> >> to each node of a cluster (or working in a shared-memory environment),
> >> so they may not be practical depending on the size of your dataset. I've
> >> only ever implemented (2), for relatively small datasets, using MPI and
> >> with a pure Java socket implementation.
> >>
> >> Other approaches (further from simple libSVM), which are more applicable
> >> to large datasets (I'm less familiar with these):
> >>
> >> 4. Divide and conquer the QP/SMO problem and solve it (as I've said,
> >> I'm unfamiliar with this and I don't know of any standard approach).
> >>
> >> 5. Break the training set into subsets and solve each.
> >>
> >> For (5) there are several approaches; two that I know of are ensemble
> >> approaches and those that accumulate support vectors from each partition
> >> and heuristically keep/reject them until the model converges. I've also
> >> read some research on implementing this in a MapReduce style [2].
> >>
> >> I came across paper [1] last night, which you may find interesting as
> >> well: it is an interesting comparison of some SVM parallelization
> >> strategies; in particular it discusses (1) for a shared-memory environment
> >> and for offloading work to GPUs (using OpenMP and CUDA). It also cites
> >> several other nice papers discussing SVM parallelization strategies,
> >> especially for (5), and goes on to discuss a more purely linear-algebra
> >> approach to optimizing SVMs (sec. 5).
> >>
> >> Also regarding (5), you may be interested in [2] (something I've only
> >> looked over briefly).
> >>
> >> [1] http://arxiv.org/pdf/1404.1066v1.pdf
> >> [2] http://arxiv.org/pdf/1301.0082.pdf
> >>
> >>
> >>> From: [email protected]
> >>> Date: Tue, 21 Oct 2014 17:32:22 -0700
> >>> Subject: Re: Any idea which approaches to non-liniear svm are easily
> >>> parallelizable?
> >>> To: [email protected]
> >>>
> >>> Last I heard, the best methods pre-project and do linear SVM.
> >>>
> >>> Beyond that, I would guess that deep learning techniques would subsume
> >>> non-linear SVM pretty easily. The best parallel implementation I know
> >>> for that is in H2O.
> >>>
> >>>
> >>> On Tue, Oct 21, 2014 at 4:12 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >>>> in particular, from libSVM --
> >>>> http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf ?
> >>>>
> >>>> thanks.
> >>>> -d
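To make the kernel-computation point concrete: the RBF Gram matrix is embarrassingly parallel across row blocks. A minimal sketch, assuming NumPy and a shared-memory thread pool (in the spirit of the OpenMP approach discussed in [1]); the function names are illustrative, not from any library mentioned in the thread:

```python
# Sketch: RBF Gram matrix computed in parallel over row blocks, using
# libSVM's convention K(x, z) = exp(-gamma * ||x - z||^2). NumPy releases
# the GIL inside its numeric kernels, so a thread pool gives shared-memory
# parallelism without copying the dataset.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def rbf_block(X_block, X, gamma):
    # Squared Euclidean distances between the block's rows and all of X,
    # via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b (clipped at 0 for safety).
    sq = (
        np.sum(X_block ** 2, axis=1)[:, None]
        + np.sum(X ** 2, axis=1)[None, :]
        - 2.0 * X_block @ X.T
    )
    return np.exp(-gamma * np.maximum(sq, 0.0))

def parallel_rbf_gram(X, gamma, n_blocks=4):
    blocks = np.array_split(X, n_blocks)
    with ThreadPoolExecutor(max_workers=n_blocks) as ex:
        parts = ex.map(lambda b: rbf_block(b, X, gamma), blocks)
        return np.vstack(list(parts))
```

On a cluster the same row-block decomposition applies, except each worker needs the full X, which is exactly the broadcast cost Andrew mentions above.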

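Point (2) above is the easiest to exploit: every (C, \gamma) cell of the grid trains and validates independently against the same (broadcast) dataset. A minimal sketch of that structure; `cv_error` is a hypothetical stand-in for an actual cross-validated C-SVC run, included only so the skeleton executes:

```python
# Sketch: parallel grid search over (C, gamma). Each cell is independent,
# so the grid maps directly onto a worker pool (or onto cluster nodes, at
# the cost of broadcasting the dataset to each).
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def cv_error(C, gamma):
    # Hypothetical stand-in: a real implementation would train C-SVC with
    # (C, gamma) on k-1 folds and return the held-out error. This toy
    # objective is minimized at C = 10, gamma = 1e-2.
    return abs(math.log10(C) - 1.0) + abs(math.log10(gamma) + 2.0)

def parallel_grid_search(Cs, gammas):
    grid = list(product(Cs, gammas))
    with ThreadPoolExecutor() as ex:
        errors = list(ex.map(lambda cg: cv_error(*cg), grid))
    best_err, best_params = min(zip(errors, grid))
    return best_params, best_err

(best_C, best_gamma), err = parallel_grid_search(
    Cs=[0.1, 1.0, 10.0, 100.0],
    gammas=[1e-4, 1e-3, 1e-2, 1e-1],
)
```

For a cluster, the only change is replacing the thread pool with remote workers that each hold a copy of the data.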
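On the "pre-project and do linear SVM" suggestion: one well-known projection is random Fourier features (Rahimi & Recht), which replace the RBF kernel with an explicit finite-dimensional feature map; after the projection, any parallel linear-SVM solver applies, and the projection itself is a single matrix multiply. A sketch under that assumption (not something from the thread itself):

```python
# Sketch: random Fourier features approximating the RBF kernel
# k(x, z) = exp(-gamma * ||x - z||^2). With rows of W drawn from
# N(0, 2*gamma*I) and b ~ Uniform[0, 2*pi],
#   z(x) = sqrt(2/D) * cos(x @ W + b)
# satisfies E[z(x) . z(y)] = k(x, y), so a linear method on z(.)
# approximates the kernel method; the error shrinks like 1/sqrt(D).
import numpy as np

def random_fourier_features(X, gamma, D, seed=0):
    rng = np.random.RandomState(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Compare the approximate Gram matrix against the exact RBF kernel.
X = np.random.RandomState(1).randn(10, 3)
gamma = 0.5
Z = random_fourier_features(X, gamma, D=8000)
K_approx = Z @ Z.T
K_exact = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```

This sidesteps the QP-parallelization problem entirely, which is presumably why the pre-projection route is attractive for large datasets.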