On Tue, Jul 22, 2003 at 04:50:48PM -0700, Rudi Cilibrasi wrote:
> My proposal, here, is to train a pair of SVMs per remote host with
> only the data relevant to that host. The first SVM will predict
> if the request will succeed or fail. If the first indicates
> success, then the second will be called upon to predict how long the
> request will take.

Hmmm, I don't think a binary prediction will really be all that useful,
since the sample data is likely to be quite widely distributed - is the
algorithm not capable of providing the relative probabilities of success
and failure?
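To make the distinction concrete, here is a very rough sketch of the kind
of interface I would like the per-host estimator to expose - the names
here are placeholders, not existing Freenet classes, and the way the two
numbers get folded into a single cost is just one possibility:

/*
 * Placeholder sketch only - none of these names exist in Freenet.
 * The point is that the estimator hands back a probability of success
 * and an expected latency, rather than a yes/no prediction, so the
 * router can rank hosts even when most of them are likely to fail.
 */
public interface HostEstimator {
    /** Estimated probability (0.0 - 1.0) that a request for this key succeeds. */
    double successProbability(long keyCloseness);

    /** Expected time in milliseconds for a successful request. */
    double expectedSuccessTime(long keyCloseness);
}

class HostRanker {
    /**
     * One possible way to combine the two numbers: expected time if the
     * request succeeds, plus a penalty (e.g. the time wasted before we
     * can retry elsewhere) weighted by the probability of failure.
     */
    static double estimatedCost(HostEstimator h, long keyCloseness,
                                double failurePenaltyMs) {
        double p = h.successProbability(keyCloseness);
        return p * h.expectedSuccessTime(keyCloseness)
             + (1.0 - p) * failurePenaltyMs;
    }
}

With a probability rather than a yes/no answer, hosts can still be ranked
sensibly against one another even when every one of them is more likely
to fail than to succeed.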
> When deciding where to route, a simple heuristic to start may be to
> sort all predicted successful hosts according to expected latency,
> and choose the best.

Hmmm, that is similar to what we were planning, however I don't think it
makes sense to make a binary success/failure decision for each host:
since, on average, about 70% of all requests could fail, an algorithm
that just predicted the most likely outcome for each host would probably
decide that all nodes will fail!

> As for overall integration within the FreeNet source, here is one
> way of designing it: At program startup, spawn off a single thread
> called "TrainingThread". As FreeNet runs, it queues data in a pending
> queue from the main thread. FreeNet is also calling the Predict function
> rapidly to make routing decisions using the "last good SVM model".
> The training thread is simply looping over and over: First, it looks to
> see if there is at least one new TrainingInput in the pending queue.
> If so, it reads in *all* of the new inputs. Now, it begins a new
> training cycle with all the new data and all of the old data it already
> had from previous cycles. Maybe ten seconds later, it is done training
> and now replaces the old model (which was meanwhile being used for
> prediction) with the new model that includes more data. To prevent the
> data set from growing too large, we can set some maximum number of
> TrainingInput samples allowed, say 1000 - above this, training points
> start getting randomly removed, to slowly rotate out old points
> nondeterministically. To prevent CPU overloading, we can also add an
> optional Thread.sleep somewhere in there to throttle the TrainingThread.

Ok, but we do need to think about the processor requirements of doing
this - Freenet already has a reputation for being a CPU and memory hog,
and we don't want to further justify it. (I have appended a rough sketch
of how I read the TrainingThread idea below my sig, just to check that we
are picturing the same thing.)

> If this sounds like a good path to follow, I think a good first step
> would be to clearly define the TrainingInput and TrainingTarget classes.
> Then, use Java object Serialization to write a number of these out to
> a file for offline testing and tuning by me and whoever else is
> interested in some statistical detective work. These should be
> captured via a "typical use" FreeNet session. I can then make a better
> assessment of how the learning algorithm is performing, and hopefully
> we can make some comparisons using cross-validation to determine which
> learning algorithms and parameters are working best.

Ok. My goal is to get NGRouting implemented ASAP, so I would suggest
that, since our current algorithm is almost completely implemented, we go
ahead with that. Once it is up and running, we will then be able to
experiment with other learning algorithms and do empirical comparisons of
their effectiveness. (A sketch of the capture classes is also appended
below.)

Ian.

--
Ian Clarke                          [EMAIL PROTECTED]
Coordinator, The Freenet Project    http://freenetproject.org/
Founder, Locutus                    http://locut.us/
Personal Homepage                   http://locut.us/ian/
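P.S. Here is the rough sketch of the TrainingThread loop I mentioned
above. All of the names (TrainingThread, queueSample, and so on) are
placeholders rather than existing Freenet code, and trainModel() is just
a stub standing in for the actual SVM training call:

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

/**
 * Rough sketch only - class names are placeholders, and trainModel()
 * is a stub standing in for whatever learner we settle on.
 */
public class TrainingThread extends Thread {

    private static final int MAX_SAMPLES = 1000;   // cap on retained training data
    private static final long THROTTLE_MS = 1000;  // sleep between cycles to limit CPU use

    private final LinkedList pending = new LinkedList(); // new samples from the main thread
    private final List samples = new ArrayList();        // accumulated training set
    private final Random random = new Random();

    private volatile Object currentModel; // the "last good model", read by the router

    /** Called from the main Freenet thread whenever a request completes. */
    public void queueSample(Object input) {
        synchronized (pending) {
            pending.add(input);
        }
    }

    /** The router calls this to get a model to predict with; it never blocks on training. */
    public Object getModel() {
        return currentModel;
    }

    public void run() {
        while (true) {
            List fresh;
            synchronized (pending) {
                fresh = new ArrayList(pending);
                pending.clear();
            }
            if (!fresh.isEmpty()) {
                samples.addAll(fresh);
                // Randomly rotate out old points once we exceed the cap.
                while (samples.size() > MAX_SAMPLES) {
                    samples.remove(random.nextInt(samples.size()));
                }
                Object newModel = trainModel(samples); // may take several seconds
                currentModel = newModel;               // swap in with a single assignment
            }
            try {
                Thread.sleep(THROTTLE_MS); // throttle to keep CPU use down
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    private Object trainModel(List trainingSet) {
        // Stub: this is where the SVM (or whatever learner we choose) gets retrained.
        return trainingSet.size() > 0 ? new Object() : currentModel;
    }
}

The two properties I care about here are that prediction never waits on
training, and that swapping in a new model is a single reference
assignment.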
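And here is a similarly rough sketch of the capture side. The fields are
just examples of the shape - agreeing on the actual features to record is
exactly the first step you describe:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/**
 * Placeholder sketch of the sample classes - the real fields are what we
 * still need to agree on; these are just illustrative.
 */
class TrainingInput implements Serializable {
    long keyDistance;   // e.g. closeness of the requested key to the host
    int hopsToLive;
    long timeOfRequest; // wall-clock time the request was sent
}

class TrainingTarget implements Serializable {
    boolean succeeded;  // did the request succeed?
    long latencyMs;     // time taken, meaningful only when succeeded == true
}

class TrainingSample implements Serializable {
    TrainingInput input;
    TrainingTarget target;
}

/** Dumps captured samples to a file for offline analysis. */
class SampleDump {
    static void write(String filename, TrainingSample[] samples) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(filename));
        try {
            out.writeObject(samples);
        } finally {
            out.close();
        }
    }
}

Reading these back in with ObjectInputStream on your end should be enough
for the offline cross-validation comparisons you have in mind.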
