On Thursday 24 July 2003 04:37, Todd Walton wrote:
> On Wed, 23 Jul 2003, Gordan wrote:
> > While using SVMs is all well and cool, it is easy to get stuck with the
> > idea of using AIs because it _sounds_ like a really clever thing to do.
> > Often, however, much simpler heuristic methods can give results that are
> > not measurably worse and much easier to implement.
>
> I think that's the idea with the discussion. That the SVM idea be
> implemented, so that we can see if it's better or not.
One of the biggest problems is tuning an SVM. There are many different kernel types, and even more implementations of them, that you can use for curve fitting, and each kernel has its own tuning parameters. Finding a good combination of kernel and parameters is something that is typically best left to a genetic optimizer.

> If it's better we'll have won, if it's not better we'll get a bit of SVM
> code laying around that somebody may find useful for some other task, and,
> thus, we'll have won.

I am not so sure about that, but OK. It all depends on how much CPU time the AI ends up sucking up. If it takes 100% more CPU time to achieve 1% better routing (a difference that is probably not even measurable), then the chances are that it's a non-starter. And if you use a genetic optimizer, you have to train whole populations of SVMs through several generations to get them to converge. Doing that periodically in a running node could be an issue.

If the overall routing situation within the node is going to change continuously (which is not at all unlikely in a network such as Freenet), then an iterative AI may be better, e.g. a backprop neural network. But that would lead to other problems.

Additionally, AIs are not very good at dealing with noisy data. You have to go through extensive data cleaning before the AI will be able to do a decent job of predicting things. Unfortunately, a lot of noise reduction methods are only applicable to regression predictions, rather than classification predictions, because only regression results can be converted sensibly back into the source data space. The data transformations involved in cleaning up the data are a much bigger part of the work than the AI itself.

One thing you may want to consider instead is something like Fourier Transform Regression. It makes it easier to filter out the noise, and it is less vulnerable to it in the first place. To filter out the noise, look for the frequency coefficients whose magnitudes fall in the lowest few percent of the distribution and set them to 0. Then, to predict the next value, evaluate the fitted series at t+1, where t is the last time point at which a sample was taken. Unfortunately, this is still a very CPU-intensive task for a lot of data points, and for a few data points the results are likely to be fairly meaningless, as the noise will not be clearly distinguishable from the signal.

All in all, it is a much greater amount of work than people realize, and in this particular case I cannot really see the advantage. Of course, I could be wrong. I have appended a few rough sketches below of what I mean by the genetic search, the iterative learner, the data cleaning, and the Fourier regression.
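To make the tuning problem concrete, here is roughly the sort of search I mean. This is Python rather than anything Freenet would actually ship, scikit-learn's SVR is just a stand-in for whichever SVM implementation gets used, and the population size, mutation rates and fitness function are arbitrary illustrative choices, not a proposal:

# Toy evolutionary search over SVM kernels and hyper-parameters.
# Assumes numpy and scikit-learn; SVR is only a stand-in regressor.
import random
import numpy as np
from sklearn.svm import SVR

KERNELS = ["rbf", "poly", "sigmoid"]

def random_individual():
    # An individual is one (kernel, C, gamma) combination.
    return {"kernel": random.choice(KERNELS),
            "C": 10 ** random.uniform(-2, 3),
            "gamma": 10 ** random.uniform(-4, 1)}

def fitness(ind, X_train, y_train, X_val, y_val):
    # Lower validation error == higher fitness.
    model = SVR(kernel=ind["kernel"], C=ind["C"], gamma=ind["gamma"])
    model.fit(X_train, y_train)
    err = np.mean((model.predict(X_val) - y_val) ** 2)
    return -err

def mutate(ind):
    # Occasionally swap the kernel, always jitter C and gamma.
    child = dict(ind)
    if random.random() < 0.2:
        child["kernel"] = random.choice(KERNELS)
    child["C"] *= 10 ** random.gauss(0, 0.3)
    child["gamma"] *= 10 ** random.gauss(0, 0.3)
    return child

def evolve(X_train, y_train, X_val, y_val, pop_size=20, generations=15):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop,
                        key=lambda i: fitness(i, X_train, y_train, X_val, y_val),
                        reverse=True)
        parents = scored[: pop_size // 4]        # keep the fittest quarter
        pop = parents + [mutate(random.choice(parents))
                         for _ in range(pop_size - len(parents))]
    return max(pop, key=lambda i: fitness(i, X_train, y_train, X_val, y_val))

Note that every single fitness evaluation retrains an SVM from scratch, which is exactly where the CPU cost I am worried about comes from.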
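By an iterative AI I mean something along these lines: a model that takes one small gradient step per new observation instead of being retrained in batches. The sketch is just a single linear layer; a backprop network would do the same per-sample update through more layers. The feature layout and learning rate are placeholders:

# Minimal "iterative" learner: online gradient descent on a linear model.
import numpy as np

class OnlineLinearModel:
    def __init__(self, n_features, learning_rate=0.01):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return float(np.dot(self.w, x) + self.b)

    def update(self, x, target):
        # One gradient step on the squared error for this single sample,
        # so the model keeps adapting as the routing situation drifts.
        error = self.predict(x) - target
        self.w -= self.lr * error * x
        self.b -= self.lr * error

# Usage: feed each new (features, observed outcome) pair as it arrives,
# e.g. model.update(np.array([0.1, 0.3, 0.0, 1.0]), observed_latency).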
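For what I mean by data cleaning, even something this simple illustrates the point; the 3-sigma clip and 5-sample moving average are arbitrary example choices:

# Illustration of pre-learner data cleaning: clip gross outliers, then
# smooth the series to knock down high-frequency noise.
import numpy as np

def clean_series(samples, clip_sigmas=3.0, window=5):
    y = np.asarray(samples, dtype=float)

    # Clip outliers to within a few standard deviations of the mean.
    mean, std = y.mean(), y.std()
    y = np.clip(y, mean - clip_sigmas * std, mean + clip_sigmas * std)

    # Simple moving-average smoothing.
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="same")

This only works because the cleaned values still live in the original units, i.e. it is a regression-style transformation; there is no equivalently simple trick for class labels, which is the regression-versus-classification point above.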
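And here is one way to read the Fourier regression idea: fit a truncated Fourier series to the recent samples by least squares, zero the weakest frequency components as noise, and evaluate the cleaned-up series at t+1. The uniform sampling, the assumed period of twice the window (without which a plain DFT would just wrap around rather than extrapolate), the harmonic count and the 10% amplitude cut-off are all my own assumptions for the sake of the example:

# Sketch of filter-and-extrapolate with a truncated Fourier series.
import numpy as np

def fourier_predict_next(samples, n_harmonics=8, drop_fraction=0.10):
    y = np.asarray(samples, dtype=float)
    n = len(y)
    t = np.arange(n)
    period = 2 * n  # longer than the window so the series can extrapolate

    # Design matrix: constant term plus cos/sin pair for each harmonic.
    columns = [np.ones(n)]
    for k in range(1, n_harmonics + 1):
        columns.append(np.cos(2 * np.pi * k * t / period))
        columns.append(np.sin(2 * np.pi * k * t / period))
    A = np.column_stack(columns)

    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

    # Noise filter: zero the cos/sin pairs whose amplitude falls in the
    # lowest few percent of the amplitude distribution.
    amp = np.hypot(coeffs[1::2], coeffs[2::2])
    cutoff = np.quantile(amp, drop_fraction)
    for k, a in enumerate(amp):
        if a <= cutoff:
            coeffs[1 + 2 * k] = 0.0
            coeffs[2 + 2 * k] = 0.0

    # Evaluate the filtered series one step past the last sample (t = n).
    t_next = n
    value = coeffs[0]
    for k in range(1, n_harmonics + 1):
        value += coeffs[2 * k - 1] * np.cos(2 * np.pi * k * t_next / period)
        value += coeffs[2 * k] * np.sin(2 * np.pi * k * t_next / period)
    return value

The least-squares fit is where the CPU cost mentioned above comes in: it grows with both the window length and the number of harmonics, so it is not free to run per sample either.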
