On Thursday 24 July 2003 04:37, Todd Walton wrote:
> On Wed, 23 Jul 2003, Gordan wrote:
> > While using SVMs is all well and cool, it is easy to get stuck with the
> > idea of using AIs because it _sounds_ like a really clever thing to do.
> > Often, however, much simpler heuristic methods can give results that are
> > not measurably worse and much easier to implement.
>
> I think that's the idea with the discussion.  That the SVM idea be
> implemented, so that we can see if it's better or not.

One of the biggest problems is tuning an SVM. There are many different kernel 
types you can use for the curve fitting, and even more implementations of 
them, and each kernel has its own tuning parameters. Finding a good 
combination of kernel and parameters is something that is typically best left 
to a genetic optimizer.
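
To give a feel for what that involves, here is a rough Python sketch of 
evolving kernel choices and parameters with a tiny genetic optimizer. It 
assumes scikit-learn's SVR and made-up synthetic data; the kernel list, 
parameter ranges, and GA settings are purely illustrative, not a proposal for 
what a node would actually run:

    import random
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(200, 4))          # made-up node features
    y = X @ np.array([0.5, -1.0, 2.0, 0.1]) + rng.normal(0.0, 0.2, 200)

    KERNELS = ["rbf", "poly", "sigmoid"]

    def random_genome():
        # One candidate: a kernel plus its tuning parameters.
        return {"kernel": random.choice(KERNELS),
                "C": 10 ** random.uniform(-2, 2),
                "gamma": 10 ** random.uniform(-3, 1)}

    def fitness(genome):
        # Mean negative MSE under 3-fold cross-validation; higher is better.
        model = SVR(kernel=genome["kernel"], C=genome["C"],
                    gamma=genome["gamma"])
        return cross_val_score(model, X, y, cv=3,
                               scoring="neg_mean_squared_error").mean()

    def mutate(genome):
        # Randomly perturb one gene of a parent genome.
        child = dict(genome)
        gene = random.choice(["kernel", "C", "gamma"])
        if gene == "kernel":
            child["kernel"] = random.choice(KERNELS)
        else:
            child[gene] *= 10 ** random.uniform(-0.5, 0.5)
        return child

    # Evolve a small population for a few generations.
    population = [random_genome() for _ in range(20)]
    for generation in range(10):
        survivors = sorted(population, key=fitness, reverse=True)[:5]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(15)]

    print("best kernel/parameters:", max(population, key=fitness))

Even this toy version trains and cross-validates 20 SVMs per generation, 
which is exactly where the CPU cost discussed below comes from.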

> If it's better we'll have won, if it's not better we'll get a bit of SVM
> code laying around that somebody may find useful for some other task, and,
> thus, we'll have won.

I am not so sure about that, but OK. It all depends on how much CPU time the 
AI ends up sucking up. If it takes 100% more CPU time to achieve 1% better 
routing (which is probably not all that measurable), then the chances are 
that it's a non-starter.

If you start using a genetic optimizer, then you have to train whole 
populations of SVMs through several generations to get them to converge. 
Doing this periodically in the running node could be an issue. If the overall 
routing situation within the node is going to change continuously (which is 
not all that unlikely in a network such as Freenet), then an iterative AI
may be better, e.g. a backprop neural network. But that would lead to other 
problems.
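
For illustration, here is a minimal Python/NumPy sketch of the kind of 
incremental backprop update such an iterative approach would rely on. The 
single-hidden-layer network, the feature and target choices, and the learning 
rate are all assumptions made up for the example:

    import numpy as np

    class TinyNet:
        """One-hidden-layer net trained one sample at a time (online backprop)."""
        def __init__(self, n_in=4, n_hidden=8, lr=0.01, seed=1):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
            self.b1 = np.zeros(n_hidden)
            self.W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
            self.b2 = np.zeros(1)
            self.lr = lr

        def update(self, x, target):
            # Forward pass.
            h = np.tanh(x @ self.W1 + self.b1)
            y = h @ self.W2 + self.b2
            # Backward pass for squared error on this single sample.
            err = y - target
            dh = (err @ self.W2.T) * (1.0 - h ** 2)
            self.W2 -= self.lr * np.outer(h, err)
            self.b2 -= self.lr * err
            self.W1 -= self.lr * np.outer(x, dh)
            self.b1 -= self.lr * dh
            return float(err[0] ** 2)

    # Each time a request completes, feed the new observation straight in:
    net = TinyNet()
    rng = np.random.default_rng(2)
    for _ in range(1000):
        features = rng.uniform(0.0, 1.0, 4)   # made-up routing features
        observed = features.sum() / 4.0       # made-up target, e.g. latency
        loss = net.update(features, observed)
    print("last squared error:", loss)

The appeal is that each update is cheap and the model drifts along with the 
data, but it also inherits the usual backprop problems (learning rate choice, 
forgetting, local minima), which is what I meant by "other problems".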

Additionally, AIs are not very good at dealing with noisy data. You have to 
put the data through extensive cleaning before the AI can do a decent job of 
predicting things. Unfortunately, a lot of noise reduction methods are only 
applicable to regression predictions, rather than classification predictions, 
because only regression results can be converted sensibly back into the 
source data space.

Data transformations involved in cleaning up the data are a much bigger part 
of the work than just the AI.

One thing you may want to consider instead is using something like Fourier 
Transform Regression. Noise is easier to filter out, and the method is less 
vulnerable to it in the first place. To filter out the noise, look for the 
frequency coefficients whose magnitudes fall within the lowest few percent of 
the distribution, and set them to 0. Then, to get the next value, work out 
the value at t+1 from those coefficients, where t is the last time point at 
which a sample was taken.
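
As a rough Python/NumPy sketch of that filtering step (the synthetic series, 
the 5% cut-off, and the use of numpy.fft are assumptions for illustration 
only):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 256
    t = np.arange(n)
    samples = np.sin(2 * np.pi * t / 32) + 0.3 * rng.normal(size=n)  # noisy series

    coeffs = np.fft.fft(samples)

    # Zero the coefficients whose magnitudes fall in the lowest few percent
    # of the distribution (5% is an arbitrary choice here).
    magnitudes = np.abs(coeffs)
    threshold = np.percentile(magnitudes, 5)
    filtered = np.where(magnitudes > threshold, coeffs, 0)

    def evaluate(c, n, time):
        """Evaluate the inverse-DFT series at an arbitrary time point.

        Note that a DFT model is periodic with period n, so in practice the
        window would be slid forward and the transform recomputed as each
        new sample arrives, rather than extrapolating far beyond it.
        """
        k = np.arange(n)
        return np.real(np.sum(c * np.exp(2j * np.pi * k * time / n)) / n)

    # Work out the value one step past the last sampled time point.
    prediction = evaluate(filtered, n, t[-1] + 1)
    print("predicted value at t+1:", prediction)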

Unfortunately, this is still a very CPU-intensive task for a lot of data 
points, and for few data points the results are likely to be fairly
meaningless, as the noise will not be clearly distinguishable from the 
signal.

All in all it is a much greater amount of work than people realize, and in 
this particular case, I cannot really see the advantage. Of course, I could 
be wrong.