On Thu, Jul 24, 2003 at 08:57:59AM -0700, Ian Clarke wrote:
> What were the CPU and memory requirements for this experiment?

It took about 5 seconds to run on my 2.0GHz machine. The memory footprint
is negligible, I believe. This program is not a good representation of the
CPU load to expect in the real version, as I didn't do any batching of
training data.
> It is also rather surprising that the BDA algorithm without useNearest
> performed *better* than BDA with useNearest - do you have any hypothesis
> as to why this might be? It might indicate a problem with the BDA
> implementation.

Sure. Look at the predictions (out.txt) it makes after it reads that
second data point with a time near ten seconds (around 10000). Since that
one comes up early, it is used for several later guesses as the table
starts getting filled in. Of course, that's an aberrant data point, and it
illustrates the brittle nature of homemade algorithms like decaying
averages -- they don't have the intelligence to recognize situations like
this. So usually they do about as well as an SVM, but sometimes they
"lose big".

> Also, a decay rate of 0.5 is quite high, it might be interesting to see
> what happens with a lower decay rate and more data. In a typical
> implementation, how many bins were there, and what were the ranges of
> document sizes? If there were too few bins, then it could be that the
> BDA was suffering due to its coarseness with document size.

I tried several different parameter values and stuck with what seemed to
work best for the BDA. I invite everybody else to tweak these and see
what comes up.

> It would be very interesting to see how well our RoutingTimeEstimator
> class performs with the same data (perhaps using the document size
> instead of the "key"), since, as it doesn't use a "binned" approach, its
> performance is likely to be superior to BDA.

I'd love to see this also.

> If I recall correctly, Hui Zhang <[EMAIL PROTECTED]>, a PhD student at
> the University of Southern California did some testing using collected
> response time data from Freenet of our ResponseTimeEstimator class.
> More interesting still would be to measure performance using Hui's data
> as it will be much closer to the actual data our simulator is likely to
> collect, in particular - it will be *much* more noisy than I suspect the
> data was in this experiment - and it has been suggested that SVM might
> not be as good with very noisy data.

I would be happy to run tests if somebody can dig up this data.

I don't believe the claim that SVMs perform poorly on noisy data -- to
the contrary, SVMs are probably among the most robust learning algorithms
you can use in the face of noisy data. As evidence of this, consider how
virtually every modern OCR system uses SVMs. And consider just how noisy
handwritten digits are.

Rudi

_______________________________________________
devl mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl
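P.S. For anyone who wants to poke at the failure mode I described without
digging through the experiment code, here is a minimal Python sketch of a
binned decaying average. The bin count, document-size range, and
nearest-bin fallback are my own assumptions for illustration, not the
actual parameters of the program:

```python
# Toy binned decaying average (BDA). Bin boundaries, size range, and the
# nearest-bin fallback are illustrative assumptions, not the real code.

NUM_BINS = 4          # bins over document size
MAX_SIZE = 4000       # assumed document-size range
DECAY = 0.5           # the decay rate discussed above

table = [None] * NUM_BINS  # one decaying average per bin

def bin_for(size):
    return min(size * NUM_BINS // MAX_SIZE, NUM_BINS - 1)

def predict(size, use_nearest=True):
    """Guess a response time; with use_nearest, fall back to the
    closest filled bin while the table is still sparse."""
    b = bin_for(size)
    if table[b] is not None:
        return table[b]
    if use_nearest:
        filled = [i for i, v in enumerate(table) if v is not None]
        if filled:
            nearest = min(filled, key=lambda i: abs(i - b))
            return table[nearest]
    return 0.0  # no data at all yet

def train(size, time_ms):
    b = bin_for(size)
    old = table[b]
    table[b] = time_ms if old is None else DECAY * time_ms + (1 - DECAY) * old

# An aberrant ~10 s point arrives second, and useNearest then reuses it
# for guesses on nearby bins that are still empty:
train(100, 120.0)       # normal point, lands in bin 0
train(1500, 10000.0)    # the outlier, lands in bin 1
print(predict(2500))    # empty bin 2 -> nearest filled bin is the outlier's
```

Running this prints 10000.0: the single aberrant point becomes the guess
for a whole neighboring bin, which is exactly the "lose big" behavior.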
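P.P.S. One intuition for why SVM regression tolerates noise: it trains
against an epsilon-insensitive loss, so an outlier's pull on the fit
grows linearly rather than quadratically. A toy comparison (the epsilon
value and error magnitudes here are arbitrary choices for illustration):

```python
# Penalty an outlier contributes under squared loss vs. the
# epsilon-insensitive loss used in SVM regression. Epsilon and the
# error values are arbitrary illustrative numbers.

EPSILON = 50.0  # errors smaller than this are tolerated for free

def squared_loss(error):
    return error ** 2

def eps_insensitive_loss(error):
    return max(0.0, abs(error) - EPSILON)

typical_error = 100.0    # a routine misprediction, in ms
outlier_error = 10000.0  # an aberrant ~10 s point

# Ratio of outlier penalty to typical penalty under each loss:
print(squared_loss(outlier_error) / squared_loss(typical_error))          # 10000.0
print(eps_insensitive_loss(outlier_error) / eps_insensitive_loss(typical_error))  # 199.0
```

Under squared loss the one outlier outweighs the typical point by a
factor of 10,000; under the epsilon-insensitive loss it outweighs it by
only ~200, so a handful of noisy points can't dominate the fit.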
