On Thu, Jul 24, 2003 at 08:57:59AM -0700, Ian Clarke wrote:
> What were the CPU and memory requirements for this experiment?  
It took about 5 seconds to run on my 2.0 GHz machine.  The memory
footprint is negligible, I believe.  This program is not a good
representation of the CPU load to expect in the real version, as I
didn't do any batching of the training data.

> It is also rather surprising that the BDA algorithm without useNearest
> performed *better* than BDA with useNearest - do you have any hypothesis
> as to why this might be?  It might indicate a problem with the BDA
> implementation.  

Sure.  Look at the predictions (out.txt) it makes after it reads that second
data point with a time near ten seconds (around 10000).  Since that
one comes up early, it is used for several later guesses as the table
starts getting filled in.  Of course, that's an aberrant data point,
and illustrates the brittle nature of homemade algorithms like
decaying averages -- they don't have the intelligence to realize things
like this.  So usually they do about as well as an SVM, but sometimes
they "lose big".
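To make that failure mode concrete, here is a minimal sketch of a binned decaying average with a nearest-filled-bin fallback.  The bin count, document-size range, and decay value are my assumptions for illustration, not the parameters from the actual experiment:

```python
# Hypothetical sketch of a binned decaying average (BDA) predictor.
# Bin boundaries, decay rate, and the nearest-bin fallback are assumed
# for illustration, not taken from the real implementation.

DECAY = 0.5
NUM_BINS = 8
MAX_SIZE = 64_000  # assumed upper bound on document size

def bin_of(size):
    """Map a document size to a bin index."""
    return min(size * NUM_BINS // MAX_SIZE, NUM_BINS - 1)

class BDA:
    def __init__(self):
        self.table = [None] * NUM_BINS  # one decaying average per bin

    def train(self, size, time):
        b = bin_of(size)
        old = self.table[b]
        # Decaying average: each new observation gets weight DECAY.
        self.table[b] = time if old is None else DECAY * time + (1 - DECAY) * old

    def predict(self, size):
        b = bin_of(size)
        if self.table[b] is not None:
            return self.table[b]
        # Fallback while the table is sparse: borrow the nearest filled bin.
        # This is how one early aberrant point can dominate many guesses.
        filled = [i for i, v in enumerate(self.table) if v is not None]
        if not filled:
            return 0.0
        nearest = min(filled, key=lambda i: abs(i - b))
        return self.table[nearest]

bda = BDA()
bda.train(1000, 120.0)    # normal point
bda.train(2000, 10000.0)  # aberrant ~10s point, arriving early
print(bda.predict(50000))
```

With only those two points seen, the ~10s outlier gets blended straight into the lone filled bin, and every prediction for empty bins borrows from it -- there is no mechanism for recognizing the point as aberrant.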

> Also, a decay rate of 0.5 is quite high, it might be interesting to see
> what happens with a lower decay rate and more data.  In a typical
> implementation, how many bins were there, and what were the ranges of
> document sizes?  If there were too few bins, then it could be that the 
> BDA was suffering due to its coarseness with document size.

I tried several different parameter values and stuck with what seemed
to work best for the BDA.  I invite everybody else to tweak these
and see what comes up.

> It would be very interesting to see how well our RoutingTimeEstimator
> class performs with the same data (perhaps using the document size
> instead of the "key"), since, as it doesn't use a "binned" approach, its 
> performance is likely to be superior to BDA.

I'd love to see this also.

> If I recall correctly, Hui Zhang <[EMAIL PROTECTED]>, a PhD student at
> the University of Southern California did some testing using collected
> response time data from Freenet of our ResponseTimeEstimator class. More
> interesting still would be to measure performance using Hui's data as it
> will be much closer to the actual data our simulator is likely to
> collect, in particular - it will be *much* more noisy than I suspect the
> data was in this experiment - and it has been suggested that SVM might
> not be as good with very noisy data.

I would be happy to run tests if somebody can dig up this data.  I
don't believe the claim that SVMs perform poorly on noisy data --
on the contrary, SVMs are probably among the most robust learning
algorithms you can use in the face of noisy data.  As evidence of this,
consider how virtually every modern OCR system uses SVMs.  And
consider just how noisy handwritten digits are.
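The intuition can be shown with a toy example -- this is not the SVM used in the experiment, just a linear soft-margin SVM trained by subgradient descent on the hinge loss, with a deliberately flipped label standing in for noise.  All the data and parameters here are made up for illustration:

```python
# Toy illustration of SVM noise-robustness: a linear soft-margin SVM
# (hinge loss + L2 regularization, full-batch subgradient descent).
# One training label is deliberately flipped; the regularized max-margin
# objective shrugs off the noisy point instead of memorizing it.

# 1-D points; the true rule is sign(x).  The label of x=1 is flipped.
data = [(-5, -1), (-4, -1), (-3, -1), (-2, -1), (-1, -1),
        (1, -1),                       # noisy: true label is +1
        (2, 1), (3, 1), (4, 1), (5, 1)]

lam = 0.1    # L2 regularization strength
w = 0.0      # single weight; no bias term, since the clean data is symmetric
for t in range(1, 2001):
    lr = 1.0 / (lam * t)               # decreasing step size
    # Subgradient of  lam/2 * w^2 + mean(max(0, 1 - y*w*x))
    grad = lam * w
    for x, y in data:
        if y * w * x < 1:              # margin violated by this point
            grad -= y * x / len(data)
    w -= lr * grad

# Despite the flipped label, the learned sign agrees with the true rule:
preds = [1 if w * x > 0 else -1 for x, _ in data]
true = [1 if x > 0 else -1 for x, _ in data]
print(w, preds == true)
```

Because the objective trades margin against the few points it misclassifies, the single flipped label costs a little hinge loss but cannot drag the decision boundary with it -- which is exactly the behavior the brittle decaying average lacks.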

Rudi

_______________________________________________
devl mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl
