On Thu, Jul 24, 2003 at 12:32:38PM -0700, Ian Clarke wrote:
> On Thu, Jul 24, 2003 at 11:43:02AM -0700, Rudi Cilibrasi wrote:
> > http://homepages.cwi.nl/~cilibrar/ngrouting/
> outlier.  Making the algorithm interpolate (as the RoutingTimeEstimator
> does) would probably have resulted in much more favourable results for
> BDA1.  

Though it does improve things, it still doesn't compete with SVM.  I've
updated my webpage to include a third candidate according to your
specification, called BDA3.  As you can see, it's still got a big error
as the graph shows, even for bins it knows about.  The choice of what to
do when you are outside interpolation range makes a difference between
and standard deviation of 1500  vs 1700 depending if you "return 500"
or "return nearest Average".  I tried both, and kept the better one
that says to "return 500" which does about as well as the "return 500".
You can see your new algorithm doing the right thing, according to your
specification, in the first few data points looking at the brown squares
as compared to the green squares and purple squares.
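To make the "outside interpolation range" distinction concrete, here is a
minimal sketch of a bucket-average estimator with linear interpolation in
the spirit of BDA3.  The function name, the bucket representation, and the
default of 500 are illustrative assumptions on my part, not the actual
RoutingTimeEstimator code:

```python
def bda3_estimate(buckets, key, default=500):
    """Estimate a routing time for `key` by linearly interpolating
    between the averages of the two bracketing buckets.

    buckets: dict mapping bucket centre -> average observed time.
    Outside the known range we fall back to `default` (the
    "return 500" policy discussed above).
    """
    if not buckets:
        return default
    centres = sorted(buckets)
    if key < centres[0] or key > centres[-1]:
        return default  # outside interpolation range
    # find the two bucket centres bracketing the key
    for lo, hi in zip(centres, centres[1:]):
        if lo <= key <= hi:
            frac = (key - lo) / (hi - lo)
            return buckets[lo] + frac * (buckets[hi] - buckets[lo])
```

The alternative "return nearest average" policy would replace the
`default` fallback with `buckets[centres[0]]` or `buckets[centres[-1]]`,
whichever end of the range the key falls off.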

> I also think this explains BDA2's strangely superior performance - which
> is basically due to the blind luck that its default value of 500 happens
> to be much closer to the 2nd and 3rd data points.

This does make sense to me.

> Additionally, it isn't very clear whether any of these algorithms are
> showing much evidence of useful generalization of the data, most of the
> data points seem to be spread between 500 and 2500, if anything SVM
> seems to be following the actual data *too* closely.  I suspect that a
> version of BDA1 with interpolation might actually do a better job of 
> ignoring random fluctuations in the sample data.

In this case, at least, it certainly doesn't.  If I am doing something
wrong in the interpolation, I invite you to try a smarter algorithm.
My feeling is that no simple algorithm any of us makes up will do as
well as even the most obvious application of SVM.

> 
> >  This is why
> > it doesn't make sense to only use SVM's when a certain crucial data
> > threshold is reached -- the counter-intuitive truth is that seemingly
> > simpler and "more reliable" methods like exponential decay can wind
> > up getting confused early and staying confused more, because they cannot
> > differentiate model from noise.
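For reference, the "exponential decay" style of estimator I mean above can
be sketched as a plain exponentially weighted running average.  This is a
generic illustration with made-up names, not Freenet's actual code:

```python
class DecayEstimator:
    """Exponentially decaying running average: each new sample pulls
    the estimate toward itself by a fixed fraction alpha.
    Illustrative sketch only."""

    def __init__(self, alpha=0.1, initial=500):
        self.alpha = alpha
        self.estimate = initial

    def report(self, sample):
        # blend the new sample into the running estimate
        self.estimate += self.alpha * (sample - self.estimate)

    def predict(self):
        return self.estimate
```

Because it keeps a single number, an estimator like this cannot separate
a shift in the underlying model from ordinary noise -- which is the
"staying confused" problem described above.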

> I am afraid that I don't think that this data supports that hypothesis -
> rather I think the problem here is caused by not interpolating between
> the bucket-averages.
...
> BDA implementation would employ interpolation - so I am not sure that we 
> can accept, yet, that SVMs are superior to a simpler approach.

I am curious whether you still believe this, in view of the new
experiments I've just run.

Rudi

> 
> Ian.
> 
> -- 
> Ian Clarke                                                [EMAIL PROTECTED]
> Coordinator, The Freenet Project            http://freenetproject.org/
> Founder, Locutus                                      http://locut.us/
> Personal Homepage                                 http://locut.us/ian/


_______________________________________________
devl mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org:8080/cgi-bin/mailman/listinfo/devl