Stuart Staniford wrote:

> There's a number of things about the framing of this discussion that are
> bugging me

Me too, but nevertheless I find this to be one of the best threads on
this list in the last few months :)

> again. So the main nuisances on the wire keep changing, and any dataset
> is necessarily going to get stale very quickly.

Very true, and so if any dataset is needed, this has to be kept in mind.

> doing things. For us, the main focus is "What are the bad guys doing
> now?" and "What features do we need to detect what they are now doing".
> Usually, if you have good features with high discrimination, most
> algorithms can be tweaked to do ok.

True, up to a point. On the other hand, many algorithms can and should be
safely discarded (many of them are instead published ;-) on the grounds
that they are not theoretically able to handle some types of features
correctly.

> So forget looking for a dataset. Look for a wire. [...]
> I think the problem of producing regular timely datasets that can be
> safely published is probably just about intractable

You have a point here, but this poses a huge challenge to the
replicability of results, and therefore to the scientific vetting of
data before publication. So I am convinced that you need to validate
your ideas against data coming from the wire, but on the other hand
some means of comparison between different approaches must be
established.

Stefano Zanero
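[Editorial aside on "good features with high discrimination": one common way to make that notion concrete is mutual information between a discretized feature and the attack/normal label. The sketch below is purely illustrative; the function name and the toy data are hypothetical, not anything from this thread.]

```python
# Hypothetical sketch: score a feature's discrimination power as the
# mutual information I(F; L), in bits, between feature values F and
# class labels L, estimated from raw co-occurrence counts.
from collections import Counter
from math import log2

def mutual_information(feature_values, labels):
    """Estimate I(F; L) in bits from paired observations."""
    n = len(labels)
    pf = Counter(feature_values)           # marginal counts of F
    pl = Counter(labels)                   # marginal counts of L
    pfl = Counter(zip(feature_values, labels))  # joint counts
    mi = 0.0
    for (f, l), c in pfl.items():
        p_joint = c / n
        # p_joint / (p_f * p_l), written with counts to avoid extra divisions
        mi += p_joint * log2(p_joint * n * n / (pf[f] * pl[l]))
    return mi

# Toy traffic: a feature that perfectly separates attack from normal...
perfect = ["high"] * 5 + ["low"] * 5
labels  = ["attack"] * 5 + ["normal"] * 5
# ...versus one distributed identically in both classes.
useless = ["high", "high", "high", "low", "low"] * 2

print(mutual_information(perfect, labels))  # 1.0 bit: fully discriminative
print(mutual_information(useless, labels))  # ~0.0 bits: carries no signal
```

Under this view, "most algorithms can be tweaked to do ok" given such a feature, while no amount of tweaking rescues a detector built on the zero-information one.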
