[email protected] wrote:
> 1. CAIDA backscatter dataset (contains reflected suspicious traffic)

This is just reflected backscatter from DoS attacks; how you would use
it to evaluate an IDS is beyond me.

> 2. LBNL/ICSI enterprise router dataset (contains segregated scan and
> benign traffic)

Just packet traces, anonymized, no content.

> 3. DEFCON 8-10 CTF datasets (contain only attack traffic during
> DEFCON competition)

Only attacks, in a non-realistic network.

> 4. UMASS gateway link dataset (is manually labeled by Yu Gu at
> University of Massachusetts)

See 2.

> 5. Endpoint worm dataset (both benign and worm traffic, logged by
> argus -- probably the only data available at endpoints)

I can't understand how this was generated or collected. Pointers?

> tricky task. There are two standard ways to create labeled IDS
> datasets: (1) separately collecting benign and malicious traffic and
> then injecting to create infected traffic profiles,

Which generates the artifacts present in IDEVAL.

> (2) collecting data and then labeling it via manual inspection or a
> combination of heuristics.

Which is a tedious task no one wants to do (manually). What do you mean
by heuristics?

> have been working on some semi-automated procedures to label
> anomalies in network traffic.

Which is the same as developing an anomaly detector. Thus, you are
effectively using an anomaly detector to evaluate another, opening up a
can of worms you really don't want to open.

SZ
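P.S. To make the injection objection concrete, here is a toy sketch of
approach (1) and the kind of artifact it can introduce. The flow-record
fields and the TTL check are made-up illustrations (not the actual
IDEVAL methodology or any real dataset's schema):

```python
# Toy sketch: build a "labeled" trace by injecting separately collected
# attack flows into a benign trace, then check for one classic artifact.

def inject(benign_flows, attack_flows):
    """Merge benign and attack flow records, labeling each by origin."""
    labeled = [dict(f, label="benign") for f in benign_flows]
    labeled += [dict(f, label="attack") for f in attack_flows]
    # Sort by timestamp so the merged records read as a single capture.
    labeled.sort(key=lambda f: f["ts"])
    return labeled

def ttl_artifact(labeled):
    """Crude artifact check: if the injected attack traffic's TTL values
    never overlap the benign ones, a detector can 'cheat' by keying on
    TTL instead of on any actual attack behavior."""
    ttls = lambda lbl: {f["ttl"] for f in labeled if f["label"] == lbl}
    return ttls("attack").isdisjoint(ttls("benign"))

# Hypothetical flows: benign traffic captured on one network (TTL 64),
# attacks generated elsewhere (TTL 255).
benign = [{"ts": 1.0, "ttl": 64}, {"ts": 2.0, "ttl": 64}]
attack = [{"ts": 1.5, "ttl": 255}]
merged = inject(benign, attack)
print(ttl_artifact(merged))  # True: the classes are trivially separable
```

Any evaluation on such a merged trace rewards the artifact, not
detection ability, which is exactly the IDEVAL problem.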
