Mailman (see http://www.gnu.org/software/mailman/index.html) is becoming a widely used mailing list server. Serious discussion is beginning about building a new list archiver for it. This could include a mechanism for making such archives friendlier to robots, I suspect.
I'd suggest that it include a mechanism for robots to retrieve updates in a single transaction, rather than by crawling individual pages -- a "give me everything since [date]" or "give me everything since my last visit" request, if you will. (I've appended rough sketches of what I mean after my signature.)

A standard format for delivering archived discussions (with metadata) would let mailing lists be searched alongside newsgroups, creating integrated search services -- imagine Google Groups with more than just newsgroups. If I'm really dreaming, the same format could be supported by web forums, with similar benefits.

An important side issue in making harvesting efficient is what to do with e-mail addresses, so that spammers can't use the same mechanism to gather them. I'd favor a mechanism that shows that messages (even across lists) are from the same person, without disclosing that person's address (also sketched below).

Thoughts? I'd especially like to hear from the folks who maintain the major search sites' robots. Would there be enough benefit in this to make it worth your time to support such an effort?

[ADMIN NOTE] I recently changed servers for the Robots list, which I believe went smoothly. Your mail filters may no longer match this message -- I also upgraded the list to Mailman 2.1, which I believe changed the "From" header text. This is the first message to the list since the change. Any problems, please feel free to contact me off-list.

Nick
--
Nick Arnett
Phone/fax: (408) 904-7198
[EMAIL PROTECTED]
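P.S. Here's a minimal sketch, in Python, of the single-transaction update fetch. Everything in it is an assumption for illustration: the export URL, the "since" query parameter, and the JSON response are all invented, and no such interface exists in Mailman today.

    import json
    import urllib.request

    def fetch_updates(archive_url, since):
        """Ask the archive for everything posted after `since`
        (an ISO 8601 timestamp) in one round trip, instead of
        crawling the archive page by page."""
        url = "%s?since=%s" % (archive_url, since)  # hypothetical parameter
        with urllib.request.urlopen(url) as response:
            return json.load(response)              # hypothetical format

    # A robot revisiting the archive asks only for what's new:
    for msg in fetch_updates("http://lists.example.org/archive/export",
                             "2003-01-01T00:00:00Z"):
        print(msg["date"], msg["author-id"], msg["subject"])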
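Along the same lines, this is the kind of per-message metadata a common delivery format might carry, so that list archives, newsgroups, and (dreaming) web forums could be indexed together. The field names are made up for illustration, not a proposal:

    # Illustrative only -- these fields and names are invented,
    # not a standard anyone has defined.
    message_record = {
        "message-id": "<20030115120000.GA123@example.org>",
        "list": "robots.example.org",
        "thread-id": "<20030114090000.GA99@example.org>",  # thread root
        "date": "2003-01-15T12:00:00Z",
        "subject": "Re: Robot-friendly archives",
        "author-id": "a41f9c02e7d355b1",  # opaque ID; see next sketch
        "body": "...",
    }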
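Finally, one way the "same person, undisclosed address" idea could work, again purely as a sketch under stated assumptions: cooperating archives share a secret key and publish a keyed hash of the normalized address instead of the address itself. The same person then gets the same opaque ID across lists, but a harvester never sees an address. The shared key matters because an unkeyed hash could be reversed by a dictionary attack over lists of known addresses.

    import hashlib
    import hmac

    # Assumption: cooperating archives would share this key out of band.
    SHARED_KEY = b"secret shared among cooperating archives"

    def author_id(address):
        """Opaque, stable author identifier; never reveals the address."""
        normalized = address.strip().lower().encode("utf-8")
        return hmac.new(SHARED_KEY, normalized,
                        hashlib.sha256).hexdigest()[:16]

    print(author_id("Jane.Doe@Example.COM"))  # same ID as the line below
    print(author_id("jane.doe@example.com"))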