Mailman (see http://www.gnu.org/software/mailman/index.html) is becoming a
widely used mailing list server, and serious discussion is beginning about
building a new list archiver for it.  I suspect this could include a
mechanism for making such archives friendlier to robots.

I'd suggest that it include a mechanism for robots to retrieve updates via a
single transaction, rather than crawling individual pages.  A "give me
everything since [date]" or "give me everything since the last time I
visited" request, if you will.

A standard format for delivering archived discussions (with metadata) would
allow mailing lists to be searchable in conjunction with newsgroups, thus
creating integrated search services (imagine Google Groups with more than
just newsgroups).  If I'm really dreaming, the same format could be
supported by web forums, with similar benefits.
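
Purely as an illustration, one entry in such a format might carry fields
along these lines (the field names below are invented, not any existing
standard):

    import json

    message = {
        "source": "robots mailing list",   # list, newsgroup, or forum
        "message-id": "<abc123@example.org>",
        "in-reply-to": None,               # threading information
        "author": "opaque-author-token",   # see the address question below
        "date": "2003-04-01T12:00:00Z",
        "subject": "Harvesting mailing list archives",
        "body": "Plain-text body of the post...",
    }

    print(json.dumps(message, indent=2))

The particular serialization doesn't matter much; any agreed-upon set of
fields covering source, threading, author, date, subject, and body would do.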

An important side issue in making harvesting efficient is what to do with
e-mail addresses, so that spammers can't easily use the same mechanism to
gather them.  I'd favor a mechanism that shows that messages (even across
lists) come from the same person without disclosing that person's address.
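
One possibility, sketched here only as an illustration: publish a keyed
hash of each address instead of the address itself.  The key and details
below are invented, but the idea is that the same address always maps to
the same opaque token, so identity holds across lists while the address
stays unrecoverable:

    import hashlib
    import hmac

    SECRET_KEY = b"hypothetical-shared-secret"  # held by archivers, never published

    def author_token(email_address):
        """Map an e-mail address to a stable, non-reversible identifier."""
        normalized = email_address.strip().lower().encode("utf-8")
        return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()[:16]

    # The same person posting to two different lists gets the same token
    # on both, provided the archivers share the key.

Sharing a key across independent archives raises its own coordination
questions, of course, so take that only as one direction to explore.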

Thoughts?  I'd especially like to hear from the folks who maintain the major
search sites' robots.  Would there be enough benefit in this to make
supporting such an effort worth your time?

[ADMIN NOTE] I recently changed servers for the Robots list, which I believe
went smoothly.  Your mail filters may fail on this message, though -- I also
upgraded the list to Mailman 2.1, which I believe changed the "From" header
text.  This is the first message to the list since the change.  If you have
any problems, please feel free to contact me off-list.

Nick

--
Nick Arnett
Phone/fax: (408) 904-7198
[EMAIL PROTECTED]


_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots
