--On Friday, February 28, 2003 08:41:59 AM -0800 Nick Arnett <[EMAIL PROTECTED]> wrote:

A standard format for delivering archived discussions (with metadata) would allow mailing lists to be searchable in conjunction with newsgroups, thus creating integrated search services (imagine Google Groups with more than just newsgroups).

Let's start by making the archives follow existing standards, like the robots meta tag and the description meta tag.

Indexes to messages should have NOINDEX, FOLLOW.

Messages should have a title which reflects the mailing list and
the title of the message. The beginning of the message (or the first
section not quoted with "> ") should be the description meta tag. The
author should be in a dc.creator meta tag. The date should be in
dc.date, using the ISO 8601 web profile if possible and RFC 822
otherwise. The message ID should probably be in dc.identifier.
The mailing list address/name could be in dc.publisher.
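
As a rough illustration of the mapping above, here is a minimal Python sketch that turns one raw message into a title and meta tags. The function names, the "List: Subject" title format, the 200-character description limit, and the example addresses are all assumptions made for the sketch, not part of any existing archiver. It only handles single-part plain-text messages.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime
from html import escape


def first_unquoted_section(body, limit=200):
    """Return the first run of non-blank lines that are not '>' quotes."""
    section = []
    for line in body.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(">"):
            if section:
                break  # the first unquoted section has ended
            continue
        section.append(stripped)
    # limit is an arbitrary cap chosen for this sketch
    return " ".join(section)[:limit]


def message_meta_tags(raw_message, list_name, list_address):
    """Build the <title> and <meta> lines for one archived message."""
    msg = message_from_string(raw_message)
    try:
        # ISO 8601 when the Date header parses cleanly
        date = parsedate_to_datetime(msg["Date"]).isoformat()
    except (TypeError, ValueError):
        # otherwise fall back to the original RFC 822 string
        date = msg["Date"] or ""
    # Multipart bodies are out of scope for this sketch
    body = "" if msg.is_multipart() else msg.get_payload()
    fields = [
        ("description", first_unquoted_section(body)),
        ("dc.creator", msg["From"] or ""),
        ("dc.date", date),
        ("dc.identifier", msg["Message-ID"] or ""),
        ("dc.publisher", f"{list_name} <{list_address}>"),
    ]
    # Assumed title format: list name, colon, subject
    lines = [f"<title>{escape(list_name)}: {escape(msg['Subject'] or '')}</title>"]
    for name, value in fields:
        lines.append(f'<meta name="{name}" content="{escape(value)}">')
    return "\n".join(lines)
```

Calling message_meta_tags(raw, "Robots", "robots@example.com") returns the tags as one string, ready to drop into the page head.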

The navcrap and message header fields should be marked off with
all of the noindex sectional tags we can find.

Ultraseek already has an NNTP spider. It works OK, and parsing
the mail/news format directly gives better results than trying
to reverse engineer it from the pretty-printed HTML. It also allows
sane handling of attachments. But it gives NNTP URLs, which surprise
some users.
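
To show what parsing the mail/news format directly looks like, here is a minimal sketch using Python's standard email package. It is not how Ultraseek does it; the function name and the shape of the returned dictionary are assumptions made for illustration. Headers, body text, and attachments come out as separate fields, with no HTML to reverse engineer.

```python
from email import message_from_bytes
from email.policy import default


def extract_for_indexing(raw_bytes):
    """Split a raw RFC 822 message into indexable fields (hypothetical helper)."""
    msg = message_from_bytes(raw_bytes, policy=default)
    # Prefer the plain-text body; there may be none at all
    body_part = msg.get_body(preferencelist=("plain",))
    text = body_part.get_content() if body_part is not None else ""
    # Attachments are listed separately instead of being mixed into the text
    attachments = [
        (part.get_filename(), part.get_content_type())
        for part in msg.iter_attachments()
    ]
    return {
        "subject": msg["Subject"],
        "message_id": msg["Message-ID"],
        "text": text,
        "attachments": attachments,
    }
```

A spider could index the "text" field as the document body and hand each attachment to a format-specific filter based on its content type.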

Update: Verity Ultraseek is the product formerly known as Inktomi
Enterprise Search and Inktomi Search/Enterprise. That product was
earlier known as Ultraseek Server. The name remains the same, eh?

wunder
--
Walter Underwood
Software Architect
Verity Ultraseek

_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots
