On Thursday, November 7, 2002, at 10:12 PM, Gilles Detillieux wrote:

I think there are some cases where that's true, but not necessarily in all
cases, so I don't know how much you can optimize this. E.g., for certain
keyword tags we allow the form <meta foo="bar">, but the configurable
keyword names must be of the form <meta name="foo" content="bar">.
I don't know that we'd want to fully generalize this, but I'm open to
suggestions/recommendations from others.
Keep in mind that the form <meta name="foo" content="bar"> is the definitive W3C standard, whereas the other form is an older, deprecated one. I don't see much HTML like this anymore. Whether we want to ignore it completely is hard to say.
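
As a rough sketch of what accepting both forms could look like (Python just for brevity, this is not ht://Dig code): the class name, the helper function and the list of shorthand names below are all made up for illustration. It uses the standard HTML attribute spelling, "content".

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> data written in either of the two forms discussed."""

    def __init__(self, shorthand_names):
        super().__init__()
        # Names accepted in the older <meta foo="bar"> form (hypothetical list).
        self.shorthand_names = set(shorthand_names)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; drop valueless attributes.
        attrs = {k: v for k, v in attrs if v is not None}
        if "name" in attrs and "content" in attrs:
            # Standard form: <meta name="foo" content="bar">
            self.fields[attrs["name"].lower()] = attrs["content"]
            return
        # Older form: <meta foo="bar">, accepted only for known names.
        for key, value in attrs.items():
            if key in self.shorthand_names:
                self.fields[key] = value


def collect_meta(html, shorthand_names=("keywords", "description")):
    """Parse an HTML fragment and return the meta fields found in it."""
    parser = MetaCollector(shorthand_names)
    parser.feed(html)
    parser.close()
    return parser.fields
```

The point of the sketch is that the standard form can be fully general (any name), while the shorthand form has to be limited to a known set of names, since otherwise any stray attribute on a meta tag would be treated as a field.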

I'm sure there'd be a fair bit of discussion about this in the htdig-dev
archives of 2-3 years ago. I don't think it ever got formally documented
elsewhere (yet). The reason was to allow "scoring on the fly".
As well, it allows restricting word searches based on the "field" or tags that contain the words.
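
To make the two uses concrete, here is a minimal sketch, assuming each stored word occurrence carries a small set of flag bits. The flag names, bit positions and weights are hypothetical and are not taken from the ht://Dig source.

```python
# Hypothetical flag bits, one per context in which a word can appear.
FLAG_TEXT = 1 << 0
FLAG_TITLE = 1 << 1
FLAG_HEADING = 1 << 2
FLAG_KEYWORDS = 1 << 3

# Weights live outside the index, so they can change between searches
# without re-indexing (illustrative values only).
DEFAULT_WEIGHTS = {
    FLAG_TEXT: 1.0,
    FLAG_TITLE: 10.0,
    FLAG_HEADING: 5.0,
    FLAG_KEYWORDS: 8.0,
}


def score(flags, weights=DEFAULT_WEIGHTS):
    """Sum the weights of every context bit set on one word occurrence."""
    return sum(w for bit, w in weights.items() if flags & bit)


def restrict(occurrences, field_mask):
    """Keep only occurrences whose flags overlap the requested field(s)."""
    return [(word, flags) for word, flags in occurrences if flags & field_mask]
```

Because only the flags are stored, the same index can be scored with different weights at search time, and a query limited to, say, titles is just a mask test on each occurrence.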

The decision to put all headings into one factor was to reduce the number
of bits the flag would take by 5, so the flags can fit in a single byte.
We're going to have to increase this anyway, to accommodate custom fields,
so it might make sense to reintroduce the distinction between heading
No, the flags were never supposed to be a single byte. There happen to be 8 bits currently defined, but more than that should actually be stored, to leave room for custom fields (and ideally to keep the database format identical).

OTOH, there were 6 slots for headings under 3.1, and it seems like a huge waste of bits considering most won't be used--even with 3-bit encoding. Some other document formats also don't make much distinction between heading levels. Do people really think that markup beyond h1, h2 and h3 occurs? A lot of HTML I see these days uses <strong> or <b> or <i> tagging (or worse, <font>).

Keep in mind that every bit we add to the flags adds more space to every word. Right now, I've specified 8 bits, including author and URL text which aren't currently used.
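
For a feel of the numbers, here is a back-of-the-envelope sketch of the bit counts involved. The scheme names are made up, and the size estimate assumes flags are packed bit by bit, which may not match how the database actually stores them.

```python
import math


def heading_bits(levels, scheme):
    """Bits needed to record heading information for one word."""
    if scheme == "one_bit_per_level":   # a separate flag for each of h1..hN
        return levels
    if scheme == "encoded_level":       # small integer; 0 means "not in a heading"
        return math.ceil(math.log2(levels + 1))
    if scheme == "single_flag":         # "in some heading", level discarded
        return 1
    raise ValueError(f"unknown scheme: {scheme}")


def extra_bytes(word_occurrences, extra_bits):
    """Index growth if every stored word occurrence carries extra_bits more bits."""
    return math.ceil(word_occurrences * extra_bits / 8)
```

With six heading levels this gives 6, 3 and 1 bits respectively, so collapsing six flags into one saves 5 bits per word, and the 3-bit encoding still costs 2 bits more than the single flag.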

-Geoff



_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev

