On Jul 7, 2005, at 3:16 PM, Mark Bennett wrote:
> Scanning their paper very quickly, I didn't see a specific mention (though I
> might have missed it) of extremely short documents (< 5 words).

The study does not concern itself with different document lengths. They chose 6 different collections, but it appears that they were looking for a diversity of authorship and subject matter.

> Was there something specific about 1 and 2 word documents you had in mind?

Could you use a negative document boost on 1 and 2 word docs to solve your particular problem?
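To make the suggestion concrete, here is a rough sketch of an index-time boost that pushes down very short documents. The function name, the 2-word cutoff, and the 0.5 factor are all made up for illustration; they are not taken from any library or from the paper.

```python
def short_doc_boost(text, max_short_words=2, penalty=0.5):
    """Return a document boost: below 1.0 for very short docs, 1.0 otherwise.

    Hypothetical helper. The cutoff and penalty are illustrative values
    that would need tuning against a real collection.
    """
    word_count = len(text.split())
    if word_count <= max_short_words:
        return penalty  # down-weight 1 and 2 word documents
    return 1.0          # leave everything else alone


for doc in ["Untitled", "Status report", "A much longer piece of body text"]:
    print(repr(doc), short_doc_boost(doc))
```

The returned value would be handed to whatever document-boost hook the indexer offers when the document is added.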

After pondering the clip method a little more, I've become wary of its effect on title fields. It would work very well on what you refer to as "main" and I generally call "bodytext", but if it were set as a default, it would become necessary to weight "title" fields or short "keywords" fields more heavily.

I think it would be possible, even desirable, to turn on clipping for bodytext while turning it off for title/keywords. That would require the implementor to be familiar with scoring formula theory, though.
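A minimal sketch of per-field clipping, under two assumptions of mine: that "clipping" means treating any field shorter than some floor as if it were floor-length tokens long, and that the length norm is the usual 1/sqrt(n) style. The field names and the floor of 100 are placeholders.

```python
import math

# Hypothetical per-field settings: the token-count floor used for
# normalization. None means clipping is turned off for that field.
CLIP_FLOOR = {"bodytext": 100, "title": None, "keywords": None}


def length_norm(field, num_tokens):
    """1/sqrt(n) length norm, with n clipped to a per-field floor if one is set."""
    floor = CLIP_FLOOR.get(field)
    effective = num_tokens
    if floor is not None:
        # Short fields stop earning an outsized bonus below the floor.
        effective = max(num_tokens, floor)
    return 1.0 / math.sqrt(max(effective, 1))  # guard against empty fields


print(length_norm("bodytext", 2))  # clipped: same norm as a 100-token body
print(length_norm("title", 4))     # unclipped: short titles keep their bonus
```

With settings like these, a 2-word body no longer outscores a 100-word body on length alone, while titles and keywords keep their normal short-field advantage and need no extra weighting.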

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

