: > Length normalization of the field. Full-text matches on shorter : > fields score higher because the match is seen as more specific. You : > loose that if you omit norms. That's typically OK for short fields : > like "title" anyway, and fields that aren't full-text (like dates, : > numbers, etc). : : Yonik, I disagree on one point. I recommend against omitting norms : for title fields. : : Without norms, the titles "Duke Ellington" and "Duke Ellington meets : Count Basie" will contribute equally to their respective document : scores on a search for "Duke Ellington". For most applications, : exact title matches should win, so that's not optimal.
I tend to agree with you, having a length normalized tf for "title"ish fields is usefull, but it's really easy to construct examples either way (a query for "ipod mini" scoring "ipod mini headphones" higher then "apple ipod mini (black)" is the kind of thing i have to battle with every day) However: I just wanted to point out: a) it is possible to score exact matches on (tokenized) fields very high without using lengthNorm by indexing START and END tokens for the field as well, and then including them in your sloppy phrase queries -- the "tighter" match will score highest. b) generally speaking, the big win for omiting norms is when you're dealing with fields where a large percentage of your docs don't have any indexed terms for that field at all. If you're talking about a field that all of your documents have, the space taken up by the norms is probably tiny compared to the space taken up by the terms. it still may be worthwhile (especially if every doc has exactly one term for that field, or if there are only a handfully of unique terms) but don't expect it to do anythign magical. the savings from omiting norms on a field is really easy to calculate: 1 byte times the number of docs in your index. -Hoss --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]