When experimenting with how url_seed_score works, I find that I'm not completely happy with it. It's a bit hard to use in practice, since score figures are so divergent (mostly for a documented reason, mind you). (Note that url_seed_score, renamed from url_adjust_score, is not submitted or checked in yet.)

Most people would probably just want the hits to appear in an order depending on the web area. They probably do not want to bother with finding the right magic constants with which to seed the scores. To down-seed an area completely, you have to find the highest possible score and/or adjust the *_factor parameters. A score easily goes up into the millions with a couple of back-links and the searched-for word appearing in the title and the first words. Besides, seeding the score munges the actual score, making the easily understood "stars" and "percentage" inaccurate.

Now, I think url_seed_score is still useful, like when you actually want to mix results from different areas together and just slightly seed some areas. As a side note to prospective users: small-figure factors and constants should be used if score figures are important. It seems they need to be kept at most in the thousands, or scores go completely off.

For the just-order-in-these-areas use, I would like to propose another, more easily used feature, controlled by an attribute called (say) "results_order". It would simply take a list of regexes and always order the results according to that list, with the "normal" sort order as the secondary sort criterion. Users of this attribute might want to include it in advanced search forms (see allow_in_form) as an option that can be turned off, for searchers who don't want someone else to dictate the order in which search hits are served. :-)

Use of this attribute would look like:

    results_order: faq.html * /mailinglist/ /testresults/

Since you probably want to "move up" some areas in the results list and "move down" others, you want to say where the rest should go. This is expressed most intuitively (IMHO) as a lone "*". It is not the catch-all regex ".*", and the character is seldom found as a normal part of a URL (is it even valid?). If not specified in the list, it defaults to the end of the list. And no, I can't think of a sane way to use ".*" in that list, but it is a valid regex and as such should not be special-cased IMHO.

For the example above, you always want hits in faq.html to appear first. It is probably a large document, so a low hit score may just mean that the search item is found near the end of the document; it is still probably the document the searcher is looking for. The area matched by /mailinglist/ is moved down, but still comes before the lowly /testresults/ area. As said, all other areas come at the point of the "*".

I'll implement this for 3.1.4 (as a patch) and for main trunk after moving the sorting to Searcher.cc. I'll refrain from doing this until the 3.2 changes are merged back on the main trunk. Before someone else says it: No, I *don't* want this to go in 3.2.0 (unless everybody else thinks so). BTW, Geoff: you said you were about to merge back the 3.2 changes. Would you rather do it yourself, or would you want help with that?

The implementation of results_order seems simple: wherever the searching takes place, the results will be divided (or are already divided) into separate lists, one for each area in results_order. Then the normal "sort" is applied to each list, and the lists are concatenated into one, which is passed on for display-decorating and output.

Comments welcome, as always.
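To make the divide, sort and concatenate idea concrete, here is a minimal standalone sketch. It is not ht://Dig code: the Hit struct and the orderResults function are hypothetical names invented for illustration, std::regex (ECMAScript syntax) stands in for whatever regex engine the searcher actually uses, and the "normal" sort is assumed to be plain descending score.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for a search hit; not the real ht://Dig match class.
struct Hit {
    std::string url;
    double score;
};

// Order hits by area first (position in resultsOrder), then by score.
// A lone "*" marks where hits matching no pattern go; if it is absent,
// those hits go last. If "*" appears more than once, the last one wins.
std::vector<Hit> orderResults(const std::vector<std::string>& resultsOrder,
                              const std::vector<Hit>& hits)
{
    const std::size_t n = resultsOrder.size();
    std::vector<std::regex> patterns(n);
    std::size_t restSlot = n;  // default: the rest goes at the end
    for (std::size_t i = 0; i < n; ++i) {
        if (resultsOrder[i] == "*")
            restSlot = i;
        else
            patterns[i] = std::regex(resultsOrder[i]);
    }

    // One list per entry in resultsOrder, plus one for the default rest slot.
    std::vector<std::vector<Hit>> buckets(n + 1);
    for (const Hit& hit : hits) {
        std::size_t slot = restSlot;
        for (std::size_t i = 0; i < n; ++i) {
            // First matching pattern decides the area; "*" is never a regex.
            if (resultsOrder[i] != "*" &&
                std::regex_search(hit.url, patterns[i])) {
                slot = i;
                break;
            }
        }
        buckets[slot].push_back(hit);
    }

    // Apply the assumed "normal" sort to each list, then concatenate.
    std::vector<Hit> ordered;
    ordered.reserve(hits.size());
    for (std::vector<Hit>& bucket : buckets) {
        std::stable_sort(bucket.begin(), bucket.end(),
                         [](const Hit& a, const Hit& b) {
                             return a.score > b.score;
                         });
        ordered.insert(ordered.end(), bucket.begin(), bucket.end());
    }
    return ordered;
}
```

With the list from the example (faq.html, *, /mailinglist/, /testresults/), a low-scoring faq.html hit would still come out first and a high-scoring /testresults/ hit last. Taking the first matching pattern for URLs that match several entries is a design choice of this sketch, not something the proposal specifies.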
brgds, H-P
