Hi everybody,
I am quite new to htdig. I'm using it to build an insurance-related
search engine, which is growing quite well, and I hope to open it at the
beginning of March.
Htdig is great: it solves problems and leaves loads of time to think about
new features :)
I was thinking about a few features that seem to be missing (flame me,
please, but not too hard):
-link:, url:, etc. search operators
  Is this the "field-based searching" discussed in the TODO list?
-URL-dependent templates
  I'd like to use different templates for certain URLs: major
  sponsors, free services categories from our local directory,
  etc. Right now you can modify only the stars image ...
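
To make the idea concrete, here is a rough sketch in Python (not htdig
code) of choosing a template by URL pattern. The rule table, the
template file names and the function name template_for are all
hypothetical, made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical (pattern, template) rules; the first matching pattern wins.
TEMPLATE_RULES = [
    ("http://sponsor.example.com/*", "sponsor.html"),
    ("http://www.example.org/directory/free/*", "free_service.html"),
]
DEFAULT_TEMPLATE = "default.html"


def template_for(url):
    """Return the template name for a result URL.

    Patterns use shell-style wildcards, so "*" also matches "/".
    """
    for pattern, template in TEMPLATE_RULES:
        if fnmatchcase(url, pattern):
            return template
    return DEFAULT_TEMPLATE
```

With these rules, a hit under http://sponsor.example.com/ would be shown
with sponsor.html and anything unmatched with default.html.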
-search output site grouping
  I'm getting loads of searches where the first 30 pages all come
  from the same site. Obviously this depends on my configuration.
  It would be nice to have a switch that groups all URLs from the same
  site, showing only the first hit, and perhaps a variable like
  $(SISTER_URLS_LIST) that could be expanded to ... guess ...
  a list of linked URLs from the same site matching the query. :-)
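
The grouping described above could look roughly like this. It is a
Python sketch, not htdig code: the function name group_by_site and the
idea that results arrive as a ranked list of URL strings are assumptions
for illustration, and "same site" is taken to mean "same host name".

```python
from urllib.parse import urlsplit


def group_by_site(ranked_urls):
    """Collapse a ranked list of URLs to one entry per host.

    Returns a list of (first_hit, sister_urls) tuples, ordered by where
    each host first appears in the ranking. The sister_urls list is what
    a variable like $(SISTER_URLS_LIST) could expand to.
    """
    groups = {}  # host -> URLs in rank order (dicts keep insertion order)
    for url in ranked_urls:
        host = urlsplit(url).netloc.lower()  # assumption: site == host name
        groups.setdefault(host, []).append(url)
    return [(urls[0], urls[1:]) for urls in groups.values()]
```

Each site then takes a single slot in the output, and the remaining
matches from that site travel with its first hit.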
-strong anti-spamming control
  The sites that most often show this behavior make intensive use of
  keywords, descriptions and lots of tricks to get high rankings.
  I'd like to give penalties for such things as keyword spamming,
  empty content, etc. Something like:
    max_keyword_frequency: 6
      if I get the same word more than 6 times ...
    max_keyword_density: 10%
      if I get more than 1 occurrence per 10 words ...
    keyword_spam: -2
      I could start giving a -2 penalty for extra words
    max_keyword_length: 150
      if the keyword tag is more than 150 characters long ...
    spammed_keyword_factor:
      give a lower keyword factor.
    different_keyword_description: true
      if keyword and description are equal, discard one.
  Obviously, discard duplicate documents (but that's there already).
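
Here is a rough Python sketch (not htdig code) of how the first four
settings above might combine into a penalty. Several things are
assumptions: the function name keyword_penalty, counting frequency and
density in the document text rather than in the keyword tag, charging
-2 per occurrence above the frequency limit, and charging a flat -2 for
the density and length checks.

```python
import re

# Values taken from the proposal above.
MAX_KEYWORD_FREQUENCY = 6
MAX_KEYWORD_DENSITY = 0.10   # 10%, i.e. 1 occurrence per 10 words
KEYWORD_SPAM = -2
MAX_KEYWORD_LENGTH = 150


def keyword_penalty(keywords, body):
    """Return a score penalty (zero or negative) for one document.

    keywords is the raw content of the keyword tag, body the document text.
    """
    words = re.findall(r"\w+", body.lower())
    penalty = 0
    # Over-long keyword tag: flat penalty (assumption).
    if len(keywords) > MAX_KEYWORD_LENGTH:
        penalty += KEYWORD_SPAM
    for kw in set(re.findall(r"\w+", keywords.lower())):
        count = words.count(kw)
        if count > MAX_KEYWORD_FREQUENCY:
            # -2 for every occurrence above the limit (assumption).
            penalty += KEYWORD_SPAM * (count - MAX_KEYWORD_FREQUENCY)
        elif words and count / len(words) > MAX_KEYWORD_DENSITY:
            # Too dense without hitting the frequency limit: flat penalty.
            penalty += KEYWORD_SPAM
    return penalty
```

A keyword that appears 8 times in a 100-word document would cost 2
extra occurrences at -2 each under these assumptions.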
-raw excerpts
  We are also using htdig to compile searchable databases of glossary
  data. If it were possible to have raw excerpts (we obviously have full
  documents in excerpts right now), we could dump the files and have a
  more compact and functional system.
  There is no real need to send the user to the HTML page after a
  search. But at the moment this means losing formatting and anchors.
Enough for now.
Alberto Olindo
Assibit S.r.l.