Hi all,

We have rebuilt our web site and re-indexed it into a new db, and that
worked fine. Everything is working now. My best guesses at the cause are:

either there was corruption in all of our files
or
for some reason the contents of the database were not rewritten when we were
trying to re-index it.

Still, we can say the problem is resolved, and these two explanations are
all I can offer.

Our new web site will go live some time next week, so hopefully indexing does
not deteriorate by then ;-)

Natalija


-----Original Message-----
From: Gilles Detillieux [mailto:[EMAIL PROTECTED]]
Sent: 21 August 2002 18:10
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [htdig] text_factor: 0 but it still being picked up


According to Natalija Stevens:
> Here are two documents attached: one is the txt copy of the config file, and
> the other is the document that scored 4 stars on a fuzzy search for the word
> "pensions".
> The web site address is www.suffolkcc.gov.uk, but this obviously is not the
> "surface" file.
>
> If you do look at it, htdig is used under quick search. But what you see
> there is the version on the "live" server, and it does not use the same
> settings as the version I am running on our test server (which is where the
> txt file comes from).

Is that why, for example, a search for "pensions" doesn't also match
"pension", even though in your test config you have set...

search_algorithm:       exact:1 synonyms:0.5 endings:0.1

I did notice that in the PDF file you sent me, "pensions" appears 5 times,
and "pension" appears 20 times, so that would help the document rank higher.
However, it wouldn't account for it ranking so much higher than a document
with "pensions" in the title.  What about link descriptions to these PDF
files?  Have you tried setting description_factor: 0?  How about
backlink_factor: 0?
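
As a rough sketch of that test, and assuming your snapshot uses these two
attribute names unchanged, the lines to add to the test config would look
something like this (a diagnostic experiment only, not a recommended
permanent setting):

# Diagnostic only: zero out the link-related scoring factors to see whether
# link text or inbound links are what push the PDFs to the top.
description_factor:     0
backlink_factor:        0

If the ranking of the PDFs changes when you repeat the "pensions" search
(re-indexing first, in case your version applies the factors at index time),
that would point at link descriptions or backlinks rather than the body text.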

> Factor settings are running on their defaults there.
> If you type in pensions as the search, you'll notice that all of the pdfs
> and docs come first. If you then go to the end of the results you'll see
> things like
> "Statement of Investment ....|Pensions\ Finance", which comes among the last
> but should be among the first, as this is the actual title.

Well, apart from what I've suggested above, I'm pretty much out of ideas.
Maybe someone else on the list who's looked more closely at the new
scoring code can offer some suggestions?

Have you ruled out any incompatibilities between your htsearch and the
databases created by htdig?  I.e. are you sure the htsearch you run from
the web server is from the same CVS snapshot as the htdig binary you're
running?  I assume this is all on the same machine too, is it?

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
