Hi,
I am trying to merge a number of htDig databases into a single database, and using
'htmerge -v -c main.conf -m 2nd.conf' works fine (i.e. it doesn't seem to report any
errors). Each individual database is also made available for separate searching.
However, when searching each individual database for 'help', and then the combined
database for 'help', the number of results from the combined database is *not* the
same as the sum of all of the individual databases :-(
Just to test a step further, I created an 'empty' database (htdig with a start url
that is not indexable and has no links) and then merged (htmerge -c empty.conf -m
2nd.conf) another database with it.
In theory, the individual and combined databases should return the same results
(unless I am missing something obvious). However, searching the individual database
for 'help' returns 963 results, while searching the combined database returns 593.
Searching the individual database for 'debt' returns 69 results, while searching the
combined database returns 33.
I am using htDig 3.1.5 on Linux. When I compare the configuration files for the
two databases, they are the same except for 'database_dir' and 'start_url', and my
second config file has additional 'limit_urls_to' and 'url_part_aliases' entries.
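
For anyone who wants to repeat the config comparison, here is a minimal Python
sketch of one way to do it. It assumes the usual 'attribute: value' config
layout with '#' comments and backslash line continuations, and it ignores
'include' directives. The function names and all paths, URLs and values in the
two sample configs are hypothetical placeholders, not my real settings.

```python
def parse_htdig_conf(text):
    """Parse 'attribute: value' lines into a dict (simplified sketch)."""
    attrs = {}
    pending = ""  # holds the start of a line that ended with a backslash
    for raw in text.splitlines():
        line = pending + raw.strip()
        pending = ""
        if line.endswith("\\"):
            # Continuation: keep what we have and join it with the next line.
            pending = line[:-1].rstrip() + " "
            continue
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        name, sep, value = line.partition(":")
        if sep:
            attrs[name.strip()] = value.strip()
    return attrs


def diff_conf(a, b):
    """Return {attribute: (value_in_a, value_in_b)} for every difference."""
    diffs = {}
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            diffs[name] = (a.get(name), b.get(name))
    return diffs


# Hypothetical sample configs standing in for empty.conf and 2nd.conf.
empty_conf = """
# placeholder values only
database_dir: /opt/htdig/db/empty
start_url: http://localhost/empty.html
max_head_length: 10000
"""

second_conf = """
database_dir: /opt/htdig/db/second
start_url: http://www.example.com/
limit_urls_to: ${start_url}
url_part_aliases: http://www.example.com/ *site
max_head_length: 10000
"""

diffs = diff_conf(parse_htdig_conf(empty_conf), parse_htdig_conf(second_conf))
for name, (left, right) in diffs.items():
    print(f"{name}: {left!r} vs {right!r}")
```

An attribute that appears in only one file shows up with None on the other
side, which makes the extra 'limit_urls_to' and 'url_part_aliases' entries easy
to spot.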
I couldn't find anything about this in the mail archives. Any ideas?
Allan