Hi, 
 
I am trying to merge a number of htDig databases into a single database. Running 
'htmerge -v -c main.conf -m 2nd.conf' appears to work fine (i.e. it doesn't seem to 
report any errors). Each individual database is also made available for separate 
searching. 

 
However, when I search each individual database for 'help' and then search the 
combined database for 'help', the number of results from the combined database is 
*not* the same as the sum of the results from the individual databases :-( 
 
To test this one step further, I created an 'empty' database (by running htdig with 
a start URL that is not indexable and has no links) and then merged another database 
into it (htmerge -c empty.conf -m 2nd.conf). 
 
In theory, the individual and combined databases should give similar results 
(unless I am missing something obvious). However, searching the individual database 
for 'help' returns 963 results, while searching the combined database returns 593. 
Searching the individual database for 'debt' returns 69 results, while searching the 
combined database returns 33. 
 
I am using htDig 3.1.5 on Linux and when I compare the configuration files for the 
2 databases, they are the same except for the 'database_dir', 'start_url', and my 
second config file has additional 'limit_urls_to' and 'url_part_aliases' entries. 
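
For anyone who wants to repeat the comparison, here is a minimal sketch of diffing 
two config files attribute by attribute. It assumes the usual 'attribute: value' 
layout with '#' comment lines and backslash line continuations; the function names 
parse_htdig_conf and diff_conf are illustrative only and are not part of htDig. 

```python
def parse_htdig_conf(text):
    """Parse 'attribute: value' lines into a dict (simplified sketch).

    Assumptions: lines starting with '#' are comments, and a trailing
    backslash continues the value on the next line.
    """
    attrs = {}
    pending = ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        pending = ""
        if line.endswith("\\"):
            # Join continuation lines with a single space.
            pending = line[:-1].rstrip() + " "
            continue
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value survive.
        name, sep, value = line.partition(":")
        if sep:
            attrs[name.strip()] = " ".join(value.split())
    return attrs


def diff_conf(a, b):
    """Return {attribute: (value_in_a, value_in_b)} for attributes that differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Reading each file's text, passing it through parse_htdig_conf, and calling diff_conf 
on the two results should list only the attributes that differ or that appear in just 
one of the files. 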

 
I couldn't find anything in the mail archives to do with this.  Any ideas? 
 
Allan 

 


