Hello!

A long time ago (Fri, 12 Nov 1999) Eric Mings wrote this message:

  ...whether udmsearch can be modified to allow for multiple url tables
  to store information based upon a user criteria (such as storing
  different subsets of sites indexed) in a table of their choice.

Check the whole message here:
http://www.mail-archive.com/udmsearch%40web.izhcom.ru/msg00072.html

Here is an idea of how to implement this easily with MySQL using its
MERGE tables. I tested it with the 3.2.x sources and it works fine.
It should work with 3.1.x too.
(These thoughts will also be added to the documentation.)

I'm considering a configuration for "single" mode and two subsets.
Other modes and more subsets are configured in the same way.

Installation steps.

1. Create two databases, "www" and "dbms", and create the standard
   mnoGoSearch database structure in each of them.

2. Create the database "collection" and create its structure using
   MERGE tables (the 3.2.x structure is shown):

    CREATE TABLE dict (
      url_id int(11) NOT NULL default '0',
      word varchar(32) NOT NULL default '',
      intag int(11) NOT NULL default '0',
      KEY url_id(url_id),
      KEY word_url(word)
    ) TYPE=MERGE UNION=(www.dict,dbms.dict);

    CREATE TABLE url (
      rec_id int(11) NOT NULL auto_increment,
      status int(11) NOT NULL default '0',
      url char(128) binary NOT NULL default '',
      content_type char(48) NOT NULL default '',
      title char(128) NOT NULL default '',
      txt char(255) NOT NULL default '',
      docsize int(11) NOT NULL default '0',
      next_index_time int(11) NOT NULL default '0',
      last_mod_time int(11) NOT NULL default '0',
      referrer int(11) NOT NULL default '0',
      tag char(16) NOT NULL default '0',
      hops int(11) NOT NULL default '0',
      category char(16) NOT NULL default '',
      keywords char(255) NOT NULL default '',
      description char(100) NOT NULL default '',
      crc32 int(11) NOT NULL default '0',
      lang char(32) NOT NULL default '',
      charset char(32) NOT NULL default '',
      PRIMARY KEY (rec_id),
      UNIQUE KEY url(url),
      KEY key_crc(crc32)
    ) TYPE=MERGE UNION=(www.url,dbms.url);

3. Create two indexer.conf files. Only the task-related commands are
   shown here:

   www.conf:

    UseCRC32URLID yes
    DBAddr mysql://foo:bar@localhost/www/
    Server http://www.apache.org/

   dbms.conf:

    UseCRC32URLID yes
    DBAddr mysql://foo:bar@localhost/dbms/
    Server http://www.apache.org/

   See create/mysql/url-raid.txt for an explanation of the
   UseCRC32URLID indexer.conf command.

4. Index both subsets:

    indexer www.conf
    indexer dbms.conf

5. Edit search.htm:

    DBAddr mysql://foo:bar@localhost/collection/
   or
    DBAddr mysql://foo:bar@localhost/www/
   or
    DBAddr mysql://foo:bar@localhost/dbms/

That's all. Now you are able to search through three databases:
"www", "dbms" and "collection". "www" and "dbms" are the subsets and
"collection" is the whole database.

Advantage of this method
------------------------
Quick search through subsections. A search within a subset does not
use this JOIN between the two tables "dict" and "url" with a tag
condition:

    SELECT <fields> FROM dict,url WHERE url.tag='xxx' AND dict.word='word';

This query is used instead:

    SELECT <fields> FROM dict WHERE word='word';

At the same time, a search through the whole "collection" database
should not be slower than a search over the same data from both
subsets stored in a single database without MERGE tables.

Disadvantage
------------
Because auto_increment values are independent for the parts of a MERGE
table, indexer has to generate unique URL ids itself. CRC32 is used
for this. It is fairly unique; however, according to our tests it
gives about 250 non-unique pairs for 3.5 million unique URLs. So only
one URL will be found from each pair that shares the same URL id.
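
The URL id collisions described under "Disadvantage" can be
illustrated with a minimal Python sketch. It assumes that the URL id
is simply the unsigned CRC32 of the URL string; the exact bytes that
indexer feeds into CRC32 may differ. The names url_id and
find_collisions are hypothetical helpers for this illustration and
are not part of mnoGoSearch.

```python
import zlib
from collections import defaultdict


def url_id(url: str) -> int:
    """Hypothetical URL id: unsigned 32-bit CRC32 of the UTF-8 encoded URL."""
    return zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF


def find_collisions(urls, id_func=url_id):
    """Group distinct URLs by id and return only ids shared by several URLs."""
    by_id = defaultdict(set)
    for url in urls:
        by_id[id_func(url)].add(url)
    return {uid: sorted(group) for uid, group in by_id.items() if len(group) > 1}


if __name__ == "__main__":
    # Sample URLs chosen for illustration only.
    sample = ["http://www.apache.org/", "http://www.mysql.com/"]
    for url in sample:
        print(f"{url_id(url):10d}  {url}")
    print("colliding ids:", find_collisions(sample))
```

Running something like find_collisions over the URL lists of both
subsets before indexing would show which documents would end up
sharing one URL id in the "collection" database.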
