Re: [General] Extra hit with SQL query and word position in the original file
Hello Teijo,

On 03/24/2017 01:24 AM, Teijo wrote:
> Hello,
>
> If I search for a given word with search.cgi, I get the correct number of
> occurrences.
>
> But if I do it with SQL (whether in mysql or sqlite3), I get an extra
> occurrence. For example, if a given word appears twice in a given original
> file, the query reports three occurrences. The SQL query is almost
> the same as the one in Mnogosearch's manual, except that I am using only
> one word:
>
> SELECT url.url, count(*) AS RANK FROM dict, url WHERE
> url.rec_id=dict.url_id AND dict.word IN ('word') GROUP BY url.url ORDER
> BY rank DESC;
>
> I'd like to find (with an SQL query) the position of a word in the original
> file (to use the filepos function). There is at least the coord column in
> the dict table. Coord contains the section ID and the word's position
> relative to the section, if I have understood correctly. How do I extract
> the relative position from coord, or is the position information elsewhere
> in the database? If I disabled all sections, would coord actually contain
> the absolute position?
>
> I'm using "single mode" for the database.

Coord is a 32-bit number.

- The highest 8 bits are the section ID (e.g. title, body, etc., according to the Section commands in indexer.conf).
- The lowest 24 bits are the position inside this section.
- The last hit inside each combination (url_id,word,secno) is the section length (i.e. the total number of words in this section) in this document.
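The bit layout in the list above can be sketched in Python as follows. This is only an illustration of the arithmetic, not mnoGoSearch code: the names `decode_coord` and `split_hits` are made up for this example, and the rule that the last position in each section is the section length is taken from the description above.

```python
from collections import defaultdict

SECTION_SHIFT = 24         # the section ID sits in the highest 8 bits
POSITION_MASK = 0xFFFFFF   # the position sits in the lowest 24 bits


def decode_coord(coord):
    """Split a 32-bit coord value into (section ID, position)."""
    # Masking the section ID keeps the result in 0..255 even if the
    # database hands the value back as a signed integer.
    secno = (coord >> SECTION_SHIFT) & 0xFF
    pos = coord & POSITION_MASK
    return secno, pos


def split_hits(coords):
    """Group the coords of one (url_id, word) pair by section.

    Returns {secno: (hit_positions, section_length)}, assuming the
    largest position in each section is the section length entry.
    """
    by_section = defaultdict(list)
    for coord in coords:
        secno, pos = decode_coord(coord)
        by_section[secno].append(pos)

    result = {}
    for secno, positions in by_section.items():
        positions.sort()
        result[secno] = (positions[:-1], positions[-1])
    return result


if __name__ == "__main__":
    sample = [(1 << 24) | 1, (1 << 24) | 14, (1 << 24) | 105]
    print(split_hits(sample))
```

The positions are relative to the start of their section, so they are word offsets inside e.g. the body, not byte offsets in the original file.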
This MySQL query returns the information in a readable form:

SELECT url_id,word,coord>>24 AS secno,coord&0xFFFFFF AS pos FROM dict WHERE word='mnogosearch' ORDER BY secno,pos;

+--------+-------------+-------+-----+
| url_id | word        | secno | pos |
+--------+-------------+-------+-----+
|      1 | mnogosearch |     1 |   1 |
|      1 | mnogosearch |     1 |  14 |
|      1 | mnogosearch |     1 |  28 |
|      1 | mnogosearch |     1 |  42 |
|      1 | mnogosearch |     1 |  76 |
|      1 | mnogosearch |     1 |  77 |
|      1 | mnogosearch |     1 |  85 |
|      1 | mnogosearch |     1 | 105 | <- section 1 length
|      1 | mnogosearch |     2 |   1 |
|      1 | mnogosearch |     2 |   6 | <- section 2 length
|      1 | mnogosearch |     3 |  54 |
|      1 | mnogosearch |     3 |  69 | <- section 3 length
|      1 | mnogosearch |     4 |   1 |
|      1 | mnogosearch |     4 |  11 | <- section 4 length
|      1 | mnogosearch |     8 |   2 |
|      1 | mnogosearch |     8 |   4 | <- section 8 length
+--------+-------------+-------+-----+

Lines that are not marked as "section X length" are actual word hits. The length rows are the extra occurrences that a plain count(*) over dict picks up.

> Best regards,
>
> Teijo

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general
[General] Extra hit with SQL query and word position in the original file
Hello,

If I search for a given word with search.cgi, I get the correct number of occurrences.

But if I do it with SQL (whether in mysql or sqlite3), I get an extra occurrence. For example, if a given word appears twice in a given original file, the query reports three occurrences. The SQL query is almost the same as the one in Mnogosearch's manual, except that I am using only one word:

SELECT url.url, count(*) AS RANK FROM dict, url WHERE url.rec_id=dict.url_id AND dict.word IN ('word') GROUP BY url.url ORDER BY rank DESC;

I'd like to find (with an SQL query) the position of a word in the original file (to use the filepos function). There is at least the coord column in the dict table. Coord contains the section ID and the word's position relative to the section, if I have understood correctly. How do I extract the relative position from coord, or is the position information elsewhere in the database? If I disabled all sections, would coord actually contain the absolute position?

I'm using "single mode" for the database.

Best regards,

Teijo
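One way to adjust the count in the query above is sketched below. It assumes, as the reply in this thread explains, that each section in which the word occurs contributes exactly one extra row holding the section length, so subtracting the number of distinct sections from count(*) leaves the real hits. The two tables are a simplified stand-in for the real mnoGoSearch schema, and `make_demo_db`, `count_hits` and the demo URL are hypothetical names used only for this example. The subtraction is valid for a single search word; with several words in the IN list, the distinct count would have to cover (word, section) pairs.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with one document and one word."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE url (rec_id INTEGER PRIMARY KEY, url TEXT UNIQUE)")
    conn.execute("CREATE TABLE dict (url_id INTEGER, word TEXT, coord INTEGER)")
    conn.execute("INSERT INTO url VALUES (1, 'http://localhost/a.html')")
    # Two real hits at positions 3 and 10 in section 1, followed by the
    # section length entry (20 words), all packed as (secno << 24) | pos.
    section = 1
    rows = [(1, "word", (section << 24) | pos) for pos in (3, 10, 20)]
    conn.executemany("INSERT INTO dict VALUES (?, ?, ?)", rows)
    return conn


def count_hits(conn, word):
    """Return (url, raw_count, real_hits) rows for a single word."""
    sql = """
        SELECT url.url,
               count(*) AS raw_count,
               count(*) - count(DISTINCT dict.coord >> 24) AS real_hits
        FROM dict, url
        WHERE url.rec_id = dict.url_id AND dict.word = ?
        GROUP BY url.url
        ORDER BY real_hits DESC
    """
    return conn.execute(sql, (word,)).fetchall()


if __name__ == "__main__":
    for row in count_hits(make_demo_db(), "word"):
        print(row)
```

For the demo data, the raw count is 3 and the adjusted count is 2, which matches the "twice in the file, three in SQL" symptom described above.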
Re: [General] Indexing problem with sqlite3
Hello,

You are correct. I made the change to the wrong branch. The patch fixed the problem at least in Jessie, and I suppose that is the case with Ubuntu as well.

Thank you very much!

Best regards,

Teijo
Re: [General] Indexing problem with sqlite3
Hello,

On 03/22/2017 08:51 PM, Teijo wrote:
> Hello,
>
> Unfortunately the patch did not solve the problem.
>
> As to SQLite3 versions, in Ubuntu 16.04 it is
> SQLite version 3.11.0 2016-02-15 17:29:24
> and in Jessie
> SQLite version 3.8.7.1 2014-10-29 13:59:56

There are two similar places in sql-sqlite.c.
Please make sure to fix the SQLite3 (rather than SQLite2) code branch:

    case SQLITE_ERROR:
      sqlite3_finalize(pStmt);
      udm_snprintf(db->errstr, sizeof(db->errstr),
                   "sqlite3 driver: (%d) %s",
                   sqlite3_errcode(UdmSQLite3Conn(db)),
                   sqlite3_errmsg(UdmSQLite3Conn(db)));
      if (!strstr(db->errstr,"unique") && !strstr(db->errstr,"UNIQUE"))
      {
        UdmSetErrorCode(db, 1);
        return UDM_ERROR;
      }
      return UDM_OK;
      break;

> Best regards,
>
> Teijo
>
> 22.3.2017, 16:52, Alexander Barkov wrote:
>
>> Hello Teijo,
>>
>> SQLite changed the error message in one of the recent releases,
>> from "unique" in lower case to "UNIQUE" in upper case.
>>
>> Please apply this patch to src/sql-sqlite.c:
>>
>> -if (!strstr(db->errstr,"unique"))
>> +if (!strstr(db->errstr,"unique") && !strstr(db->errstr,"UNIQUE"))
>>
>> On 03/22/2017 06:39 PM, Alexander Barkov wrote:
>>> Hello Teijo,
>>>
>>> On 03/22/2017 03:44 PM, Teijo wrote:
>>>> Hello,
>>>>
>>>> I have installed Mnogosearch 3.4.1 from source on both Ubuntu 16.04
>>>> and Debian Jessie. In Ubuntu I cannot use Mysql as the database because
>>>> there seem to be some compatibility issues with Mysql 5.7. In Jessie,
>>>> where the Mysql version is 5.5x, there are no such problems.
>>>>
>>>> I thought I would use Sqlite3 in Ubuntu. Database setup with
>>>> indexer --create goes without errors. But when I try to build the index
>>>> by simply typing indexer, I get something like the following:
>>>>
>>>> [33572]{--} indexer from mnogosearch-3.4.1-sqlite3 started with '/usr/local/mnogosearch/etc/indexer.conf'
>>>> [33572]{01} Error: 'DB: sqlite3 driver: (19) UNIQUE constraint failed: url.url'
>>>>
>>>> There seem to be similar problems with Sqlite3 in Jessie as well.
>>>>
>>>> I am not familiar with Mnogosearch and Sqlite3, so is there something
>>>> I have missed when setting up the environment? The only changes I have
>>>> made in indexer.conf are the Dbaddress and server definitions.
>>>> Dbaddress is exactly as in the Sqlite3 example in indexer.conf-dist.
>>>
>>> Which exact version of SQLite are you using?
>>>
>>> Can you please send your indexer.conf and the output for:
>>>
>>> ./indexer --sqlmon --exec="SELECT rec_id, url FROM url"
>>>
>>> to b...@mnogosearch.org
>>>
>>> Thanks.
>>>
>>>> Best regards,
>>>>
>>>> Teijo
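The idea behind the patch in this thread (treat a duplicate-key error as harmless whether SQLite spells it "unique" or "UNIQUE") can be illustrated with a small Python sketch. This is not the mnoGoSearch C code: `open_demo_db`, `is_unique_violation` and `insert_url` are hypothetical helpers, and the one-table schema is a simplified stand-in. Instead of testing two spellings, the sketch lowercases the message once.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with a unique url column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE url (rec_id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE)"
    )
    return conn


def is_unique_violation(exc):
    """True if the error text mentions a unique constraint, in any letter case."""
    return "unique" in str(exc).lower()


def insert_url(conn, url):
    """Insert a URL; return False if it is already stored.

    Any integrity error other than a duplicate key is re-raised.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO url (url) VALUES (?)", (url,))
        return True
    except sqlite3.IntegrityError as exc:
        if is_unique_violation(exc):
            return False
        raise


if __name__ == "__main__":
    db = open_demo_db()
    print(insert_url(db, "http://localhost/"))
    print(insert_url(db, "http://localhost/"))
```

Matching on message text stays fragile across SQLite releases, which is exactly what broke here; the case-insensitive comparison only makes the check tolerant of this particular wording change.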