Hello Teijo,
On 03/24/2017 01:24 AM, Teijo wrote:
> Hello,
>
> If I search for a given word with search.cgi, I get the correct number of occurrences.
>
> But if I do it with SQL (no matter in mysql or sqlite3), they show an extra
> occurrence. For example, if a given word is in a given original file
> twice, they tell that there are three occurrences. SQL query is almost
> the same one found in Mnogosearch's manual, except that I am using only
> one word:
>
> SELECT url.url, count(*) AS RANK FROM dict, url WHERE
> url.rec_id=dict.url_id AND dict.word IN ('word') GROUP BY url.url ORDER
> BY rank DESC;
>
> I'd like to know (by SQL query) position of word in the original file
> (to use filepos function). There is at least coord column in dict table.
> Coord contains section id and word's position in relationship to
> section, if I have understood correctly. How to extract the relative
> position from coord, or is the position information elsewhere in
> database? If I disabled all sections, would coord actually contain the
> absolute position?
>
> I'm using "single mode" as to database.
Coord is a 32-bit number.
- The highest 8 bits are the section ID (e.g. title, body, etc.,
according to the Section commands in indexer.conf).
- The lowest 24 bits are the position inside this section.
- The last hit inside each combination (url_id,word,secno) is not a real
hit: its position is the section length (i.e. the total number of words
in this section of this document).
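For illustration, the same bit layout can be decoded outside SQL. This is only a sketch in Python; the function names split_coord and make_coord are made up for this example and are not part of mnoGoSearch:

```python
def split_coord(coord):
    """Split a 32-bit coord value into (secno, pos).

    secno: the highest 8 bits (section ID)
    pos:   the lowest 24 bits (word position inside the section)
    """
    secno = (coord >> 24) & 0xFF
    pos = coord & 0xFFFFFF
    return secno, pos


def make_coord(secno, pos):
    """Inverse of split_coord, handy for building test data."""
    return ((secno & 0xFF) << 24) | (pos & 0xFFFFFF)


# Example: section 1, position 14
print(split_coord(make_coord(1, 14)))
```

Note that the decoding alone does not tell you whether an entry is a real hit or the section length entry; that depends on it being the last one in its (url_id,word,secno) group.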
This MySQL query returns the information in a readable form:
SELECT url_id,word,coord>>24 AS secno,coord&0xFFFFFF AS pos FROM dict
WHERE word='mnogosearch' ORDER BY secno,pos;
+--------+-------------+-------+-----+
| url_id | word        | secno | pos |
+--------+-------------+-------+-----+
|      1 | mnogosearch |     1 |   1 |
|      1 | mnogosearch |     1 |  14 |
|      1 | mnogosearch |     1 |  28 |
|      1 | mnogosearch |     1 |  42 |
|      1 | mnogosearch |     1 |  76 |
|      1 | mnogosearch |     1 |  77 |
|      1 | mnogosearch |     1 |  85 |
|      1 | mnogosearch |     1 | 105 | <- section 1 length
|      1 | mnogosearch |     2 |   1 |
|      1 | mnogosearch |     2 |   6 | <- section 2 length
|      1 | mnogosearch |     3 |  54 |
|      1 | mnogosearch |     3 |  69 | <- section 3 length
|      1 | mnogosearch |     4 |   1 |
|      1 | mnogosearch |     4 |  11 | <- section 4 length
|      1 | mnogosearch |     8 |   2 |
|      1 | mnogosearch |     8 |   4 | <- section 8 length
+--------+-------------+-------+-----+
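Lines that are not marked as "section X length" are actual word hits.
This is why count(*) over dict reports one extra occurrence for every
section in which the word appears.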
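As a rough sketch of how the extra entries could be excluded, here is a small Python script using sqlite3 (which you mentioned you also use). The table below is a simplified stand-in for the real dict table with only the three columns used here, filled with the sample rows shown above. The helper names are made up for this example. The position query additionally assumes that the section length entry always has the largest position in its group, which holds in the sample but is an assumption.

```python
import sqlite3

# (secno, pos) pairs copied from the sample output for url_id 1.
SAMPLE = [(1, 1), (1, 14), (1, 28), (1, 42), (1, 76), (1, 77), (1, 85), (1, 105),
          (2, 1), (2, 6), (3, 54), (3, 69), (4, 1), (4, 11), (8, 2), (8, 4)]


def build_demo_db():
    """Create an in-memory, simplified stand-in for the dict table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dict (url_id INTEGER, word TEXT, coord INTEGER)")
    conn.executemany(
        "INSERT INTO dict VALUES (1, 'mnogosearch', ?)",
        [((secno << 24) | pos,) for secno, pos in SAMPLE],
    )
    return conn


def real_hit_counts(conn, word):
    """Count hits per url_id, subtracting one length entry per section."""
    sql = """
        SELECT url_id, COUNT(*) - COUNT(DISTINCT coord >> 24) AS hits
        FROM dict
        WHERE word = ?
        GROUP BY url_id
        ORDER BY hits DESC
    """
    return conn.execute(sql, (word,)).fetchall()


def real_hit_positions(conn, word):
    """List (url_id, secno, pos) for real hits only.

    Assumption: the length entry has the largest coord in its
    (url_id, word, secno) group, so every smaller coord is a real hit.
    """
    sql = """
        SELECT d.url_id, d.coord >> 24 AS secno, d.coord & 16777215 AS pos
        FROM dict AS d
        WHERE d.word = ?
          AND d.coord < (SELECT MAX(m.coord)
                         FROM dict AS m
                         WHERE m.url_id = d.url_id
                           AND m.word = d.word
                           AND (m.coord >> 24) = (d.coord >> 24))
        ORDER BY d.url_id, secno, pos
    """
    return conn.execute(sql, (word,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(real_hit_counts(conn, "mnogosearch"))
    print(real_hit_positions(conn, "mnogosearch"))
```

The counting query should carry over to your own query by joining url as before. The position query is the more fragile of the two, because it relies on the ordering assumption stated above.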
>
> Best regards,
>
> Teijo
> _______________________________________________
> General mailing list
> [email protected]
> http://lists.mnogosearch.org/listinfo/general