[General] Extra hit with SQL query and word position in the original file

2017-03-23 Thread Teijo

Hello,

If I search for a given word with search.cgi, I get the correct number of occurrences.

But if I do it with SQL (in either MySQL or SQLite3), the query reports one extra
occurrence. For example, if a given word appears twice in a given original file,
SQL reports three occurrences. The SQL query is almost
the same as the one in Mnogosearch's manual, except that I am using only
one word:


SELECT url.url, count(*) AS RANK FROM dict, url WHERE 
url.rec_id=dict.url_id AND dict.word IN ('word') GROUP BY url.url ORDER 
BY rank DESC;


I'd like to find out (with an SQL query) the position of a word in the original file
(to use the filepos function). There is at least the coord column in the dict table.
If I have understood correctly, coord contains the section ID and the word's
position relative to that section. How can I extract the relative
position from coord, or is the position information stored elsewhere in the
database? If I disabled all sections, would coord actually contain the
absolute position?


I'm using the "single" mode for the database.

Best regards,

Teijo
___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


Re: [General] Indexing problem with sqlite3

2017-03-22 Thread Teijo

Hello,

Unfortunately, the patch did not solve the problem.

As for the SQLite3 versions, on Ubuntu 16.04 it is
SQLite version 3.11.0 2016-02-15 17:29:24
and on Jessie it is
SQLite version 3.8.7.1 2014-10-29 13:59:56

Best regards,

Teijo

22.3.2017, 16:52, Alexander Barkov wrote:


Hello Teijo,


SQLite changed the error message in one of the recent releases,
from "unique" in lower case to "UNIQUE" in upper case.


Please apply this patch to src/sql-sqlite.c:



-if (!strstr(db->errstr,"unique"))
+if (!strstr(db->errstr,"unique") && !strstr(db->errstr,"UNIQUE"))
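For illustration only, here is a small Python sketch (not Mnogosearch code) of the case sensitivity issue that the patch targets. It triggers a duplicate insert in an in-memory SQLite database and tests the error text without regard to case. The simplified url table and the helper name is_unique_violation are assumptions made for this example.

```python
import sqlite3


def is_unique_violation(message):
    """Return True if an SQLite error text mentions a uniqueness failure.

    Hypothetical helper: compares in lower case so that both the older
    "... is not unique" and the newer "UNIQUE constraint failed" match.
    """
    return "unique" in message.lower()


# Simplified stand-in for the url table (assumption, not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url (rec_id INTEGER PRIMARY KEY, url TEXT UNIQUE)")
conn.execute("INSERT INTO url (url) VALUES ('http://www.example.com/')")

error_text = ""
try:
    # Inserting the same URL again violates the UNIQUE constraint.
    conn.execute("INSERT INTO url (url) VALUES ('http://www.example.com/')")
except sqlite3.IntegrityError as exc:
    error_text = str(exc)

print(error_text)
print(is_unique_violation(error_text))
```

The exact wording of the message depends on the SQLite version, which is why the check ignores case instead of matching one spelling.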






On 03/22/2017 06:39 PM, Alexander Barkov wrote:

Hello Teijo,


On 03/22/2017 03:44 PM, Teijo wrote:

Hello,

I have installed Mnogosearch 3.4.1 from source on both Ubuntu 16.04 and
Debian Jessie.

On Ubuntu I cannot use MySQL as the database because there seem to be some
compatibility issues with MySQL 5.7. On Jessie, where the MySQL version is
5.5.x, there are no such problems.

I decided to use SQLite3 on Ubuntu. Database setup with indexer --create
completes without errors. But when I try to build the index by simply running
indexer, I get something like the following:

[33572]{--} indexer from mnogosearch-3.4.1-sqlite3 started with
'/usr/local/mnogosearch/etc/indexer.conf'
[33572]{01} Error: 'DB: sqlite3 driver: (19) UNIQUE constraint failed:
url.url'

There seem to be similar problems with SQLite3 on Jessie as well.

I am not familiar with Mnogosearch or SQLite3, so is there something I
have missed when setting up the environment? The only changes I have made in
indexer.conf are the DBAddr and Server definitions. DBAddr is set exactly
as in the SQLite3 example in indexer.conf-dist.


Which exact version of SQLite are you using?


Can you please send your indexer.conf and the output for:

./indexer --sqlmon --exec="SELECT rec_id, url FROM url"

to b...@mnogosearch.org

Thanks.





Best regards,

Teijo


Re: [General] Extra hit with SQL query and word position in the original file

2017-03-24 Thread Teijo

Hello,

Thank you very much for this information! I'm about to apply it to one 
of my subdomains.


Best regards,

Teijo

24.3.2017, 3:59, Alexander Barkov wrote:


Hello Teijo,


On 03/24/2017 01:24 AM, Teijo wrote:

Hello,

If I search for a given word with search.cgi, I get the correct number of occurrences.

But if I do it with SQL (in either MySQL or SQLite3), the query reports one extra
occurrence. For example, if a given word appears twice in a given original file,
SQL reports three occurrences. The SQL query is almost
the same as the one in Mnogosearch's manual, except that I am using only
one word:

SELECT url.url, count(*) AS RANK FROM dict, url WHERE
url.rec_id=dict.url_id AND dict.word IN ('word') GROUP BY url.url ORDER
BY rank DESC;

I'd like to find out (with an SQL query) the position of a word in the original file
(to use the filepos function). There is at least the coord column in the dict table.
If I have understood correctly, coord contains the section ID and the word's
position relative to that section. How can I extract the relative
position from coord, or is the position information stored elsewhere in the
database? If I disabled all sections, would coord actually contain the
absolute position?

I'm using the "single" mode for the database.


Coord is a 32-bit number.

- The highest 8 bits are the section ID (e.g. title, body, etc.,
  according to the Section commands in indexer.conf).

- The lowest 24 bits are the position inside this section.

- The last hit inside each combination (url_id,word,secno) is the
  section length (i.e. the total number of words in this section
  in this document).
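The bit layout described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names decode_coord and encode_coord are made up for this example and are not part of Mnogosearch.

```python
SECNO_SHIFT = 24        # the section ID lives in the highest 8 bits
POS_MASK = 0xFFFFFF     # the position lives in the lowest 24 bits


def decode_coord(coord):
    """Split a 32-bit coord value into (section ID, position in section)."""
    return coord >> SECNO_SHIFT, coord & POS_MASK


def encode_coord(secno, pos):
    """Inverse of decode_coord, useful for building test values."""
    return (secno << SECNO_SHIFT) | (pos & POS_MASK)


# Round trip for section 1, position 105.
coord = encode_coord(1, 105)
print(coord, decode_coord(coord))
```

Note that the position mask has to cover all 24 bits; a narrower mask would truncate positions in long sections.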


This MySQL query returns the information in a readable form:

SELECT url_id,word,coord>>24 AS secno,coord&0xFFFFFF AS pos FROM dict
WHERE word='mnogosearch' ORDER BY secno,pos;

+--------+-------------+-------+-----+
| url_id | word        | secno | pos |
+--------+-------------+-------+-----+
|      1 | mnogosearch |     1 |   1 |
|      1 | mnogosearch |     1 |  14 |
|      1 | mnogosearch |     1 |  28 |
|      1 | mnogosearch |     1 |  42 |
|      1 | mnogosearch |     1 |  76 |
|      1 | mnogosearch |     1 |  77 |
|      1 | mnogosearch |     1 |  85 |
|      1 | mnogosearch |     1 | 105 | <- section 1 length
|      1 | mnogosearch |     2 |   1 |
|      1 | mnogosearch |     2 |   6 | <- section 2 length
|      1 | mnogosearch |     3 |  54 |
|      1 | mnogosearch |     3 |  69 | <- section 3 length
|      1 | mnogosearch |     4 |   1 |
|      1 | mnogosearch |     4 |  11 | <- section 4 length
|      1 | mnogosearch |     8 |   2 |
|      1 | mnogosearch |     8 |   4 | <- section 8 length
+--------+-------------+-------+-----+


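Lines that are not marked as "section X length" are actual word hits.
The "section X length" rows are why COUNT(*) returns one more row per
section than the number of real occurrences.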
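As a rough sketch of how the section length rows could be left out of a count, the Python example below loads the rows from the output above into an in-memory SQLite table and subtracts one row per section. It assumes, as described above, that every (url_id,word,secno) group ends with exactly one length row. The dict table here is a simplified stand-in with only the three columns used in this thread.

```python
import sqlite3

# (secno, pos) pairs copied from the example output above, all for url_id 1.
SAMPLE = [(1, 1), (1, 14), (1, 28), (1, 42), (1, 76), (1, 77), (1, 85),
          (1, 105), (2, 1), (2, 6), (3, 54), (3, 69), (4, 1), (4, 11),
          (8, 2), (8, 4)]

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the dict table (assumption, not the real schema).
conn.execute("CREATE TABLE dict (url_id INTEGER, word TEXT, coord INTEGER)")
conn.executemany(
    "INSERT INTO dict VALUES (?, ?, ?)",
    [(1, "mnogosearch", (secno << 24) | pos) for secno, pos in SAMPLE],
)

# Each section contributes one length row, so subtract the number of
# distinct sections from the total number of rows per document.
QUERY = """
SELECT url_id, COUNT(*) - COUNT(DISTINCT coord >> 24) AS hits
FROM dict
WHERE word = ?
GROUP BY url_id
ORDER BY hits DESC
"""
real_hits = conn.execute(QUERY, ("mnogosearch",)).fetchall()
print(real_hits)  # 16 rows minus 5 sections gives [(1, 11)]
```

With a single section this turns the three rows from the original question into two real occurrences.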




Best regards,

Teijo


Re: [General] URL matches list as query string

2017-04-06 Thread Teijo

Hello,

I have indexed a URL (server), for example www.example.com/files,
containing several documents. I would like to restrict search results
to only the documents named document1 and document2.


If the ul parameter in the query string contains only document1 or document2,
but not both, search results are restricted to the corresponding
document. But I have not found a way to set the ul parameter so that
the restriction covers both documents. I get no
matches when I put both documents in the query string, although both
documents contain the word I'm searching for.


This is an example query string which does not work:

?q=test&m=all&wm=beg&ul=document1+document2

I have also tried passing this (and other variants with a different ul
parameter) directly to search.cgi.


Best regards,

Teijo

5.4.2017, 13:58, Alexander Barkov wrote:


Hi Teijo,

On 03/30/2017 05:03 PM, Teijo wrote:

Hello,

I have tried with a multi-selection list box and a text edit field. In both
cases only one item is accepted. If I try with more than one, the rest
are omitted (multi-selection) or the "your search did not match any documents"
message is shown (when entered in the text edit field).

I do not know what to try next.


Can you please clarify what exactly you're doing?

What does the relevant HTML code look like,
and what does the URL look like after you submit?

Thanks.



Best regards,

Teijo


Re: [General] URL matches list as query string

2017-04-07 Thread Teijo

Hello,

Multiple ul= parameters resolved the problem. Thank you!

Best regards,

Teijo

7.4.2017, 12:12, Alexander Barkov wrote:


Hello,

On 04/07/2017 02:04 AM, Teijo wrote:

Hello,

I have indexed a URL (server), for example www.example.com/files,
containing several documents. I would like to restrict search results
to only the documents named document1 and document2.

If the ul parameter in the query string contains only document1 or document2,
but not both, search results are restricted to the corresponding
document. But I have not found a way to set the ul parameter so that
the restriction covers both documents. I get no
matches when I put both documents in the query string, although both
documents contain the word I'm searching for.

This is an example query string which does not work:

?q=test&m=all&wm=beg&ul=document1+document2


Try multiple ul= parameters:

?q=test&m=all&wm=beg&ul=document1&ul=document2
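If the query string is built by a script rather than an HTML form, the repeated ul= parameters can be generated as in the Python sketch below. The parameter names m and wm are assumed here to be the ones carrying the values all and beg; only q and ul are named explicitly in this thread.

```python
from urllib.parse import urlencode

# One ("ul", value) pair per document, instead of one ul value
# holding both names joined with "+".
params = [
    ("q", "test"),
    ("m", "all"),      # assumed parameter name for the value "all"
    ("wm", "beg"),     # assumed parameter name for the value "beg"
    ("ul", "document1"),
    ("ul", "document2"),
]

query_string = "?" + urlencode(params)
print(query_string)  # ?q=test&m=all&wm=beg&ul=document1&ul=document2
```

Passing a list of pairs keeps the duplicate ul keys, which a plain dictionary would not allow.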



I have also tried passing this (and other variants with a different ul
parameter) directly to search.cgi.

Best regards,

Teijo
