[General] Webboard: mnogosearch-3.4.1 on FreeBSD 10.3

2016-06-01 Thread bar
Author: Dmitriy Kulikov
Email: 
Message:
The results are the same for both databases.

mysql> use mnogosearch_new;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT word, hex(word) FROM bdict WHERE word NOT RLIKE '^[a-z0-9?#_]*$' LIMIT 30;
+------------------+----------------------------------------------------------+
| word             | hex(word)                                                |
+------------------+----------------------------------------------------------+
| 000в            | 303030C390C2B2                                           |
| 099в            | 303939C390C2B2                                           |
| 107рѕ            | 313037C391E282ACC391E280A2                               |
| 10млн         | 3130C390C2BCC390C2BBC390C2BD                             |
| 11в             | 3131C390C2B2                                             |
| 18в             | 3138C390C2B2                                             |
| 1970Ñ…           | 31393730C391E280A6                                       |
| 1980г           | 31393830C390C2B3                                         |
| 1в              | 31C390C2B2                                               |
| 1Ñ€              | 31C391E282AC                                             |
| 2001г           | 32303031C390C2B3                                         |
| 2002рі           | 32303032C391E282ACC391E28093                             |
| 2004г           | 32303034C390C2B3                                         |
| 2006г           | 32303036C390C2B3                                         |
| 2008г           | 32303038C390C2B3                                         |
| 2009г           | 32303039C390C2B3                                         |
| 2009рі           | 32303039C391E282ACC391E28093                             |
| 2011г           | 32303131C390C2B3                                         |
| 2012рі           | 32303132C391E282ACC391E28093                             |
| 20Ñ              | 3230C391C281                                             |
| 30летних   | 3330C390C2BBC390C2B5C391E2809AC390C2BDC390C2B8C391E280A6 |
| 3летний    | 33C390C2BBC390C2B5C391E2809AC390C2BDC390C2B8C390C2B9     |
| 40в             | 3430C390C2B2                                             |
| 41в             | 3431C390C2B2                                             |
| 48в             | 3438C390C2B2                                             |
| 599в            | 353939C390C2B2                                           |
| 59в             | 3539C390C2B2                                             |
| 600в            | 363030C390C2B2                                           |
| 60в             | 3630C390C2B2                                             |
| 90Ñ…             | 3930C391E280A6                                           |
+------------------+----------------------------------------------------------+
30 rows in set (0,00 sec)



mysql> use mnogosearch;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT word, hex(word) FROM bdict WHERE word NOT RLIKE '^[a-z0-9?#_]*$' LIMIT 30;
+--+--+
| word | hex(word)|
+--+--+
| 000в| 303030D0B2   |
| 099в| 303939D0B2   |
| 107рѕ  | 313037D180D195   |
| 10млн | 3130D0BCD0BBD0BD |
| 11в | 3131D0B2 |
| 18в | 3138D0B2 |
| 1970Ñ…   | 31393730D185 |
| 1980г   | 31393830D0B3 |
| 1в  | 31D0B2   |
| 1Ñ€  | 31D180   |
| 2001г   | 32303031D0B3 |
| 2002рі | 32303032D180D196 |
| 2004г   | 32303034D0B3 |
| 2006г   | 32303036D0B3 |
| 2008г   | 32303038D0B3 |
| 2009г   | 32303039D0B3  
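
Comparing the two hex columns suggests a pattern: each Cyrillic byte pair in mnogosearch (for example D0B2) appears in mnogosearch_new as a longer sequence (C390C2B2). That is what results when UTF-8 bytes are read as latin1 (which MySQL treats as cp1252) and then converted to UTF-8 a second time. The Python sketch below only checks that reading against the values in the two tables; it is not a confirmed diagnosis, and the helper names are made up for illustration.

```python
def latin1_char(byte: int) -> str:
    """Decode one byte roughly the way MySQL's latin1 does: cp1252, with
    the bytes cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D)
    mapped to the code point of the same value."""
    try:
        return bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        return chr(byte)


def plain_hex(word: str) -> str:
    """Hex of the word stored once as UTF-8."""
    return word.encode("utf-8").hex().upper()


def double_encoded_hex(word: str) -> str:
    """Hex of the word after its UTF-8 bytes were misread as latin1
    and re-encoded as UTF-8 (assumed cause, see the text above)."""
    misread = "".join(latin1_char(b) for b in word.encode("utf-8"))
    return misread.encode("utf-8").hex().upper()


if __name__ == "__main__":
    for word in ["000в", "1р", "20с", "30летних"]:
        print(word, plain_hex(word), double_encoded_hex(word))
```

If this reading is right, the words in mnogosearch_new were written through a conversion step that the old database did not go through. That is an inference from the hex values alone, not something the output proves.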

Author: Alexander Barkov
Email: b...@mnogosearch.org
Message:
Can you try this one:

SELECT word, hex(word) FROM bdict WHERE word NOT RLIKE '^[a-z0-9?#_]*$' LIMIT 30;

The idea is to get words with Cyrillic letters and see
their HEX representation.
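
The difference between the two patterns can be illustrated outside MySQL. The sketch below uses Python's re module on a made-up word list, so it only approximates RLIKE (MySQL's regex handling of multi-byte characters may differ). It shows that '^[^a-z]$' can match only one-character words, while NOT '^[a-z0-9?#_]*$' selects any word containing at least one character outside that set.

```python
import re

# Pattern from the earlier attempt: exactly one character that is not a-z.
single_non_letter = re.compile(r"^[^a-z]$")

# Pattern from this message: words made only of a-z, 0-9, ?, # and _.
ascii_only = re.compile(r"^[a-z0-9?#_]*$")

# Hypothetical sample of indexed words, for illustration only.
words = ["agent", "2001г", "агент", "7", "x"]

# Earlier query: word RLIKE '^[^a-z]$'
first = [w for w in words if single_non_letter.search(w)]

# Suggested query: word NOT RLIKE '^[a-z0-9?#_]*$'
second = [w for w in words if not ascii_only.search(w)]

if __name__ == "__main__":
    print("one-character non-letters:", first)
    print("words with other characters:", second)
```

An index that holds no one-character words of that kind would therefore return an empty set for the first query, whatever the encoding of the Cyrillic words.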



> I got "Empty set" for both databases.
> 
> mysql> use mnogosearch_new;
> Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup with -A
> Database changed
> mysql> SELECT word, hex(word) FROM bdict WHERE word RLIKE '^[^a-z]$' LIMIT 30;
> Empty set (0,02 sec)
> 
> 
> mysql> use mnogosearch;
> Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup with -A
> Database changed
> mysql> SELECT word, hex(word) FROM bdict WHERE word RLIKE '^[^a-z]$' LIMIT 30;
> Empty set (0,02 sec)
> 

Reply: 

___
General mailing list
General@mnogosearch.org
http://lists.mnogosearch.org/listinfo/general


Author: Dmitriy Kulikov
Email: 
Message:
I got "Empty set" for both databases.

mysql> use mnogosearch_new;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT word, hex(word) FROM bdict WHERE word RLIKE '^[^a-z]$' LIMIT 30;
Empty set (0,02 sec)


mysql> use mnogosearch;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT word, hex(word) FROM bdict WHERE word RLIKE '^[^a-z]$' LIMIT 30;
Empty set (0,02 sec)




Author: Dmitriy Kulikov
Email: 
Message:
Thank you!

The problem with search.cgi was indeed caused by the changed format of search.htm.
However, I still have problems with encodings (e.g. Cyrillic in windows-1251 or UTF-8).
I installed both versions of mnogosearch with separate databases but with the same settings.
The old version works fine, but the new one has problems.

Encoding settings:
indexer.conf
  RemoteCharset windows-1251
  LocalCharset UTF-8

search.htm
  string BrowserCharset= "windows-1251";
  string LocalCharset= "UTF-8";


1) The new version requires the default character set of the database to match LocalCharset:
ALTER DATABASE `mnogosearch_new` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

Otherwise, the following message is written to stderr:
An error occurred!
DB: MySQL driver: #1267: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='


2) With the same settings in indexer.conf and search.htm, searching in Cyrillic does not work in the new version of mnogosearch.
Setting BrowserCharset= "UTF-8" does not change anything.

Your search - "агент" - did not match any documents.

Debug log:
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start UdmFind
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start Prepare
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  Prepare  0.00
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start FindWords
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start FindWordsDB for mysql://mnogosearch_new:***@localhost/mnogosearch_new/?dbmode=blob=UTF-8
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start loading limits
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} WHERE limit loaded. 149 URLs found
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  loading limits  0.01 (149 URLs found)
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start fetching words
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start search for 'агенСM-^B'
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start fetching
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  FindWordsDB:  0.01
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start UdmQueryConvert
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  UdmQueryConvert:  0.00
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start Excerpts
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  Excerpts:  0.00
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start WordInfo
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  WordInfo:  0.00
May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop  UdmFind:  0.01


3) When searching for words in Latin script, the database returns the text excerpts in correct Cyrillic, but the title of each retrieved document is always shown in the wrong encoding:
navigator : 405 
Results 1-10 of 99 ( 0.021 seconds)
?“?»?°?°??   [ 15.095% Popularity: 0.89705 ]
... сети Интернет по адресу: http://navigator***.ru Прежде чем приобрести ...



I would be very grateful for help with solving the last two problems.

Generally, programs can issue various warning messages when they are installed.
It would be nice if a new version of mnogosearch warned about serious changes.
I am setting up our old CMS on a new server, so experiments are possible there. But if a new version of mnogosearch were installed as one of the updates on a server under production load, it would be a complete disaster.



Regarding the long hang during mnogosearch indexing:
I found that it is caused by very slow network retrieval of large PDF documents.
I tried setting minimal timeouts, but that does not help.
MaxNetErrors 10
ReadTimeOut 10s
DocTimeOut 30s

For example, I tried to limit indexing to 300 seconds, but it took 1360 seconds. Moreover, the document was not indexed.
/usr/local/bin/indexer -ob -v6 -N 1 -c 300 /usr/local/etc/mnogosearch/indexer.conf 2> /var/log/mnogosearch.log
--
Done (1360 seconds, 1 documents, 11049522 bytes,  7.93 Kbytes/sec.)

I have sent you the log of the attempt to index this one document.

When I set: 
Disallow *.pdf
indexing is fast.

Why doesn't setting the time limits help? How can I avoid such lockups of the indexing process?
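
For reference, the figures in the "Done" line above can be checked with a little arithmetic. The Python sketch below reproduces the reported transfer rate and estimates what average rate the same document would have needed to finish within the 30s DocTimeOut or the 300s limit passed with -c. It says nothing about how indexer applies those limits internally; the function name is made up for illustration.

```python
def kbytes_per_sec(total_bytes: int, seconds: float) -> float:
    """Average transfer rate in Kbytes/sec (1 Kbyte = 1024 bytes)."""
    return total_bytes / seconds / 1024


# Values taken from the indexer "Done" line.
DOC_BYTES = 11049522
ELAPSED_SECONDS = 1360

# Rate actually observed for the slow PDF download.
observed = kbytes_per_sec(DOC_BYTES, ELAPSED_SECONDS)

# Average rate the download would have needed to fit each limit.
needed_for_doc_timeout = kbytes_per_sec(DOC_BYTES, 30)
needed_for_c_limit = kbytes_per_sec(DOC_BYTES, 300)

if __name__ == "__main__":
    print(f"observed:            {observed:.2f} Kbytes/sec")
    print(f"needed for 30s:      {needed_for_doc_timeout:.1f} Kbytes/sec")
    print(f"needed for 300s:     {needed_for_c_limit:.1f} Kbytes/sec")
```

The observed rate matches the 7.93 Kbytes/sec that indexer reported, so the run time is fully explained by the slow download; the transfer was simply not interrupted when either limit was reached.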

