Author: Alexander Barkov
Email: [email protected]
Message:
> Thank you!
>
> The problem with search.cgi was indeed caused by the changed format of
> search.htm.
> But I have problems with encodings (e.g. Cyrillic in windows-1251 or UTF-8).
> I installed both versions of mnogosearch with separate databases but the
> same settings.
> The old version works fine, but the new one has problems.
>
> Encoding settings:
> indexer.conf
> RemoteCharset windows-1251
> LocalCharset UTF-8
>
> search.htm
> string BrowserCharset= "windows-1251";
> string LocalCharset= "UTF-8";
>
Please start investigating the problem by checking the data
in the database. It's important to make sure that the indexer
stores the data in true UTF-8.
What does this query return?
SELECT word, hex(word) FROM bdict WHERE word RLIKE '^[^a-z]$' LIMIT 30;
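How to read the hex(word) output: in true UTF-8 every Russian letter takes two bytes starting with D0 or D1 ("агент" is D0B0 D0B3 D0B5 D0BD D182), whereas unconverted windows-1251 uses one byte per letter ("агент" is E0E3E5EDF2). A third common failure is double encoding, where UTF-8 bytes were misread as latin1 and encoded to UTF-8 a second time. The Python sketch below classifies a single hex(word) value. `classify_word_hex` and `undo_double_encoding` are hypothetical helpers, and the three categories are assumptions about the likely failure modes, not an exhaustive check.

```python
from typing import Optional, Tuple


def undo_double_encoding(text: str) -> Optional[str]:
    """Return the inner text if `text` looks like UTF-8 that was misread as a
    single-byte charset and re-encoded as UTF-8; otherwise return None."""
    # MySQL's "latin1" is essentially cp1252; plain latin-1 is the fallback.
    for codec in ("cp1252", "latin-1"):
        try:
            inner = text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
        if inner != text:  # pure ASCII round-trips unchanged and is not flagged
            return inner
    return None


def classify_word_hex(hex_word: str) -> Tuple[str, str]:
    """Classify one hex(word) value from bdict (hypothetical helper)."""
    raw = bytes.fromhex(hex_word)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: most likely windows-1251 bytes stored unconverted.
        return ("raw windows-1251", raw.decode("cp1251", errors="replace"))
    inner = undo_double_encoding(text)
    if inner is not None:
        return ("double-encoded UTF-8", inner)
    return ("utf-8", text)


print(classify_word_hex("D0B0D0B3D0B5D0BDD182"))  # ('utf-8', 'агент')
print(classify_word_hex("E0E3E5EDF2"))            # ('raw windows-1251', 'агент')
```

Pasting a few values from the query output into `classify_word_hex` should show which case applies.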
>
> 1) The new version requires the database's default character set to match
> LocalCharset:
> ALTER DATABASE `mnogosearch_new` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
>
> Otherwise, the following message appears on stderr:
> An error occurred!
> DB: MySQL driver: #1267: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
>
>
> 2) With the same settings in indexer.conf and search.htm, searching for
> Cyrillic words does not work in the new version of mnogosearch.
> Setting BrowserCharset= "UTF-8" does not change anything.
>
> Your search - "агент" - did not match any documents.
>
> Debug log:
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start UdmFind
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start Prepare
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop Prepare 0.00
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start FindWords
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start FindWordsDB for mysql://mnogosearch_new:***@localhost/mnogosearch_new/?dbmode=blob&SetNames=UTF-8
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start loading limits
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} WHERE limit loaded. 149 URLs found
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop loading limits 0.01 (149 URLs found)
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start fetching words
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start search for 'агенСM-^B'
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start fetching
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop FindWordsDB: 0.01
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start UdmQueryConvert
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop UdmQueryConvert: 0.00
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start Excerpts
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop Excerpts: 0.00
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Start WordInfo
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop WordInfo: 0.00
> May 31 22:31:10 *** search.cgi[79240]: [79240]{--} Stop UdmFind: 0.01
>
>
> 3) When searching for Latin words, the database returns text excerpts in
> correct Cyrillic, but the title of each retrieved document is always
> output in the wrong encoding:
> navigator : 405
> Results 1-10 of 99 ( 0.021 seconds)
> ?“?»?°?????°?? [ 15.095% Popularity: 0.89705 ]
> ... сети Интернет по адресу: http://navigator***.ru Прежде чем приобрести ...
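In the log from item 2, the query word appears as 'агенСM-^B'. `M-^B` is `cat -v` notation for byte 0x82, which is the last byte of "т" in UTF-8 (D1 82). The lone D1 byte is presumably what shows up as "С", its windows-1251 reading. The Python sketch below assumes the logger escapes only bytes 0x80-0x9F in this style; `escape_c1_bytes` is a hypothetical model, not syslogd code. Under that assumption it reproduces the pattern: "аген" passes through unchanged and only the final byte is rewritten. If so, the word may well have reached search.cgi as correct UTF-8, and the bdict query above is the better place to look for the mismatch.

```python
def escape_c1_bytes(data: bytes) -> bytes:
    """Hypothetical logger model: escape only bytes 0x80-0x9F in `cat -v`
    style ("M-" for the high bit, then "^" plus the control letter)."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            # 0x82 -> high bit stripped gives 0x02 -> "^B"
            out += b"M-^" + bytes([(b & 0x7F) + 0x40])
        else:
            out.append(b)  # every other byte is passed through unchanged
    return bytes(out)


raw = "агент".encode("utf-8")
print(raw.hex(" ").upper())   # D0 B0 D0 B3 D0 B5 D0 BD D1 82
print(escape_c1_bytes(raw))   # b'\xd0\xb0\xd0\xb3\xd0\xb5\xd0\xbd\xd1M-^B'
```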
>
>
>
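In item 3, the readable characters in the broken title (“, », °) are exactly what windows-1251 displays for the second byte of the UTF-8 encodings of Г, л and а (D0 93, D0 BB, D0 B0). The leading D0 byte would display as Р, and the `?` marks presumably stand for such characters lost in the e-mail. This suggests, though it does not prove, that titles reach the browser as raw UTF-8 without being converted to BrowserCharset. The minimal Python illustration below uses "Гла" as a hypothetical sample string, not the real title.

```python
title = "Гла"                   # hypothetical sample; the real title is unknown
stored = title.encode("utf-8")  # bytes as stored with LocalCharset UTF-8

# Suspected bug path: UTF-8 bytes are sent as-is and read as windows-1251.
print(stored.decode("cp1251"))  # Р“Р»Р°

# Intended path: convert LocalCharset -> BrowserCharset before output.
print(stored.decode("utf-8").encode("cp1251"))  # b'\xc3\xeb\xe0'
```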
> I would be very grateful for help with solving the last two problems.
>
> Generally, programs can display various warning messages when they are
> installed.
> It would be nice if a new version of mnogosearch warned about serious
> incompatible changes.
> I am moving our old CMS to a new server, where experiments are possible.
> But if a new version of mnogosearch were installed as a routine update on
> a server under production load, the result would be a complete disaster.
>
>
>
> Regarding the long hangs during mnogosearch indexing:
> I found that they are caused by very slow network retrieval of large PDF
> documents.
> I tried setting small timeout limits, but it does not help:
> MaxNetErrors 10
> ReadTimeOut 10s
> DocTimeOut 30s
>
> For example, I set an indexing time limit of 300s, but indexing took
> 1360s. Moreover, the document was not indexed.
> /usr/local/bin/indexer -ob -v6 -N 1 -c 300 /usr/local/etc/mnogosearch/indexer.conf 2> /var/log/mnogosearch.log
> ------------------
> Done (1360 seconds, 1 documents, 11049522 bytes, 7.93 Kbytes/sec.)
>
> I have sent you the log of the attempt to index this single document.
>
> When I set:
> Disallow *.pdf
> indexing is fast.
>
> Why don't the time limits help? How can I avoid such lockups of the
> indexing process?
>
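One likely reason the timeouts did not help (an assumption, since indexer's internals are not shown here): a per-read timeout such as ReadTimeOut fires only when a single read waits longer than the limit. At the reported rate of 11049522 bytes in 1360 s (about 7.93 KB/s), data keeps trickling in, so no single read ever waits 10 s. Only a deadline on the whole document can stop such a transfer. The message does not tell us why DocTimeOut 30s did not act as one, or whether -c 300 is checked only between documents. The Python simulation below runs on a virtual clock to show the difference. `simulate_download` and the 8 KB / 1 s chunking are hypothetical and are not mnoGoSearch code.

```python
import math
from typing import Iterable, Optional, Tuple


def simulate_download(chunk_delays: Iterable[float], read_timeout: float,
                      doc_timeout: Optional[float]) -> Tuple[str, float, int]:
    """Simulate a chunked download on a virtual clock (no real sleeping).

    chunk_delays: seconds each read() waits before its chunk arrives.
    read_timeout: per-read limit; fires only if one single read waits longer.
    doc_timeout:  deadline for the whole document, checked after every chunk;
                  None means there is no such deadline.
    Returns (status, elapsed_seconds, chunks_read).
    """
    elapsed = 0.0
    chunks = 0
    for delay in chunk_delays:
        if delay > read_timeout:
            return ("read timeout", elapsed + read_timeout, chunks)
        elapsed += delay
        chunks += 1
        if doc_timeout is not None and elapsed > doc_timeout:
            return ("document timeout", elapsed, chunks)
    return ("done", elapsed, chunks)


# Numbers from the report: 11049522 bytes in 1360 s.
print(round(11049522 / 1360 / 1024, 2))  # 7.93 (KB/s)

# Assume 8 KB reads that each take about 1 s at that rate.
n_chunks = math.ceil(11049522 / 8192)     # 1349 reads
delays = [1.0] * n_chunks

print(simulate_download(delays, read_timeout=10, doc_timeout=None))
# ('done', 1349.0, 1349): the 10 s per-read limit never fires
print(simulate_download(delays, read_timeout=10, doc_timeout=30))
# ('document timeout', 31.0, 31)
```

The per-read check bounds idle time, while the cumulative `elapsed` check bounds total time. A slow but steady server defeats the first check but not the second.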
Reply: <http://www.mnogosearch.org/board/message.php?id=21777>
_______________________________________________
General mailing list
[email protected]
http://lists.mnogosearch.org/listinfo/general