Re: language

2002-06-04 Thread Alexander Barkov
Hi! Kreso wrote: Hello all, what would be the recommended way of specifying the language in which the indexed documents are written? I have noticed in indexer.c that the Content-Language: header is examined; however, I would prefer specifying the language somewhere in the document itself

Webboard: No Chinese Language Support..

2002-03-11 Thread Alex Barkov
Author: Alex Barkov Email: [EMAIL PROTECTED] Message: Is there really no Chinese language support? If not at the moment, when will it be supported? Indeed, there is no Chinese support in releases before 3.2.0. Since mnogosearch-3.2.0, the Big5 and GB2312 Chinese character sets are supported. More

Webboard: Again on language guessing

2002-03-10 Thread maxime
Author: maxime Email: Message: No. Expectation and dispersion were used to avoid sorting, I guess (I don't know exactly, as it is not my idea). Indexes were used to limit memory usage. Yes, it gives slightly worse results than using all n-grams, but the guesser works well and fast and does not consume much

Webboard: No Chinese Language Support..

2002-03-09 Thread D. I.
Author: D. I. Email: [EMAIL PROTECTED] Message: Where can I find any information about Chinese dialects on the net or on your site, if you please? Reply: http://www.mnogosearch.org/board/message.php?id=4329 ___ If you want to unsubscribe send unsubscribe

Webboard: Again on language guessing

2002-03-08 Thread Gialuca
Author: Gialuca Email: [EMAIL PROTECTED] Message: Hi, I'll try the new version. By substitution I mean that 'e' and 'wil' have the same index (as do 'g' and ' I ') and, since there is no collision handling, those keys share the same value. So if your text is 'Since I think I will be alive.
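
Note: the effect described in this message can be reproduced with a toy table. This is a minimal sketch, not mguesser's code. The hash function (`toy_hash`) and the table size of 17 are assumptions, chosen only so that 'g' and ' I ' land in the same slot.

```python
TABLE_SIZE = 17  # assumption: toy size picked so the example collides


def toy_hash(ngram):
    """Hypothetical hash: sum of the UTF-8 byte values modulo the table size."""
    return sum(ngram.encode("utf-8")) % TABLE_SIZE


def count_without_collision_handling(ngrams):
    """Count n-grams in a fixed-size table indexed only by hash.

    Keys are never stored, so two n-grams with the same index
    silently share one counter.
    """
    table = [0] * TABLE_SIZE
    for gram in ngrams:
        table[toy_hash(gram)] += 1
    return table


def count_with_keys(ngrams):
    """Count n-grams keyed by the n-gram itself, so nothing is merged."""
    counts = {}
    for gram in ngrams:
        counts[gram] = counts.get(gram, 0) + 1
    return counts


if __name__ == "__main__":
    sample = ["g", " I ", "g"]
    print(toy_hash("g"), toy_hash(" I "))
    print(count_without_collision_handling(sample))
    print(count_with_keys(sample))
```

When the keys are stored alongside the counts (or any other collision handling is used), the two n-grams stay distinct.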

Webboard: Again on language guessing

2002-03-07 Thread Gialuca
Author: Gialuca Email: [EMAIL PROTECTED] Message: Hi all, we did further research on language guessing and, in the course of it, compared mguesser to text_cat. It appears that mguesser doesn't handle collisions, accepting maps in which 'g' is substituted by ' I ', or 'wil' by 'e', or vice versa. Did you

Webboard: Again on language guessing

2002-03-07 Thread maxime
Author: maxime Email: Message: Since version 3.2.4 we use a different measure, based on an information gain function. You may build the new mguesser from the current CVS sources. What do you mean by "'g' is substituted by ' I ' or where 'wil' by 'e'"? Reply:

Webboard: Research on language guessing

2002-02-07 Thread Gialuca
Author: Gialuca Email: [EMAIL PROTECTED] Message: Yes, you're right, but I saw, and Cavnar and Trenkle say so too, that the very first entries are just single letters, so you're just getting letter frequencies, and that's the reason to believe a band-pass filter could be useful. Thanks anyway for your

Webboard: Research on language guessing

2002-02-07 Thread maxime
Author: maxime Email: Message: Maybe not. Compare the maps for various languages: the same 1-grams have different frequencies in different languages. Reply: http://www.mnogosearch.org/board/message.php?id=4103

Webboard: Research on language guessing

2002-02-06 Thread Gialuca
Author: Gialuca Email: [EMAIL PROTECTED] Message: Hi, my company and I are doing some research on language guessing, and we are using mnogosearch at some levels, including its guesser. I have a question about the construction of the language maps: why did you use a filter cutting only the least

Webboard: Research on language guessing

2002-02-06 Thread maxime
Author: maxime Email: Message: Because the _top_ n-grams are highly language-specific, while the middle n-grams may be the same for related languages (e.g. Russian, Ukrainian, Belarusian). N.B. our guesser is based on this paper: http://sochi.net.ru/~maxime/doc/cavnar_trenkle_ngram.ps.gz Reply: http
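
Note: for readers unfamiliar with the rank-order approach of the Cavnar and Trenkle paper, the idea can be sketched as follows. This is an illustration under stated assumptions, not mguesser's implementation: the function names, the n-gram lengths (1 to 3) and the profile size (`top_k=300`) are all hypothetical choices. Each language map keeps only its most frequent n-grams, and a document is assigned to the language whose ranking differs least from its own (the "out-of-place" measure).

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    # Normalise whitespace and pad with spaces so word boundaries form n-grams.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the map get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def guess_language(text, lang_profiles):
    """Return the key of the language map closest to the text's own profile.

    The maps must have been built with the same ngram_profile settings.
    """
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


if __name__ == "__main__":
    maps = {
        "en": ngram_profile("the quick brown fox jumps over the lazy dog and the cat"),
        "de": ngram_profile("der schnelle braune fuchs springt ueber den faulen hund"),
    }
    print(guess_language("the dog and the fox", maps))
```

Real language maps would be built from much larger training texts; the two one-sentence maps in the usage example only show how the functions fit together.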

Webboard: Research on language guessing

2002-02-06 Thread kentsin
Author: kentsin Email: [EMAIL PROTECTED] Message: FYI, Wired.com has just published an article about using gzip to do language guessing. http://wired.com/news/technology/0,1282,50192,00.html Reply: http://www.mnogosearch.org/board/message.php?id=4095

Language support

2001-12-04 Thread costas
Hi, I was wondering whether you have begun work yet on "Make it possible to use several "LocalCharset" indexer.conf commands. It should help to index multi-language servers such as www.debian.org". This is the most important feature for my work, since I am constantly indexing mult

Webboard: Has system built-in language or keywords?

2001-11-22 Thread Alex Barkov
Author: Alex Barkov Email: [EMAIL PROTECTED] Message: The multi and single modes support substring searches. The default template contains a SELECT with OPTIONs for choosing the word match type: full, beginning, ending, substring. Hello, all, For example, can I search for the string admin*, and result will pages

Webboard: Language not understood

2001-10-26 Thread mike jaffa
Author: mike jaffa Email: [EMAIL PROTECTED] Message: Why, though, does the indexer not recognise the language even though it recognises the charset? I have read the documentation and found nothing that tells me how to switch language detection on. I assume it is automatic, but it does not work

stop-list for catalan language

2001-10-19 Thread Jordi Gay Sensat
We are sending you the stopword list for the Catalan language. We hope that it will be included in the next distribution and that it will be useful for Catalan people. We are using mnoGoSearch for indexing a city council web site in Catalonia. Congratulations on your fantastic work!!! The Cthulhu

Webboard: Problem with czech language

2001-09-01 Thread loverman
Author: loverman Email: [EMAIL PROTECTED] Message: The best solution to your problem is to translate your web project into different languages and let the visitor choose the language. Reply: http://www.mnogosearch.org/board/message.php?id=2996

Webboard: Language Autodetection

2001-08-31 Thread John Fax
Author: John Fax Email: [EMAIL PROTECTED] Message: Hi, Is there a way to let mnoGoSearch guess the language of a document? If not, does anybody know a program that is able to perform such a task? Thanks a lot! John Reply: http://www.mnogosearch.org/board/message.php?id=2980

Webboard: Language Autodetection

2001-08-31 Thread Alexander Barkov
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Hi, Is there a way to let mnoGoSearch guess the language of a document? If not, does anybody know a program that is able to perform such a task? Thanks a lot! There is also mguesser, a stand-alone part

Webboard: indexing a multi-language site

2001-08-26 Thread Sergio
Author: Sergio Email: [EMAIL PROTECTED] Message: Hi, I am trying to index a site which is in 4 different languages. The user chooses the language on the splash page, then a cookie is set, and every page is shown in the corresponding language according to the cookie... I would like to index

Webboard: indexing a multi-language site

2001-08-26 Thread Alexander Barkov
Author: Alexander Barkov Email: [EMAIL PROTECTED] Message: Hi, I am trying to index a site which is in 4 different languages. The user chooses the language on the splash page, then a cookie is set, and every page is shown in the corresponding language according to the cookie... I would like

Webboard: How to set the language to English?

2001-07-19 Thread gluke
Author: gluke Email: [EMAIL PROTECTED] Message: DBAddr xxx Server xxx Localcharset koi8-r Am I right? Thanks in advance. LocalCharset should be set in indexer.conf before all Server commands. And to specify the remote server's charset, you should also use the Charset indexer command before Server.

Webboard: Page language recognition

2001-03-22 Thread Volker Wysk
Author: Volker Wysk Email: post @volker-wysk.de Message: Hi, if you use Apache, you could use its content negotiation features. See the manual. Bye. Reply: http://search.mnogo.ru/board/message.php?id=1778

Webboard: Page language recognition

2001-03-17 Thread Molara Federico
Author: Molara Federico Email: [EMAIL PROTECTED] Message: How can I set the language for an HTML page? I'm indexing a multi-language site of dynamically generated pages (I'm using ASP). I've tried to insert a META language="xx" in my pages, but it doesn't seem to work. What's wrong??

Re: Webboard: Page language recognition

2001-03-17 Thread Maxime Zakharov
Molara Federico wrote: How can I set the language for an HTML page? I'm indexing a multi-language site of dynamically generated pages (I'm using ASP). I've tried to insert a META language="xx" in my pages, but it doesn't seem to work. What's wrong??? You should use lang

Webboard: multiple dictionaries for the same language?

2001-03-07 Thread Alexander Barkov
rules for the same language? What happens if you import several? You have to use only one affix file per language. But it is possible to use several word lists with this affix file. Reply: http://search.mnogo.ru/board/message.php?id=1644

Webboard: multiple dictionaries for the same language?

2001-03-06 Thread Volker Wysk
for the same language? What happens if you import several? bye Reply: http://search.mnogo.ru/board/message.php?id=1636 ___ If you want to unsubscribe send "unsubscribe general" to [EMAIL PROTECTED]