Hi!
Kreso wrote:
Hello all,
what would be the recommended way of specifying the language in
which the indexed documents are written? I have noticed in indexer.c
that Content-Language: header is examined, however I would prefer
specifying the language somewhere in the document itself
Author: Alex Barkov
Email: [EMAIL PROTECTED]
Message:
Is there really no Chinese language support?
If not at the moment, when will it be supported?
Releases before 3.2.0 really have no Chinese support.
Since mnogosearch-3.2.0, the Big5 and GB2312
Chinese character sets are supported.
More
Author: maxime
Email:
Message:
No. Expectation and dispersion were used to avoid sorting, I guess (I don't know
exactly, as it was not my idea). Indexes were used to limit memory usage. Yes, this gives
slightly worse results than using all n-grams, but the guesser works well and fast and does not
consume much
Author: D. I.
Email: [EMAIL PROTECTED]
Message:
Where can I find any information about Chinese dialects on the net or on your site,
please?
Reply: http://www.mnogosearch.org/board/message.php?id=4329
___
If you want to unsubscribe send unsubscribe
Author: Gialuca
Email: [EMAIL PROTECTED]
Message:
Hi,
I'll try the new version. About the substitution: I mean that 'e' and 'wil' have the
same index (as do 'g' and ' I ') and, since there is no collision handling, those keys
share the same value. So if your text is 'Since I think I will be alive.
Author: Gialuca
Email: [EMAIL PROTECTED]
Message:
Hi all,
we did further research on language guessing and, in the course of it, compared mguesser to
text_cat. It appears that mguesser doesn't handle collisions, accepting maps in which
'g' is substituted by ' I ', or 'wil' by 'e', or vice versa. Did you
Author: maxime
Email:
Message:
Since version 3.2.4 we use a different measure, based on an information gain function. You
may build the new mguesser from the current CVS sources.
What do you mean by "'g' is substituted by ' I ' or where 'wil' by 'e'"?
Reply:
Author: Gialuca
Email: [EMAIL PROTECTED]
Message:
Yes, you're right, but I saw, and Cavnar and Trenkle say so too, that the very first entries
are just single letters, so you're just getting letter frequencies, and that's the reason to
believe a band-pass filter could be useful. Thanks anyway for your
Author: maxime
Email:
Message:
Maybe not. Compare the maps for various languages: identical 1-grams have different
frequencies in different languages.
Reply: http://www.mnogosearch.org/board/message.php?id=4103
Author: Gialuca
Email: [EMAIL PROTECTED]
Message:
Hi,
My company and I are doing some research on language guessing, and we are using
mnogosearch at some levels, including its guesser.
I have a question about the construction of the language maps: why did you use a filter
cutting only the least
Author: maxime
Email:
Message:
Because the _top_ n-grams are highly language specific, while middle n-grams may be equal for
related languages (e.g. Russian, Ukrainian, Belarusian).
N.B. our guesser is based on this paper:
http://sochi.net.ru/~maxime/doc/cavnar_trenkle_ngram.ps.gz
Reply: http
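The approach from the Cavnar and Trenkle paper mentioned above can be sketched roughly as follows. This is a simplified illustration, not mguesser's code: the n-gram lengths (1 to 3), the profile size of 300, the padding with spaces, and the names `profile`, `out_of_place` and `guess` are all assumptions made for the example. Each text is reduced to its most frequent n-grams in rank order, and a document is assigned to the language whose profile has the smallest sum of rank differences.

```python
from collections import Counter


def ngrams(text, n_max=3):
    """Yield all character n-grams of length 1..n_max from normalized text."""
    padded = " " + " ".join(text.lower().split()) + " "
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            yield padded[i:i + n]


def profile(text, top=300):
    """Return the `top` most frequent n-grams, most frequent first."""
    counts = Counter(ngrams(text))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language profile
    get a fixed maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def guess(text, lang_profiles):
    """Pick the language whose profile is closest to the document's."""
    doc = profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


if __name__ == "__main__":
    # Tiny illustrative samples; real language maps need far more text.
    profiles = {
        "en": profile("the dog and the cat sleep in the sun"),
        "de": profile("der hund und die katze schlafen in der sonne"),
    }
    print(guess("the cat and the dog", profiles))
```

Cutting the profile at `top` entries is the filter discussed in this thread: only the least frequent n-grams are dropped, and the top-ranked ones are kept.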
Author: kentsin
Email: [EMAIL PROTECTED]
Message:
FYI, Wired.com just had an article about using gzip to do language guessing.
http://wired.com/news/technology/0,1282,50192,00.html
Reply: http://www.mnogosearch.org/board/message.php?id=4095
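The general idea of compression-based language guessing can be sketched as below. This is only an illustration of the principle under stated assumptions, not necessarily the method described in the article: it uses Python's `zlib` (the same DEFLATE algorithm gzip uses), and the function names and sample texts are made up for the example. A document is appended to a reference text for each language, and the language whose reference lets the document compress into the fewest extra bytes wins.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data at maximum compression."""
    return len(zlib.compress(data, 9))


def guess_by_compression(text, samples):
    """Return the key of the reference sample that compresses `text` best.

    `samples` maps a language name to a reference text in that language.
    """
    doc = text.encode("utf-8")

    def extra_bytes(lang):
        ref = samples[lang].encode("utf-8")
        # Cost of the document given the reference as compression context.
        return compressed_size(ref + doc) - compressed_size(ref)

    return min(samples, key=extra_bytes)


if __name__ == "__main__":
    # Tiny illustrative samples; useful results need much longer references.
    samples = {
        "en": "the quick brown fox jumps over the lazy dog while the cat sleeps in the sun",
        "de": "der schnelle braune fuchs springt und der faule hund schlaeft in der sonne",
    }
    print(guess_by_compression("the quick brown fox jumps over the lazy dog", samples))
```

Note that DEFLATE only looks back 32 KB, so very long reference texts give no additional benefit with this particular compressor.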
Hi,
I was wondering if you have begun work yet on this item:
Make it possible to use several "LocalCharset"
indexer.conf commands.
It should help to index multi-language servers such
as www.debian.org.
This is the most important feature for my work,
since I am constantly indexing mult
Author: Alex Barkov
Email: [EMAIL PROTECTED]
Message:
The multi and single modes support substring searches.
The default template contains a SELECT with OPTIONs to
choose the word match type: full, beginning, ending,
substring.
Hello, all,
For example, can I search for the string admin*,
and will the result be pages
Author: mike jaffa
Email: [EMAIL PROTECTED]
Message:
Why, though, does the indexer not recognise the language even though it recognises the
charset? I have read the documentation and found nothing that tells me how to switch
language detection on.
I assume it is automatic, but it does not work
We are sending you the stopword list for the Catalan language. We hope that
it will be included in the next distribution and that it will be useful for
Catalan people.
We are using mnoGoSearch for indexing a city council web site in
Catalonia.
Congratulations on your fantastic work!
The Cthulhu
Author: loverman
Email: [EMAIL PROTECTED]
Message:
The best solution to your problem is to translate your web project into
different languages and let the visitor choose the language.
Reply: http://www.mnogosearch.org/board/message.php?id=2996
Author: John Fax
Email: [EMAIL PROTECTED]
Message:
Hi,
Is there a way to let mnoGoSearch guess what is the language
of the document ?
If not, does anybody know a program that is able to perform
such a task ?
Thanks a lot !
John
Reply: http://www.mnogosearch.org/board/message.php?id=2980
Author: Alexander Barkov
Email: [EMAIL PROTECTED]
Message:
Hi,
Is there a way to let mnoGoSearch guess what is the language
of the document ?
If not, does anybody know a program that is able to perform
such a task ?
Thanks a lot !
There is also mguesser, a stand-alone part
Author: Sergio
Email: [EMAIL PROTECTED]
Message:
Hi, I am trying to index a site which is in 4 different languages. The user chooses the
language on the splash page, then a cookie is set, and every page is shown in the
corresponding language according to the cookie...
I would like to index
Author: Alexander Barkov
Email: [EMAIL PROTECTED]
Message:
Hi, I am trying to index a site which is in 4 different languages. The user chooses the
language on the splash page, then a cookie is set, and every page is shown in the
corresponding language according to the cookie...
I would like
Author: gluke
Email: [EMAIL PROTECTED]
Message:
DBAddr xxx
Server xxx
Localcharset koi8-r
Am I right?
Thanks in advance
LocalCharset should be set in indexer.conf before all Server commands.
To specify the remote server's charset, use the Charset indexer command, also before
Server.
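The ordering described in the reply could look roughly like this in indexer.conf. This is a sketch only: the `xxx` values are the placeholders from the question, and the `windows-1251` value for Charset is a hypothetical example chosen to illustrate the ordering, not something stated in the thread.

```
# indexer.conf (sketch; values are placeholders)
DBAddr xxx

# Local charset first, before any Server command
LocalCharset koi8-r

# Charset of the remote server, also before Server
# (windows-1251 is a hypothetical value for illustration)
Charset windows-1251

Server xxx
```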
Author: Volker Wysk
Email: post @volker-wysk.de
Message:
Hi
If you use Apache, you could use its content negotiation
features. See the manual.
bye
Reply: http://search.mnogo.ru/board/message.php?id=1778
Author: Molara Federico
Email: [EMAIL PROTECTED]
Message:
How can I set the language for an HTML page?
I'm indexing a multi-language site of dynamically
generated pages (I'm using ASP).
I've tried to insert a META language="xx" in my
pages, but it doesn't seem to work.
What's wrong??
Molara Federico wrote:
How can I set the language for an HTML page?
I'm indexing a multi-language site of dynamically
generated pages (I'm using ASP).
I've tried to insert a META language="xx" in my
pages, but it doesn't seem to work.
What's wrong???
You should use lang
rules
for the same language? What happens if you import several?
You have to use only one affix file per language, but
it is possible to use several wordlists with this affix file.
Reply: http://search.mnogo.ru/board/message.php?id=1644
for the same language? What happens if you import several?
bye
Reply: http://search.mnogo.ru/board/message.php?id=1636