----- Original Message -----
From: "Jan Karabina" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, March 16, 2001 10:12 AM
Subject: [aseek-users] extra unicode tables (fwd)


> Hi,
> I'm sending extra Unicode tables (four Czech encodings: DOS cp852, Kamenicky
> brothers, macce, koi8cs) for aspseek110.
> All tables are taken from the sherlock search engine (as I wrote before), so
> they are correct, I hope :)

Thanks. Could you give us the exact charset names for these files?

>
> I'm also sending a Czech stopwords file in iso88592 (aspseek has cz stopwords
> only in ASCII).
>
> It is nice that aspseek110 supports Unicode, but what about langmaps?

    You can specify a langmap file in the CharsetTableU1 command.

> It is stupid to have several langmaps for one language (one langmap for
> each encoding)

    I don't completely understand. Currently we can specify only one langmap
per charset.

> Can you solve it like with stopwords?
>
> --
> Jan Karabina  mailto:[EMAIL PROTECTED]
>


--------------------------------------------------------------------------------


# Stopwords for Czech language (cz).
# Taken from MnogoSearch 3.1.8 distribution and modified to suit ASPSeek
# format


# Subject: my distr. + stopword(lang, word)
# Date: Wed, 10 Nov 1999 14:59:39 +0100 (CET)
# From: Builder <[EMAIL PROTECTED]>
# To: [EMAIL PROTECTED]
#
# Hello,
#
# I'm sending you a stopword(lang, word) list for Czech; it's based on a
# frequency analysis of about 10 000 pages. But it is only in the ASCII charset.
#
# I ([EMAIL PROTECTED]) converted it to iso8859-2 by hand.

dnes
cz
tímto
budeš
budem
byli
jseš
můj
svým
ta
tomto
tohle
tuto
tyto
jej
zda
proč
máte
tato
kam
tohoto
kdo
kteří
mi
nám
tom
tomuto
mít
nic
proto
kterou
byla
toho
protože
asi
ho
naši
napište
re
což
tím
takže
svých
její
svými
jste
aj
tu
tedy
teto
bylo
kde
ke
pravé
ji
nad
nejsou
či
pod
téma
mezi
přes
ty
pak
vám
ani
když
však
ne
jsem
tento
článku
články
aby
jsme
před
pta
jejich
byl
ještě
až
bez
také
pouze
první
vaše
která
nás
novy
tipy
pokud
muže
design
strana
jeho
své
jiné
zprávy
nové
není
vás
jen
podle
zde

už
email
byt
více
bude
již
než
který
by
které
co
nebo
ten
tak
má
při
od
po
jsou
jak
další
ale
si
ve
to
jako
za
zpět
ze
do
pro
je
na
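
A minimal Python sketch of how a stopword file in this layout could be read.
It assumes one word per line, '#' starting a comment line, and ISO-8859-2
bytes on disk. load_stopwords is a hypothetical helper, not part of ASPSeek.
Decoding the same bytes as ISO-8859-1 is what turns "š" into "¹" and "č"
into "è", so the encoding has to be given explicitly.

```python
def load_stopwords(raw: bytes, encoding: str = "iso-8859-2") -> set:
    """Return the stopwords in raw, skipping blank lines and '#' comments."""
    words = set()
    for line in raw.decode(encoding).splitlines():
        word = line.strip()
        if word and not word.startswith("#"):
            words.add(word)
    return words


# A few lines in the same layout as the file above, encoded as ISO-8859-2.
raw = (
    "# Stopwords for Czech language (cz).\n"
    "budeš\n"
    "proč\n"
    "\n"
    "kteří\n"
    "ještě\n"
).encode("iso-8859-2")

stopwords = load_stopwords(raw)
print(sorted(stopwords))

# The wrong charset silently yields different "words".
garbled = load_stopwords(raw, encoding="iso-8859-1")
print(sorted(garbled))
```

For a real file, the bytes would come from open(path, "rb").read().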



--------------------------------------------------------------------------------


> # PC Latin-2 Charset File (DOS cp852)
> #
> # Adapted from sherlock search engine
> # by Jan Karabina <[EMAIL PROTECTED]>
>
> 00E1 00DF 0000 0000    #LATIN SMALL LETTER SHARP S
> 00D0 0111 00D1 0110    #LATIN LETTER D WITH STROKE
>
> 00A0 00E1 00B5 00C1    #LATIN LETTER A WITH ACUTE
> 00C7 0103 00C6 0102    #LATIN LETTER A WITH BREVE
> 0083 00E2 00B6 00C2    #LATIN LETTER A WITH CIRCUMFLEX
> 0084 00E4 008E 00C4    #LATIN LETTER A WITH DIAERESIS
> 00A5 0105 00A4 0104    #LATIN LETTER A WITH OGONEK
> 0086 0107 008F 0106    #LATIN LETTER C WITH ACUTE
> 009F 010D 00AC 010C    #LATIN LETTER C WITH CARON
> 0087 00E7 0080 00C7    #LATIN LETTER C WITH CEDILLA
> 00D4 010F 00D2 010E    #LATIN LETTER D WITH CARON
> 0082 00E9 0090 00C9    #LATIN LETTER E WITH ACUTE
> 00D8 011B 00B7 011A    #LATIN LETTER E WITH CARON
> 0089 00EB 00D3 00CB    #LATIN LETTER E WITH DIAERESIS
> 00A9 0119 00A8 0118    #LATIN LETTER E WITH OGONEK
> 00A1 00ED 00D6 00CD    #LATIN LETTER I WITH ACUTE
> 008C 00EE 00D7 00CE    #LATIN LETTER I WITH CIRCUMFLEX
> 0092 013A 0091 0139    #LATIN LETTER L WITH ACUTE
> 0096 013E 0095 013D    #LATIN LETTER L WITH CARON
> 0088 0142 009D 0141    #LATIN LETTER L WITH STROKE
> 00A2 00F3 00E0 00D3    #LATIN LETTER O WITH ACUTE
> 0093 00F4 00E2 00D4    #LATIN LETTER O WITH CIRCUMFLEX
> 0094 00F6 0099 00D6    #LATIN LETTER O WITH DIAERESIS
> 008B 0151 008A 0150    #LATIN LETTER O WITH DOUBLE ACUTE
> 0098 015B 0097 015A    #LATIN LETTER S WITH ACUTE
> 00AD 015F 00B8 015E    #LATIN LETTER S WITH CEDILLA
> 009C 0165 009B 0164    #LATIN LETTER T WITH CARON
> 00A3 00FA 00E9 00DA    #LATIN LETTER U WITH ACUTE
> 0081 00FC 009A 00DC    #LATIN LETTER U WITH DIAERESIS
> 0085 016F 00DE 016E    #LATIN LETTER U WITH RING ABOVE
> 00AB 017A 008D 0179    #LATIN LETTER Z WITH ACUTE
> 00A7 017E 00A6 017D    #LATIN LETTER Z WITH CARON
> 00BE 017C 00BD 017B    #LATIN LETTER Z WITH DOT ABOVE
>
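
The rows above appear to have four hexadecimal columns: the charset code of
the lowercase letter, its Unicode code point, the charset code of the
uppercase letter, and its Unicode code point, with 0000 where no uppercase
form exists. That reading is inferred from the data, not from ASPSeek
documentation. Under that assumption, a small Python sketch can turn such a
table into a byte-to-Unicode map (parse_charset_table and decode are
hypothetical names):

```python
def parse_charset_table(text):
    """Build (to_unicode, upper_of) from four-column hex rows."""
    to_unicode = {}   # charset byte -> Unicode character
    upper_of = {}     # lowercase character -> uppercase character
    for line in text.splitlines():
        # Drop the mail quote prefix and anything after '#'.
        fields = line.lstrip("> ").split("#", 1)[0].split()
        if len(fields) != 4:
            continue
        lo_code, lo_uni, up_code, up_uni = (int(f, 16) for f in fields)
        to_unicode[lo_code] = chr(lo_uni)
        if up_code:  # 0000 is taken to mean "no uppercase form"
            to_unicode[up_code] = chr(up_uni)
            upper_of[chr(lo_uni)] = chr(up_uni)
    return to_unicode, upper_of


def decode(data, to_unicode):
    """Decode bytes: ASCII passes through, high bytes use the table."""
    return "".join(
        chr(b) if b < 0x80 else to_unicode.get(b, "\ufffd") for b in data
    )


sample = """\
> # PC Latin-2 Charset File (DOS cp852)
> 00E1 00DF 0000 0000    #LATIN SMALL LETTER SHARP S
> 009F 010D 00AC 010C    #LATIN LETTER C WITH CARON
> 00D8 011B 00B7 011A    #LATIN LETTER E WITH CARON
> 00A7 017E 00A6 017D    #LATIN LETTER Z WITH CARON
"""

to_unicode, upper_of = parse_charset_table(sample)
print(decode(bytes([0x9F, 0xD8, 0xA7]), to_unicode))
print(upper_of)
```

Bytes that the table does not cover decode to U+FFFD here, which makes gaps in
a table easy to spot.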


--------------------------------------------------------------------------------


> # Kamenicky Brothers Charset File
> # Czech characters
> #
> # Adapted from sherlock search engine
> # by Jan Karabina <[EMAIL PROTECTED]>
>
> 00A0 00E1 008F 00C1   #LATIN LETTER A WITH ACUTE
> 0084 00E4 008E 00C4   #LATIN LETTER A WITH DIAERESIS
> 0087 010D 0080 010C   #LATIN LETTER C WITH CARON
> 0083 010F 0085 010E   #LATIN LETTER D WITH CARON
> 0082 00E9 0090 00C9   #LATIN LETTER E WITH ACUTE
> 0088 011B 0089 011A   #LATIN LETTER E WITH CARON
> 00A1 00ED 008B 00CD   #LATIN LETTER I WITH ACUTE
> 008D 013A 008A 0139   #LATIN LETTER L WITH ACUTE
> 008C 013E 009C 013D   #LATIN LETTER L WITH CARON
> 00A4 0148 00A5 0147   #LATIN LETTER N WITH CARON
> 00A2 00F3 0095 00D3   #LATIN LETTER O WITH ACUTE
> 0093 00F4 00A7 00D4   #LATIN LETTER O WITH CIRCUMFLEX
> 0094 00F6 0099 00D6   #LATIN LETTER O WITH DIAERESIS
> 00AA 0155 00AB 0154   #LATIN LETTER R WITH ACUTE
> 00A9 0159 009E 0158   #LATIN LETTER R WITH CARON
> 00A8 0161 009B 0160   #LATIN LETTER S WITH CARON
> 009F 0165 0086 0164   #LATIN LETTER T WITH CARON
> 00A3 00FA 0097 00DA   #LATIN LETTER U WITH ACUTE
> 0081 00FC 009A 00DC   #LATIN LETTER U WITH DIAERESIS
> 0096 016F 00A6 016E   #LATIN LETTER U WITH RING ABOVE
> 0098 00FD 009D 00DD   #LATIN LETTER Y WITH ACUTE
> 0091 017E 0092 017D   #LATIN LETTER Z WITH CARON
>


--------------------------------------------------------------------------------


> # KOI-8 CS Charset File
> #
> # Adapted from sherlock search engine
> # by Jan Karabina <[EMAIL PROTECTED]>
> #C7     F002    ????
> #E7     F000    ????
>
>
> 00C1 00E1 00E1 00C1    #LATIN LETTER A WITH ACUTE
> 00D1 00E4 00F8 0102    #LATIN LETTER A WITH DIAERESIS
> 00D8 00E0 00F1 00C4    #LATIN LETTER A WITH GRAVE
> 00C3 010D 00E3 010C    #LATIN LETTER C WITH CARON
> 00C4 010F 00E4 010E    #LATIN LETTER D WITH CARON
> 00D7 00E9 00F7 00C9    #LATIN LETTER E WITH ACUTE
> 00C5 011B 00E5 011A    #LATIN LETTER E WITH CARON
> 00C9 00ED 00E9 00CD    #LATIN LETTER I WITH ACUTE
> 00CB 013A 00EB 0139    #LATIN LETTER L WITH ACUTE
> 00CC 013E 00EC 013D    #LATIN LETTER L WITH CARON
> 00CE 0148 00EE 0147    #LATIN LETTER N WITH CARON
> 00CF 00F3 00EF 00D3    #LATIN LETTER O WITH ACUTE
> 00D0 00F4 00F0 00D4    #LATIN LETTER O WITH CIRCUMFLEX
> 00CD 00F6 00ED 00D6    #LATIN LETTER O WITH DIAERESIS
> 00C6 0155 00E6 0154    #LATIN LETTER R WITH ACUTE
> 00D2 0159 00F2 0158    #LATIN LETTER R WITH CARON
> 00D3 0161 00F3 0160    #LATIN LETTER S WITH CARON
> 00D4 0165 00F4 0164    #LATIN LETTER T WITH CARON
> 00D5 00FA 00F5 00DA    #LATIN LETTER U WITH ACUTE
> 00C8 00FC 00E8 00DC    #LATIN LETTER U WITH DIAERESIS
> 00CA 016F 00EA 016E    #LATIN LETTER U WITH RING ABOVE
> 00D9 00FD 00F9 00DD    #LATIN LETTER Y WITH ACUTE
> 00DA 017E 00FA 017D    #LATIN LETTER Z WITH CARON
>


--------------------------------------------------------------------------------


> # Czech Macintosh Charset File
> #
> # Adapted from sherlock search engine
> # by Jan Karabina <[EMAIL PROTECTED]>
>
> 00A7 00DF 0000 0000    #LATIN SMALL LETTER SHARP S
>
> 0087 00E1 00E7 00C1    #LATIN LETTER A WITH ACUTE
> 008A 00E4 0080 00C4    #LATIN LETTER A WITH DIAERESIS
> 0082 0101 0081 0100    #LATIN LETTER A WITH MACRON
> 0088 0105 0084 0104    #LATIN LETTER A WITH OGONEK
> 008D 0107 008C 0106    #LATIN LETTER C WITH ACUTE
> 008B 010D 0089 010C    #LATIN LETTER C WITH CARON
> 0093 010F 0091 010E    #LATIN LETTER D WITH CARON
> 008E 00E9 0083 00C9    #LATIN LETTER E WITH ACUTE
> 009E 011B 009D 011A    #LATIN LETTER E WITH CARON
> 0098 0117 0096 0116    #LATIN LETTER E WITH DOT ABOVE
> 0095 0113 0094 0112    #LATIN LETTER E WITH MACRON
> 00AB 0119 00A2 0118    #LATIN LETTER E WITH OGONEK
> 00AE 0123 00FE 0122    #LATIN LETTER G WITH CEDILLA
> 0092 00ED 00EA 00CD    #LATIN LETTER I WITH ACUTE
> 00B4 012B 00B1 012A    #LATIN LETTER I WITH MACRON
> 00B0 012F 00AF 012E    #LATIN LETTER I WITH OGONEK
> 00FA 0137 00B5 0136    #LATIN LETTER K WITH CEDILLA
> 00BE 013A 00BD 0139    #LATIN LETTER L WITH ACUTE
> 00BC 013E 00BB 013D    #LATIN LETTER L WITH CARON
> 00BA 013C 00B9 013B    #LATIN LETTER L WITH CEDILLA
> 00B8 0142 00FC 0141    #LATIN LETTER L WITH STROKE
> 00C4 0144 00C1 0143    #LATIN LETTER N WITH ACUTE
> 00CB 0148 00C5 0147    #LATIN LETTER N WITH CARON
> 00C0 0146 00BF 0145    #LATIN LETTER N WITH CEDILLA
> 0097 00F3 00EE 00D3    #LATIN LETTER O WITH ACUTE
> 0099 00F4 00EF 00D4    #LATIN LETTER O WITH CIRCUMFLEX
> 009A 00F6 0085 00D6    #LATIN LETTER O WITH DIAERESIS
> 00CE 0151 00CC 0150    #LATIN LETTER O WITH DOUBLE ACUTE
> 00D8 014D 00CF 014C    #LATIN LETTER O WITH MACRON
> 009B 00F5 00CD 00D5    #LATIN LETTER O WITH TILDE
> 00DA 0155 00D9 0154    #LATIN LETTER R WITH ACUTE
> 00DE 0159 00DB 0158    #LATIN LETTER R WITH CARON
> 00E0 0157 00DF 0156    #LATIN LETTER R WITH CEDILLA
> 00E6 015B 00E5 015A    #LATIN LETTER S WITH ACUTE
> 00E4 0161 00E1 0160    #LATIN LETTER S WITH CARON
> 00E9 0165 00E8 0164    #LATIN LETTER T WITH CARON
> 009C 00FA 00F2 00DA    #LATIN LETTER U WITH ACUTE
> 009F 00FC 0086 00DC    #LATIN LETTER U WITH DIAERESIS
> 00F5 0171 00F4 0170    #LATIN LETTER U WITH DOUBLE ACUTE
> 00F0 016B 00ED 016A    #LATIN LETTER U WITH MACRON
> 00F7 0173 00F6 0172    #LATIN LETTER U WITH OGONEK
> 00F3 016F 00F1 016E    #LATIN LETTER U WITH RING ABOVE
> 00F9 00FD 00F8 00DD    #LATIN LETTER Y WITH ACUTE
> 0090 017A 008F 0179    #LATIN LETTER Z WITH ACUTE
> 00EC 017E 00EB 017D    #LATIN LETTER Z WITH CARON
> 00FD 017C 00FB 017B    #LATIN LETTER Z WITH DOT ABOVE
>
>
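
Since the tables were adapted by hand, one way to gain some confidence in them
is to compare rows with a reference codec. The sketch below is only an
illustration: it assumes each row holds lowercase code, lowercase Unicode,
uppercase code, uppercase Unicode, and it uses Python's built-in cp852 and
mac_latin2 codecs as the reference. As far as I know Python ships no Kamenicky
or KOI8-CS codec, so those two tables would need another reference.
check_rows is a hypothetical helper.

```python
def check_rows(rows, codec):
    """Return (charset_code, table_char, codec_char) for every mismatch."""
    mismatches = []
    for row in rows:
        fields = [int(f, 16) for f in row.split("#", 1)[0].split()]
        # Check the lowercase pair, then the uppercase pair.
        for code, uni in (fields[0:2], fields[2:4]):
            if code == 0:  # 0000 0000 is taken to mean "no uppercase form"
                continue
            expected = bytes([code]).decode(codec)
            if expected != chr(uni):
                mismatches.append((code, chr(uni), expected))
    return mismatches


cp852_rows = [
    "009F 010D 00AC 010C    #LATIN LETTER C WITH CARON",
    "00D8 011B 00B7 011A    #LATIN LETTER E WITH CARON",
]
mac_rows = [
    "00A7 00DF 0000 0000    #LATIN SMALL LETTER SHARP S",
    "00DE 0159 00DB 0158    #LATIN LETTER R WITH CARON",
]

print(check_rows(cp852_rows, "cp852"))
print(check_rows(mac_rows, "mac_latin2"))
```

An empty list means every checked code agrees with the reference codec; any
tuple in the result points at a row worth reviewing.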
