Hello.
I've discovered several problems with the charsets and languages
set up in the Apache default config. Attached is a patch that attempts
to fix them. Since I'm new to apache-dev I might have changed something
that was put there on purpose, so please review the changes; I await
your comments.
The same changes are made for 1.3 in a second patch, which is also attached.
I'll comment on some of the issues briefly here:
diff -u -r1.13 httpd-std.conf.in
[..] the list is altered to reflect current order (almost)
-LanguagePriority en da nl et fr de el it ja ko no pl pt pt-br ltz ca es sv tw
+LanguagePriority en da nl et fr de el it ja ko no pl pt pt-br ltz ca es sv tw uk uk-UA
[..] I'm not sure who put KOI8-RU there, but I'm certain KOI8-U is widely known,
so I've moved .ua to refer to that. It may break some compatibility, though
KOI8-RU should be compatible with KOI8-U.
-AddCharset KOI8-ru .koi8-uk .ua
[..]
+AddCharset KOI8-ru .koi8-uk .koi8-ru
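
To illustrate why a separate KOI8-U entry matters, here is a rough Python sketch (not part of the patch) that lists the byte values on which two single-byte encodings disagree. It assumes Python's standard `koi8_r` and `koi8_u` codecs as reference tables; KOI8-RU has no standard-library codec, so it is not covered, and the helper name `differing_bytes` is made up for this example.

```python
# Compare two single-byte encodings byte by byte.
# Assumption: Python's built-in koi8_r and koi8_u codecs are used as the
# reference tables; KOI8-RU has no stdlib codec and is not covered.

def differing_bytes(codec_a, codec_b):
    """Return {byte_value: (char_in_a, char_in_b)} where the codecs disagree."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        a = raw.decode(codec_a, errors="replace")
        b = raw.decode(codec_b, errors="replace")
        if a != b:
            diffs[value] = (a, b)
    return diffs


if __name__ == "__main__":
    for value, (in_r, in_u) in sorted(differing_bytes("koi8_r", "koi8_u").items()):
        print(f"0x{value:02X}: KOI8-R {in_r!r}  KOI8-U {in_u!r}")
```

The ASCII half of both tables is identical, so any differences show up only in bytes 0x80 and above, where KOI8-U places the Ukrainian letters. A Ukrainian file labelled as KOI8-R would therefore render those letters incorrectly.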
[..] the URL has changed
-# See ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
+# See http://www.iana.org/assignments/character-sets
[..] As http://www.iana.org/assignments/character-sets notes, the UTF-* names
are all uppercase, so I changed that. Also, there were duplicate charset entries
for utf[78].
+AddCharset UTF-7 .utf7
[..]
-AddCharset utf-7 .utf7
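
Duplicates like these are easy to miss by eye, so here is a small Python sketch (not part of the patch) that reports filename extensions claimed by more than one AddCharset line. The function name `find_duplicate_extensions` and the sample text are made up for this example. It relies on the mod_mime documentation's statement that the extension argument is case-insensitive and that the leading dot is optional, and it only skips whole-line comments.

```python
from collections import defaultdict


def find_duplicate_extensions(conf_text):
    """Return {extension: [charset, ...]} for extensions used by 2+ AddCharset lines."""
    seen = defaultdict(list)
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Only whole-line comments are skipped; a '#' after a directive
        # is not treated as a comment here.
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split()
        if len(parts) >= 3 and parts[0].lower() == "addcharset":
            charset, extensions = parts[1], parts[2:]
            for ext in extensions:
                # Normalise: extensions are case-insensitive, dot optional.
                seen[ext.lower().lstrip(".")].append(charset)
    return {ext: names for ext, names in seen.items() if len(names) > 1}


# A few lines as they appear in httpd-std.conf.in before the patch.
SAMPLE = """\
AddCharset UTF-8       .utf8
AddCharset GB2312      .gb2312 .gb
AddCharset utf-7       .utf7
AddCharset utf-8       .utf8
"""

if __name__ == "__main__":
    for ext, names in find_duplicate_extensions(SAMPLE).items():
        print(f".{ext} is mapped by: {', '.join(names)}")
```

Run against the sample, this flags `.utf8` as being mapped by both the `UTF-8` and the `utf-8` entries, which is the duplication the patch removes.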
If you have any comments or suggestions, please contact me.
--
"Nuclear war would really set back cable."
- Ted Turner
Index: apache-1.3//conf/httpd.conf-dist
===================================================================
RCS file: /home/cvspublic/apache-1.3/conf/httpd.conf-dist,v
retrieving revision 1.80
diff -u -r1.80 httpd.conf-dist
--- apache-1.3//conf/httpd.conf-dist 5 Mar 2002 16:19:12 -0000 1.80
+++ apache-1.3//conf/httpd.conf-dist 19 Aug 2002 16:06:05 -0000
@@ -700,7 +700,9 @@
# Note 2: The example entries below illustrate that in quite
# some cases the two character 'Language' abbreviation is not
# identical to the two character 'Country' code for its country,
- # E.g. 'Danmark/dk' versus 'Danish/da'.
+ # E.g. 'Danmark/dk' versus 'Danish/da' or 'Ukraine/ua' versus
+ # 'Ukrainian/uk' (the latter is, sometimes, a source of confusion).
+
#
# Note 3: In the case of 'ltz' we violate the RFC by using a three char
# specifier. But there is 'work in progress' to fix this and get
@@ -712,7 +714,9 @@
# Portugese (pt) - Luxembourgeois* (ltz)
# Spanish (es) - Swedish (sv) - Catalan (ca) - Czech(cz)
# Polish (pl) - Brazilian Portuguese (pt-br) - Japanese (ja)
- # Russian (ru)
+ # Russian (ru) - Ukrainian (uk) Ukrainian in Ukraine (uk-UA) [mozilla is
+ # set up to use uk-UA as default in uk localisation packs, hopefully it'll
+ # be fixed (19.08.2002)]
#
AddLanguage da .dk
AddLanguage nl .nl
@@ -742,11 +746,15 @@
AddLanguage ru .ru
AddLanguage zh-tw .tw
AddLanguage tw .tw
+ AddLanguage uk .uk
+ AddLanguage uk-UA .uk-UA
+
AddCharset Big5 .Big5 .big5
AddCharset WINDOWS-1251 .cp-1251
AddCharset CP866 .cp866
AddCharset ISO-8859-5 .iso-ru
AddCharset KOI8-R .koi8-r
+ AddCharset KOI8-U .koi8-u
AddCharset UCS-2 .ucs2
AddCharset UCS-4 .ucs4
AddCharset UTF-8 .utf8
@@ -758,7 +766,7 @@
# more or less alphabetized them here. You probably want to change this.
#
<IfModule mod_negotiation.c>
- LanguagePriority en da nl et fr de el it ja kr no pl pt pt-br ru ltz ca es sv tw
+ LanguagePriority en da nl et fr de el it ja kr no pl pt pt-br ru ltz ca es sv tw uk uk-UA
</IfModule>
#
Index: httpd-2.0//docs/conf/httpd-std.conf.in
===================================================================
RCS file: /home/cvspublic/httpd-2.0/docs/conf/httpd-std.conf.in,v
retrieving revision 1.13
diff -u -r1.13 httpd-std.conf.in
--- httpd-2.0//docs/conf/httpd-std.conf.in 15 Jul 2002 20:17:25 -0000 1.13
+++ httpd-2.0//docs/conf/httpd-std.conf.in 19 Aug 2002 16:07:22 -0000
@@ -692,7 +692,8 @@
# Note 2: The example entries below illustrate that in some cases
# the two character 'Language' abbreviation is not identical to
# the two character 'Country' code for its country,
-# E.g. 'Danmark/dk' versus 'Danish/da'.
+# E.g. 'Danmark/dk' versus 'Danish/da' or 'Ukraine/ua' versus
+# 'Ukrainian/uk' (the latter is, sometimes, a source of confusion).
#
# Note 3: In the case of 'ltz' we violate the RFC by using a three char
# specifier. There is 'work in progress' to fix this and get
@@ -704,7 +705,9 @@
# Portugese (pt) - Luxembourgeois* (ltz)
# Spanish (es) - Swedish (sv) - Catalan (ca) - Czech(cz)
# Polish (pl) - Brazilian Portuguese (pt-br) - Japanese (ja)
-# Russian (ru) - Croatian (hr)
+# Russian (ru) - Croatian (hr) - Ukrainian (uk) Ukrainian in Ukraine (uk-UA)
+# [mozilla is set up to use uk-UA as default in uk localisation packs, hopefully
+# it'll be fixed (19.08.2002)]
#
AddLanguage da .dk
AddLanguage nl .nl
@@ -731,6 +734,8 @@
AddLanguage tw .tw
AddLanguage zh-tw .tw
AddLanguage hr .hr
+AddLanguage uk .uk
+AddLanguage uk-UA .uk-UA
#
# LanguagePriority allows you to give precedence to some languages
@@ -739,7 +744,7 @@
# Just list the languages in decreasing order of preference. We have
# more or less alphabetized them here. You probably want to change this.
#
-LanguagePriority en da nl et fr de el it ja ko no pl pt pt-br ltz ca es sv tw
+LanguagePriority en da nl et fr de el it ja ko no pl pt pt-br ltz ca es sv tw uk uk-UA
#
# ForceLanguagePriority allows you to serve a result page rather than
@@ -764,7 +769,7 @@
# Commonly used filename extensions to character sets. You probably
# want to avoid clashes with the language extensions, unless you
# are good at carefully testing your setup after each change.
-# See ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets for
+# See http://www.iana.org/assignments/character-sets for
# the official list of charset names and their respective RFCs
#
AddCharset ISO-8859-1 .iso8859-1 .latin1
@@ -780,26 +785,31 @@
AddCharset ISO-2022-KR .iso2022-kr .kis
AddCharset ISO-2022-CN .iso2022-cn .cis
AddCharset Big5 .Big5 .big5
-# For russian, more than one charset is used (depends on client, mostly):
+# For Russian, more than one charset is used (depends on client, mostly):
AddCharset WINDOWS-1251 .cp-1251 .win-1251
AddCharset CP866 .cp866
-AddCharset KOI8-r .koi8-r .koi8-ru
-AddCharset KOI8-ru .koi8-uk .ua
+AddCharset KOI8-r .koi8-r
+# both Russian and Ukrainian (probably other cyrillic-based languages)
+AddCharset KOI8-ru .koi8-uk .koi8-ru
+# widely-used Ukrainian encoding, RFC2319
+AddCharset KOI8-U .koi8-u .ua
+
+# Unicode
AddCharset ISO-10646-UCS-2 .ucs2
AddCharset ISO-10646-UCS-4 .ucs4
-AddCharset UTF-8 .utf8
+AddCharset UTF-7 .utf7
+AddCharset UTF-8 .utf8
+AddCharset UTF-16 .utf16
# The set below does not map to a specific (iso) standard
# but works on a fairly wide range of browsers. Note that
# capitalization actually matters (it should not, but it
# does for some browsers).
#
-# See ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
+# See http://www.iana.org/assignments/character-sets
# for a list of sorts. But browsers support few.
#
AddCharset GB2312 .gb2312 .gb
-AddCharset utf-7 .utf7
-AddCharset utf-8 .utf8
AddCharset big5 .big5 .b5
AddCharset EUC-TW .euc-tw
AddCharset EUC-JP .euc-jp