On Feb 16, 2008, at 09:00, Sergiu Dumitriu wrote:
>>> that mysql has to be manually configured for UTF-8, as by default it comes with latin1.
>>
>> Isn't this a problem with databases which are configured with ISO-8859-1 by default most of the time?
>
> Yes, it is. Right now there is a component somewhere that converts characters not supported by the encoding into &#xxx; escapes, but I can't remember which one. With these escapes, the database always receives data in the encoding XWiki is configured with.
These are the kind of escapes that have to go away, I feel; they clutter the whole place, you never know whether you're escaping twice, and they slow everything down.
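For reference, such an escaping pass could look roughly like this — a hypothetical helper, not the actual XWiki component (`escapeUnsupported` and the class name are made up for illustration):

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class NumericEscapes {
    // Replace every character the target encoding cannot represent
    // with an HTML/XML numeric character reference (&#NNN;).
    static String escapeUnsupported(String s, Charset target) {
        CharsetEncoder enc = target.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (enc.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Greek capital omega (U+03A9, code point 937) is not representable in latin1:
        System.out.println(escapeUnsupported("aΩb", StandardCharsets.ISO_8859_1));
        // prints "a&#937;b"
    }
}
```

This shows exactly why the escapes clutter everything: the escaping must be applied (and undone) consistently at every boundary, or you end up double-escaping.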
> What I would really like is if Hibernate were smart enough to enforce encodings, or to transparently convert data between the application and the database. Unfortunately, that's not the case. I'll have to check which encodings DBMSs use by default. I only know that mysql comes with latin1, and I think hsql and derby come with UTF-8.
Are you serious that such a need is there? We've been using mostly derby, but at times we used mysql... and with the following property and no configuration at all on a default (fink-installed) mysql, that worked:
<property name="connection.url">jdbc:mysql://dbserver/activemath?useUnicode=true&amp;characterEncoding=UTF-8</property>
Adding this to the installation instructions seems doable, no? Maybe the best would be a small test application that lets everyone verify their own setup.
>> Same question for the servlet container.
>
> The servlet container does not (usually) have an encoding.
There is one encoding that has been kept implicit for too long but is now commonly written in server.xml: the charset used in URLs and in www-form-urlencoded POST content. Tomcat has long considered the platform encoding to be correct here, but this is clearly wrong. Again, some people have made workarounds...
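For Tomcat, the URL charset can be made explicit on the connector in server.xml — a minimal sketch, where the port and protocol values are just placeholders:

```xml
<!-- server.xml: declare the charset used to decode URLs (GET parameters). -->
<!-- Without URIEncoding, Tomcat falls back to an implicit default,        -->
<!-- which is what causes the platform-dependent mismatches.               -->
<Connector port="8080" protocol="HTTP/1.1"
           URIEncoding="UTF-8" />
```

Note that this only covers the URL/query string; POST bodies are decoded according to the request's declared character encoding, which in practice has to be forced by a servlet filter calling request.setCharacterEncoding("UTF-8").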
> It works in the system encoding, which varies by OS and country. Windows systems are usually set to an encoding that reflects the language/country, and Linux systems mostly do the same, but tend to switch to UTF-8.
Macs are yet another crowd (e.g. MacRoman instead of latin1), and that also varies per language.
> I checked what happens if I override the JVM encoding. It's not good, as it is replaced for all the apps, and in a shared container that's really bad.
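The danger is easy to demonstrate: the same bytes decoded with two different charsets give different strings, so any code that relies on the JVM-wide default (the file.encoding property) behaves differently per platform. A minimal illustration, forcing both charsets explicitly:

```java
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        // "é" encoded as UTF-8 is two bytes: 0xC3 0xA9.
        byte[] utf8Bytes = "é".getBytes(StandardCharsets.UTF_8);

        // Reading those bytes back with latin1 (what a mis-configured
        // platform default would do) produces mojibake instead of "é":
        String misread = new String(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(misread); // prints "Ã©"

        // The JVM-wide default every unparameterized conversion silently uses:
        System.out.println(System.getProperty("file.encoding"));
    }
}
```

Overriding file.encoding changes that default for every application in the container at once, which is exactly why it is so dangerous in a shared container.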
Well... there the problem is even more pressing. Contemporary Apache servers deliver, by default, HTML files declared with the charset of the Apache configuration!! (No joke!) Moreover, several specs state clearly that the content (e.g. HTML meta tags or XML headers) must not override a charset declared in the MIME type.
The platform charset is such a variable parameter that I know of no application that could make sense of it... except maybe those that manipulate plain text in editors such as notepad...
It's as simple as that: if you want to be on the web, you need to think globally, and thus you need a universal encoding.
> Thus, I'm against overriding the JVM encoding. This then makes option 2 impossible to implement, unless we decide to make XWiki products work only in certain environments.
I think that everyone has the problem.
> It will be possible to do this in several years, once people forget all about the different charsets. Sometimes, decisions made in the early stages are hard to overcome and completely eliminate later on.
>
> Still, we can't work with reduced charsets anymore.
Add to that: math symbols (and Greek letters, and symbols, and...) cannot live within an 8-bit encoding, whichever it is.
But the task seems big... as you describe below.

paul
> The tough part is that there are some tools that handle conversions internally, and they work with the JVM encoding. We have such problems with JRCS (rollbacks replace non-ASCII chars with question marks) and with FOP (the same question marks appear). I'll have to study what can be done to overcome these problems.
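The question marks come from Java's default behaviour when encoding a character the target charset cannot represent: the encoder silently substitutes '?' (byte 0x3F). Tools like JRCS and FOP hit this when they call getBytes()/new String() without an explicit charset; the following forces latin1 to make the effect reproducible:

```java
import java.nio.charset.StandardCharsets;

public class LossyEncodingDemo {
    public static void main(String[] args) {
        // "Ω" has no latin1 representation; String.getBytes() silently
        // replaces the unmappable character with '?' (0x3F).
        byte[] bytes = "aΩb".getBytes(StandardCharsets.ISO_8859_1);
        String roundTripped = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(roundTripped); // prints "a?b" — the data is irreversibly lost
    }
}
```

Once the '?' is in the stored bytes, no configuration change can recover the original character, which is why these internal conversions are the tough part.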
[...]
_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs

