On Mon, 31 Jan 2005, Michiel Meeuwissen wrote:
[...]
> So, I'll fix that CP1252 is interpreted as such if the database is
> ISO-8859-1. That can never harm, because there are no ISO-8859-1
> characters which are not on the same place in CP1252. Now, you fetch
> 'correct' strings in any case.

I've seen some projects that also did this sort of charset overriding in 
the database. As long as you have single-byte charsets, this will work 
more or less. But once you start working with multibyte charsets, 
database searches, for example, can produce awkward results. Just a 
warning about this sort of 'dirty hack'. I don't know the internals of 
mmbase that well, so I don't know if there are any other problems with 
charsets in mmbase.
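
To illustrate why the override is mostly harmless for single-byte data, 
here is a minimal sketch using only the standard Java charset classes. The 
class and method names are made up for this mail and are not part of 
mmbase. It decodes every byte value under both charsets and checks that 
the two only disagree in the 0x80-0x9F range, where ISO-8859-1 has 
control codes and CP1252 has printable characters such as the euro sign.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class Cp1252Overlap {
    static final Charset CP1252 = Charset.forName("windows-1252");
    static final Charset LATIN1 = StandardCharsets.ISO_8859_1;

    // Decode a single byte value (0..255) under the given charset.
    static String decode(int b, Charset cs) {
        return new String(new byte[] { (byte) b }, cs);
    }

    // True if every byte that decodes differently lies in 0x80..0x9F.
    static boolean differencesOnlyInC1Range() {
        for (int b = 0; b < 256; b++) {
            boolean same = decode(b, LATIN1).equals(decode(b, CP1252));
            boolean inC1 = b >= 0x80 && b <= 0x9F;
            if (!same && !inC1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(differencesOnlyInC1Range());
        // Byte 0x80 is the euro sign in CP1252, a control code in ISO-8859-1.
        System.out.println(decode(0x80, CP1252).equals("\u20AC"));
    }
}
```

So reading ISO-8859-1 bytes as CP1252 only changes the meaning of the 
control range, which ordinary text should not contain anyway.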

There are actually two problems regarding websites and charsets: one is 
having a database which is in a limited encoding (such as ISO-8859-1 when 
you also want to support CP1252), the other is discovering what encoding 
the browser is using. If you fix one you'll also need to fix the other, 
as you already mentioned.

IMHO the best solution for the database problem is to create a new 
JDBC driver, and not to pollute mmbase itself with workarounds for a 
wrongly encoded database. For the browser encoding problem a few 
workarounds exist (mainly forcing UTF-8). If I understand your solution 
correctly, you are trying to compensate inside the JVM for both a badly 
interpreted browser request and a badly encoded database, which results 
in unpredictable db queries and string lengths for multibyte encodings.
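
A small sketch of what I mean by unpredictable lengths and searches, 
assuming the stored bytes are really UTF-8 but some layer reads them as 
ISO-8859-1. The class and method names are hypothetical and only serve 
the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class MultibyteLength {
    // Number of bytes needed to store s as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Simulates a layer that wrongly treats UTF-8 bytes as ISO-8859-1.
    static String misread(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String price = "5 \u20AC"; // "5", space, euro sign
        System.out.println(price.length());                     // 3 chars
        System.out.println(utf8Bytes(price));                   // 5 bytes
        System.out.println(misread(price).length());            // 5 chars
        // A search for the euro sign no longer matches the misread value.
        System.out.println(misread(price).contains("\u20AC"));  // false
    }
}
```

The same value is 3 characters in the JVM, 5 bytes in the database, and 
5 characters once misread, so column length checks and LIKE-style 
searches stop agreeing with each other.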

Again, I don't know the internals and I may be misunderstanding your HACK, 
so there may be other reasons.

> Then, I propose the possibility to provide 'surrogators' on database
> level. A surrogator is a something which translates 'impossible
> characters' to something which comes close enough but is not the real
> thing. E.g. it can replace the euro-sign with the word 'EURO'.

IBM has a product, ICU4J, which does exactly this sort of thing. It can 
convert unsupported characters to alternative representations, and a 
lot more besides: http://www.icu4j.org/
I don't know if its open source license is compatible with mmbase's 
license, but it seems to me it is worth investigating.
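
For comparison, the surrogator idea quoted above can be roughly sketched 
with the standard library alone. This is not ICU4J and not an mmbase API: 
the class name, the method name and the replacement table are all 
assumptions made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical surrogator sketch, for illustration only.
class Surrogator {
    // Assumed replacement table; a real one would be configurable.
    static final Map<Character, String> SURROGATES =
        Map.of('\u20AC', "EURO", '\u2026', "...");

    // Replace every char the target charset cannot encode.
    // Chars without a table entry (including each half of a
    // UTF-16 surrogate pair) become "?".
    static String surrogate(String in, Charset target) {
        CharsetEncoder enc = target.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (enc.canEncode(c)) {
                out.append(c);
            } else {
                out.append(SURROGATES.getOrDefault(c, "?"));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "10 \u20AC caf\u00E9";
        // The euro sign is not in ISO-8859-1, the e-acute is.
        System.out.println(surrogate(text, StandardCharsets.ISO_8859_1));
    }
}
```

With ISO-8859-1 as the target the euro sign becomes the word EURO while 
the accented character is kept. With UTF-8 as the target nothing in that 
string is replaced.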

_______________________________________________
Developers mailing list
[email protected]
http://lists.mmbase.org/mailman/listinfo/developers
