Arjan Lamers wrote:
> database searches for example can produce awkward results. Just to warn you
> for this sort of 'dirty hacks'. I don't know the internals of mmbase that
> well, so i don't know if there are any other problems with charsets in
> mmbase.

I am aware of the risks. My higher goal, though, is that in any case you must
ensure that the Java strings are correct. Actually, the current situation was
that the database already contained something 'impossible', which resulted in
'incorrect' Java strings. My goal is to offer the possibility to confine the
incorrectness to the database. I agree that generally you should never want
something impossible anywhere.

> There are actually two problems regarding websites and charsets: one is
> having a database which is in a limited encoding (such as iso8859-1 when
> you also want to support cp1252), the other is discovering what encoding
> the browser is using. If you fix one you'll also need to fix the other one
> as you already mentioned.

Yes, I agree, but this hack is only about the database layer, which should
IMHO always assume that it receives 'correct' strings. If in some case that
is not yet true, then that must be fixed _too_.

> IMHO the best solution for the database kind of problem is to create a new
> JDBC driver, and not to pollute mmbase itself with workarounds for a
> wrongly encoded database. For the browser encoding problem a few work

I don't think I want to create a new JDBC driver... The closest thing is the
Storage Layer in MMBase. Actually this whole hack is only about a few lines
of code there. Certainly a whole lot less than a completely new JDBC driver...

> arounds exist (mainly by forcing utf-8). If I understand your solution
> correctly you are trying to compensate a bad interpreted request from
> a browser and a bad encoded database inside the JVM, resulting in
> unpredictable db queries and string lengths for multibyte encodings.

No, on the contrary. I want to make sure that requests are never badly
interpreted, but offer a kind of workaround if until now you depended on
that. My mantra is that Java strings must be correct in _any_ case, even if
for some legacy reason the database isn't.
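
To illustrate how an 'impossible' database value turns into an 'incorrect'
Java string, here is a minimal sketch. It is not MMBase code: the class and
method names are made up for this example, and it assumes the column is
declared as ISO-8859-1 while the stored bytes are really cp1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of MMBase.
class CharsetMismatchDemo {
    // Assumption: the JVM provides the windows-1252 (cp1252) charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes the bytes the way an ISO-8859-1 column claims they are encoded. */
    static String decodeAsDeclared(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Decodes the same bytes as cp1252, which is what was really stored. */
    static String decodeAsActuallyStored(byte[] raw) {
        return new String(raw, CP1252);
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0x80, '5' }; // 0x80 is the euro sign in cp1252

        // ISO-8859-1 maps 0x80 to the control character U+0080.
        System.out.println((int) decodeAsDeclared(raw).charAt(0));       // prints 128
        // cp1252 maps 0x80 to U+20AC, the euro sign.
        System.out.println((int) decodeAsActuallyStored(raw).charAt(0)); // prints 8364
    }
}
```

The first result is the kind of 'incorrect' Java string meant above; the
second is what the string should have been.
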
Of course, if you make a new setup, I'd always recommend arranging some
Unicode-capable backend with UTF-8 pages on the front-end.

> > Then, I propose the possibility to provide 'surrogators' on database
> > level. A surrogator is something which translates 'impossible
> > characters' to something which comes close enough but is not the real
> > thing. E.g. it can replace the euro-sign with the word 'EURO'.
>
> IBM has a product ICU4j which does exactly this sort of thing. It too can
> convert unsupported characters to alternative representations, but also a
> lot more: http://www.icu4j.org/
> I don't know if their open source license is compatible with mmbase's
> license, but it seems to me it is worth investigating.

Thanks, that's very interesting, and I will keep it in mind. For the moment I
need only surrogating of those 27 odd cp1252 characters, for which I simply
made a very straightforward filter.

Michiel

-- 
Michiel Meeuwissen mihxil' Mediacentrum 140 H'sum [] ()
+31 (0)35 6772979 nl_NL eo_XX en_US
_______________________________________________
Developers mailing list
Developers@lists.mmbase.org
http://lists.mmbase.org/mailman/listinfo/developers
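
To make the 'surrogator' idea from this message a bit more concrete, here is
a rough sketch of what such a filter could look like. It is not the actual
MMBase filter: the class name, the method name and every mapping except
euro-sign to 'EURO' are assumptions chosen for illustration, and only a
handful of the cp1252-only characters are covered.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a surrogator, not the real MMBase implementation.
class Cp1252Surrogator {
    // Characters that exist in cp1252 but not in ISO-8859-1, mapped to
    // something that "comes close enough". Only the euro mapping comes from
    // the thread; the others are illustrative guesses.
    private static final Map<Character, String> SURROGATES = new HashMap<>();
    static {
        SURROGATES.put('\u20AC', "EURO"); // euro sign
        SURROGATES.put('\u2018', "'");    // left single quotation mark
        SURROGATES.put('\u2019', "'");    // right single quotation mark
        SURROGATES.put('\u201C', "\"");   // left double quotation mark
        SURROGATES.put('\u201D', "\"");   // right double quotation mark
        SURROGATES.put('\u2013', "-");    // en dash
        SURROGATES.put('\u2026', "...");  // horizontal ellipsis
    }

    /** Replaces every character that has a surrogate; leaves the rest untouched. */
    static String surrogate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = SURROGATES.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: EURO 5 - "cheap"...
        System.out.println(surrogate("\u20AC 5 \u2013 \u201Ccheap\u201D\u2026"));
    }
}
```

Such a filter would only be applied on the way into a database that cannot
store these characters; the Java strings themselves stay correct.
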