Hello, I thought XWiki was using UTF-8 by default?! In that case we _only_ need to check the "environment", meaning every supported database and container, and maybe the system language (I'm not sure it matters). I can write that documentation if you want. I've heard that PostgreSQL uses UTF-8 by default, but I need to look into it anyway.
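
For what it's worth, here is a minimal sketch of what the JVM side of such an environment check could look like. It is only an illustration, not existing XWiki code: the class name EncodingCheck and both method names are made up. It covers the JVM default charset and whether a string contains characters outside the BMP (the ones that need 4 bytes in UTF-8, discussed in the thread below). Checking the database and the container would need a live connection and container-specific APIs, so that part is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical startup check; names are illustrative, not XWiki APIs. */
public class EncodingCheck {

    /** Returns warnings about the given JVM charset; an empty list means it is UTF-8. */
    public static List<String> jvmWarnings(Charset jvmCharset) {
        List<String> warnings = new ArrayList<>();
        if (!StandardCharsets.UTF_8.equals(jvmCharset)) {
            warnings.add("JVM default charset is " + jvmCharset.name()
                + ", expected UTF-8 (consider -Dfile.encoding=UTF8)");
        }
        return warnings;
    }

    /** True if the text contains a code point outside the BMP (4 bytes in UTF-8). */
    public static boolean needsFourByteUtf8(String text) {
        return text.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // Check the charset the running JVM actually uses by default.
        for (String warning : jvmWarnings(Charset.defaultCharset())) {
            System.out.println("WARNING: " + warning);
        }
        // U+1F600 is outside the BMP, so it is written as a surrogate pair in Java.
        System.out.println(needsFourByteUtf8("\uD83D\uDE00"));
    }
}
```

Something like this could run once at startup and only log warnings, so that existing latin1 installations keep working but are told they have something to do.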

2013/2/11 Vincent Massol <[email protected]>

>
> On Dec 11, 2012, at 12:43 AM, Sergiu Dumitriu <[email protected]> wrote:
>
> > On 12/07/2012 04:56 PM, Caleb James DeLisle wrote:
> >>
> >>
> >> On 12/07/2012 04:26 PM, Vincent Massol wrote:
> >>> Hi,
> >>>
> >>> On Dec 7, 2012, at 9:59 PM, Sergiu Dumitriu <[email protected]> wrote:
> >>>
> >>>> Hi devs,
> >>>>
> >>>> We've moved more and more toward an UTF-8-only application, and XWiki
> >>>> has only been tested with this configuration for several years.
> >>>>
> >>>> I propose that we require UTF-8 for a valid, supported installation.
> >>>> This means:
> >>>> - JVM encoding (-Dfile.encoding=UTF8)
> >>>> - Container default URL encoding (Tomcat has ISO-8859-1 by default)
> >>>> - Database encoding (MySql is still configured with latin1 on some
> >>>>   distros)
> >>>>
> >>>> There's one big site to update on our side: xwiki.org.
> >>>>
> >>>> Here's my +1. This is a move toward a future web, since more and more
> >>>> standards require (or at least assume as a default) UTF-8.
> >>>>
> >>>>
> >>>>
> >>>> After thinking a bit more, it would make sense to require a valid
> >>>> Unicode encoding, including UTF-16, which is preferable in countries
> >>>> that don't use a latin alphabet. However, XWiki doesn't currently work
> >>>> under 16-bit encodings at all.
> >>>
> >>> For XWiki 4.x I'm -1 since it's a big change and we don't want to
> >>> break our users that currently use 4.x with ISO8859-1 for example
> >>>
> >>> For XWiki 5.x I'm not sure.
> >>>
> >>> To be able to answer I need to understand more. For example what
> >>> currently doesn't work with any encoding the user wants to use?
> >>> Shouldn't we just be transparent and use whatever encoding is
> >>> specified and not hardcode anything?
> >>
> >> +1 for UTF-8 only.
> >>
> >> If we want to support an encoding we need to run our test suite with
> >> it so each encoding we support multiplies the test run time and it's
> >> not going to bring features to the user's hands.
> >>
> >> +1 for waiting until 5.x at least before making it mandatory because
> >> we will have to require MySQL >= 5.5.3 and set the encoding to utf8mb4
> >> in order to avoid errors when saving pages with 4 byte codepoints.
> >> http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
> >
> > I'm afraid we'll get errors if we do that, since indexes are still
> > limited to a total of 1024 bytes, and we're already maxing out with
> > 255-varchar columns + other fields. In short, MySQL sucks for serious
> > projects, but we can't really tell our users "use Postgres, it's
> > better". So I'd rather keep it to the current utf-8, and hope that
> > nobody will need the extended unicode planes, until we find a better
> > solution.
> >
> > To be more specific: we can't switch to 4-byte utf8 until we stop using
> > names as primary key elements.
> >
> > Just tried it, and indeed trying to save characters outside the BMP will
> > fail. Thanks for pointing this out.
> >
> >> I understand that some users currently set the encoding to latin1 so
> >> MySQL will just treat the data as opaque blobs.
> >
> > Except that it doesn't work like that. If you use latin1, you'll get
> > errors with the default XE xar about invalid values in the RCS table.
> > The connector doesn't send bytes, it sends characters, and the database
> > will try to store them, which it can't. Every piece of MySQL has an
> > encoding, which isn't opaque. Pushing characters outside the table's
> > charset will trigger an exception.
>
> Reviving this thread now that 5.0 dev is going to start.
>
> xwiki.org is still running latin1 AFAIK and it's working well, including
> for page history so I'm not sure what the problem is.
>
> Now I'm fine to require UTF8. It would be nice to check the environment at
> startup. I hope we can do so. This means checking that DB and container are
> set up correctly. This is important also for existing users who are using
> latin1. They need to know they have something to do. xwiki.org is a good
> example. We should also document how users can migrate their DBs to UTF8 in
> our admin guide on xwiki.org.
>
> Does it mean we'll remove (deprecate to start with?) the xwiki.encoding
> config parameter?
>
> Thanks
> -Vincent
>
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs
>

