Hello,

I thought XWiki was using UTF-8 by default? If so, we _only_ need to
check the "environment", meaning every supported database and container,
and maybe the system language, though I'm not sure that matters.
I can take care of this documentation if you want. I've heard that
PostgreSQL uses UTF-8 by default, but I need to look into it anyway.
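
For the JVM part of the startup check, here is a rough sketch of what I
have in mind. It is only an illustration: the class and method names
(EncodingCheck, isUtf8, describe) are made up and are not existing XWiki
code, and it only looks at the JVM default charset. The database and
container checks would need their own logic.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of XWiki: reports components that are not UTF-8.
final class EncodingCheck {

    /** Returns true when the given charset is UTF-8. */
    static boolean isUtf8(Charset charset) {
        return StandardCharsets.UTF_8.equals(charset);
    }

    /** Returns a warning message for a non-UTF-8 component, or null when it is fine. */
    static String describe(String component, Charset charset) {
        if (isUtf8(charset)) {
            return null;
        }
        return component + " uses " + charset.name() + " instead of UTF-8";
    }

    public static void main(String[] args) {
        // The JVM default charset is what -Dfile.encoding influences.
        String warning = describe("JVM default charset", Charset.defaultCharset());
        System.out.println(warning == null
            ? "JVM default charset is UTF-8"
            : "WARNING: " + warning);
    }
}
```

At startup we could log such a warning (or refuse to start, to be
decided) so that users still on latin1 know they have something to do.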


2013/2/11 Vincent Massol <[email protected]>

>
> On Dec 11, 2012, at 12:43 AM, Sergiu Dumitriu <[email protected]> wrote:
>
> > On 12/07/2012 04:56 PM, Caleb James DeLisle wrote:
> >>
> >>
> >> On 12/07/2012 04:26 PM, Vincent Massol wrote:
> >>> Hi,
> >>>
> >>> On Dec 7, 2012, at 9:59 PM, Sergiu Dumitriu <[email protected]> wrote:
> >>>
> >>>> Hi devs,
> >>>>
> >>>> We've moved more and more toward a UTF-8-only application, and XWiki
> >>>> has only been tested with this configuration for several years.
> >>>>
> >>>> I propose that we require UTF-8 for a valid, supported installation.
> >>>> This means:
> >>>> - JVM encoding (-Dfile.encoding=UTF8)
> >>>> - Container default URL encoding (Tomcat has ISO-8859-1 by default)
> >>>> - Database encoding (MySQL is still configured with latin1 on some
> >>>> distros)
> >>>>
> >>>> There's one big site to update on our side: xwiki.org.
> >>>>
> >>>> Here's my +1. This is a move toward a future web, since more and more
> >>>> standards require (or at least assume as a default) UTF-8.
> >>>>
> >>>>
> >>>>
> >>>> After thinking a bit more, it would make sense to require a valid
> >>>> Unicode encoding, including UTF-16, which is preferable in countries
> >>>> that don't use a latin alphabet. However, XWiki doesn't currently work
> >>>> under 16-bit encodings at all.
> >>>
> >>> For XWiki 4.x I'm -1 since it's a big change and we don't want to
> >>> break our users that currently use 4.x with ISO8859-1 for example
> >>>
> >>> For XWiki 5.x I'm not sure.
> >>>
> >>> To be able to answer I need to understand more. For example what
> >>> currently doesn't work with any encoding the user wants to use? Shouldn't
> >>> we just be transparent and use whatever encoding is specified and not
> >>> hardcode anything?
> >>
> >> +1 for UTF-8 only.
> >>
> >> If we want to support an encoding we need to run our test suite with it so
> >> each encoding we support multiplies the test run time and it's not going to
> >> bring features to the user's hands.
> >>
> >> +1 for waiting until 5.x at least before making it mandatory because we will
> >> have to require MySQL >= 5.5.3 and set the encoding to utf8mb4 in order to
> >> avoid errors when saving pages with 4 byte codepoints.
> >> http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
> >
> > I'm afraid we'll get errors if we do that, since indexes are still
> > limited to a total of 1024 bytes, and we're already maxing out with
> > 255-varchar columns + other fields. In short, MySQL sucks for serious
> > projects, but we can't really tell our users "use Postgres, it's
> > better". So I'd rather keep it to the current utf-8, and hope that
> > nobody will need the extended unicode planes, until we find a better
> > solution.
> >
> > To be more specific: we can't switch to 4-byte utf8 until we stop using
> > names as primary key elements.
> >
> > Just tried it, and indeed trying to save characters outside the BMP will
> > fail. Thanks for pointing this out.
> >
> >> I understand that some users currently set the encoding to latin1 so MySQL
> >> will just treat the data as opaque blobs.
> >
> > Except that it doesn't work like that. If you use latin1, you'll get
> > errors with the default XE xar about invalid values in the RCS table.
> > The connector doesn't send bytes, it sends characters, and the database
> > will try to store them, which it can't. Every piece of MySQL has an
> > encoding, which isn't opaque. Pushing characters outside the table's
> > charset will trigger an exception.
>
> Reviving this thread now that 5.0 dev is going to start.
>
> xwiki.org is still running latin1 AFAIK and it's working well, including
> for page history so I'm not sure what the problem is.
>
> Now I'm fine to require UTF8. It would be nice to check the environment at
> startup. I hope we can do so. This means checking that DB and container are
> set up correctly. This is important also for existing users who are using
> latin1. They need to know they have something to do. xwiki.org is a good
> example. We should also document how users can migrate their DBs to UTF8 in
> our admin guide on xwiki.org.
>
> Does it mean we'll remove (deprecate to start with?) the xwiki.encoding
> config parameter?
>
> Thanks
> -Vincent
>
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs
>
