On Tue, Jun 10, 2014 at 1:04 AM, Nuria Ruiz <[email protected]> wrote:

> > Just to narrow this down a little further from the DB server-side: the
> > eventlogging tables do use utf-8, so the fix probably doesn't require
> > laborious schema changes (if that's what you meant by changing database
> > types).
>
> To follow the structure on MediaWiki, I think the easiest approach is to
> change db types from varchar to varbinary where utf-8 is being used.
> Please let us know if you think that is not appropriate.
>

Ah, so long-term ecosystem consistency is also an aim. Sounds wise. I was
only commenting in case it could make the current Python encoding fix
easier and faster.

Were it a new system without ties to MW, I'd push for solving character set
issues properly with something like utf8mb4, depending on how you want to
read/sort the data; without that luxury, varbinary is fine.
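
To illustrate the trade-off, here is a minimal Python sketch (not the
actual eventlogging code; all function names are hypothetical). It relies
on two facts: MySQL's legacy "utf8" character set stores at most 3 bytes
per character, while utf8mb4 stores up to 4, and a varbinary column holds
raw bytes that compare and sort byte by byte, with no collation.

```python
def fits_legacy_utf8(text: str) -> bool:
    """True if every character encodes to at most 3 UTF-8 bytes,
    i.e. it would fit MySQL's 3-byte "utf8" character set."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


def to_varbinary(text: str) -> bytes:
    """Encode text as the raw UTF-8 bytes a varbinary column would hold."""
    return text.encode("utf-8")


def from_varbinary(raw: bytes) -> str:
    """Decode bytes read back from a varbinary column."""
    return raw.decode("utf-8")


def bytewise_sorted(words):
    """Sort strings by their UTF-8 bytes, as a binary column compares them."""
    return sorted(words, key=to_varbinary)


if __name__ == "__main__":
    emoji = "\U0001F600"  # outside the Basic Multilingual Plane, 4 bytes
    print(fits_legacy_utf8("Wikip\u00e9dia"))  # accented Latin text
    print(fits_legacy_utf8(emoji))
    print(from_varbinary(to_varbinary(emoji)) == emoji)
    print(bytewise_sorted(["b", "a", "B"]))
```

With varbinary the application owns encoding and decoding, and ordering is
by byte value, so uppercase ASCII sorts before lowercase. That is the
"depending on how you want to read/sort the data" caveat.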
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
