> Date: Thu, 4 Feb 2016 08:22:01 +0100
> From: "Federico Leva (Nemo)" <[email protected]>
> To: A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and "analytics."
> <[email protected]>
> Subject: Re: [Analytics] Pagecounts dumps page title UTF-8 escaping
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Bo Han, 04/02/2016 00:40:
>> Is the logic for the escaping available somewhere?
>
> MediaWiki API does https://phabricator.wikimedia.org/T29849
> For the new pageviews API I got this reply on Unicode normalisation:
> https://phabricator.wikimedia.org/T44259#1351880
>
> (Phabricator is down right now; wait a couple hours or check
> web.archive.org.)
>
> Nemo
Thanks for the reply, Nemo. I read over the two links but am still a little confused about the case of "Мстители (фильм, 2012)" on the ru domain, which appears escaped in three different ways:

"%D0%9C%D1%81%D1%82%D0%B8%D1%82%D0%B5%D0%BB%D0%B8_%28%D1%84%D0%B8%D0%BB%D1%8C%D0%BC,_2012%29"
(everything but the comma escaped)

"%D0%9C%D1%81%D1%82%D0%B8%D1%82%D0%B5%D0%BB%D0%B8_(%D1%84%D0%B8%D0%BB%D1%8C%D0%BC,_2012)"
(everything but the comma and parentheses escaped)

"Мстители_(фильм,_2012)"
(nothing escaped)

Shouldn't the comma and parentheses be escaped as well, or is there a special case for reserved characters? If so, why are the parentheses sometimes escaped and sometimes not? Maybe some of the variation comes from how browsers encode and send the request?

Bo

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
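A minimal Python sketch of one way the three forms could arise. It assumes nothing about the actual pagecounts pipeline. Percent-encoding the same UTF-8 title while leaving different sets of characters unescaped reproduces the strings quoted above. Comma and parentheses are "sub-delims" in RFC 3986, so an encoder may legitimately leave them as-is. `encode` and `canonical` are hypothetical helper names, and the `safe` sets were chosen only to match the quoted strings.

```python
from urllib.parse import quote, unquote

# Page title from the thread (spaces already replaced by underscores).
TITLE = "Мстители_(фильм,_2012)"


def encode(title: str, safe: str) -> str:
    """Hypothetical helper: percent-encode a title as UTF-8.

    Letters, digits and '_.-~' are never escaped by quote();
    characters listed in `safe` are also left unescaped.
    """
    return quote(title, safe=safe, encoding="utf-8")


def canonical(raw: str) -> str:
    """Hypothetical helper: decode any escaped variant to the plain title."""
    return unquote(raw, encoding="utf-8")


if __name__ == "__main__":
    variants = [
        encode(TITLE, safe=","),    # comma kept, parentheses escaped
        encode(TITLE, safe=",()"),  # comma and parentheses kept
        TITLE,                      # nothing escaped
    ]
    for v in variants:
        print(v)
    # Decoding collapses all three variants to a single title.
    print({canonical(v) for v in variants})
```

If titles are decoded with `unquote` before they are compared or aggregated, all three variants map to one key. This sketch does not establish whether the dumps themselves do this.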
