Re: [Analytics] Analytics Digest, Vol 48, Issue 10

Bo Han, Thu, 04 Feb 2016 12:25:50 -0800

> Date: Thu, 4 Feb 2016 08:22:01 +0100
> From: "Federico Leva (Nemo)" <[email protected]>
> To: A mailing list for the Analytics Team at WMF and everybody who has
>         an interest in Wikipedia and "analytics."
>         <[email protected]>
> Subject: Re: [Analytics] Pagecounts dumps page title UTF-8 escaping
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Bo Han, 04/02/2016 00:40:
>> Is the logic for the escaping available somewhere?
>
> MediaWiki API does https://phabricator.wikimedia.org/T29849
> For the new pageviews API I got this reply on Unicode normalisation:
> https://phabricator.wikimedia.org/T44259#1351880
>
> (Phabricator is down right now; wait a couple hours or check
> web.archive.org.)
>
> Nemo


Thanks for the reply, Nemo. I read through both links but am still a
little confused by the case of "Мстители (фильм, 2012)" on the ru
domain, which appears in the pagecounts dumps escaped in three different ways:
"%D0%9C%D1%81%D1%82%D0%B8%D1%82%D0%B5%D0%BB%D0%B8_%28%D1%84%D0%B8%D0%BB%D1%8C%D0%BC,_2012%29"
(parentheses escaped, comma left literal)
"%D0%9C%D1%81%D1%82%D0%B8%D1%82%D0%B5%D0%BB%D0%B8_(%D1%84%D0%B8%D0%BB%D1%8C%D0%BC,_2012)"
(comma and parentheses both left literal)
"Мстители_(фильм,_2012)" (nothing escaped)
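
A minimal Python sketch of how the first two spellings can arise, assuming the encoders involved differ only in which ASCII punctuation they leave literal. The `keep` sets below are illustrative guesses, not the actual settings of MediaWiki, the pagecounts pipeline, or any browser, and `encode_title` is a hypothetical helper. `urllib.parse.quote` reproduces the first two strings exactly, and `unquote` maps all three back to the same title.

```python
from urllib.parse import quote, unquote

TITLE = "Мстители (фильм, 2012)"

def encode_title(title: str, keep: str = "") -> str:
    """Percent-encode a page title's UTF-8 bytes (hypothetical helper).

    Spaces become underscores first, as in MediaWiki URLs. quote() never
    escapes letters, digits or "_.-~"; any extra ASCII characters listed
    in `keep` are also left literal.
    """
    return quote(title.replace(" ", "_"), safe=keep, encoding="utf-8")

# Same title, three spellings that differ only in which ASCII
# punctuation the encoder treated as safe (assumed keep sets).
only_comma_literal = encode_title(TITLE, keep=",")
comma_and_parens_literal = encode_title(TITLE, keep=",()")
nothing_escaped = TITLE.replace(" ", "_")

for form in (only_comma_literal, comma_and_parens_literal, nothing_escaped):
    print(form)
    # Percent-decoding any of the three yields the same title.
    assert unquote(form) == nothing_escaped
```

In other words, all three are valid percent-encodings of the same UTF-8 title; they only disagree on whether "(", ")" and "," count as safe.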

Shouldn't the comma and parentheses be escaped as well, or are reserved
characters special-cased? If so, why are the parentheses escaped in some
entries and not in others? Could some of the variation come from how
browsers encode and send the request?
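
Whatever the cause, if the goal is to count all three spellings as one page, one option (an assumption about what is wanted, not how the dumps are produced) is to canonicalise each title before aggregating: percent-decode, map spaces to underscores, and apply NFC in the spirit of the Unicode-normalisation discussion linked above. `canonical_title` and `merge_counts` are hypothetical helpers, sketched in Python:

```python
import unicodedata
from collections import Counter
from urllib.parse import unquote

def canonical_title(raw: str) -> str:
    # Undo percent-escaping; already-unescaped input passes through
    # unchanged, and a stray '%' not followed by two hex digits is kept.
    title = unquote(raw, encoding="utf-8", errors="replace")
    # MediaWiki treats spaces and underscores in titles as equivalent.
    title = title.replace(" ", "_")
    # Assumption: NFC so precomposed and decomposed characters compare equal.
    return unicodedata.normalize("NFC", title)

def merge_counts(rows):
    # rows: iterable of (raw_title, count) pairs, e.g. parsed from a dump.
    totals = Counter()
    for raw, count in rows:
        totals[canonical_title(raw)] += count
    return totals

forms = [
    "%D0%9C%D1%81%D1%82%D0%B8%D1%82%D0%B5%D0%BB%D0%B8_%28%D1%84%D0%B8%D0%BB%D1%8C%D0%BC,_2012%29",
    "%D0%9C%D1%81%D1%82%D0%B8%D1%82%D0%B5%D0%BB%D0%B8_(%D1%84%D0%B8%D0%BB%D1%8C%D0%BC,_2012)",
    "Мстители_(фильм,_2012)",
]
print({canonical_title(f) for f in forms})  # a single canonical key
```

Decoding once is deliberate: a title that was escaped twice (e.g. containing "%25") would need a policy decision rather than repeated unquoting.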

Bo

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
