On Mon, Mar 16, 2015 at 3:14 PM, Oliver Keyes <[email protected]> wrote:

> Kevin: I'm not sure what value there'd be. I mean, there's page-size,
> maybe? But pageID gives us that (or should).


Time-traveling with MediaWiki is very hard. Calculating the length of the
wikitext for a given pageID at a given time is cumbersome (instead of
simple text processing, you are now dealing with DB queries and need to set
up a local DB mirror, etc.). Finding out what title the page had at the
time is prohibitively hard (you have to parse semi-structured objects which
are serialized into strings in the log table and follow the chain of
renames). Finding out the byte size of the rendered HTML is practically
impossible (templates and interface messages change; flagged
revisions/pending changes might result in older versions of articles being
shown). If you omit the byte counts, there is no way people will be able to
reconstruct them from the logs.
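
To give a feel for the "follow the chain of renames" part, here is a rough
sketch. It assumes the move entries for a page have already been pulled out
of the log table and parsed into (timestamp, old title, new title) records,
which is the hard part and is not shown. The names Move and title_at are
made up for illustration; nothing here is an actual MediaWiki API.

```python
from typing import Iterable, NamedTuple


class Move(NamedTuple):
    """One already-parsed rename entry (hypothetical structure)."""
    timestamp: str  # assumed YYYYMMDDHHMMSS, so string comparison sorts by time
    old_title: str
    new_title: str


def title_at(current_title: str, moves: Iterable[Move], when: str) -> str:
    """Return the title the page had at `when` by undoing later renames."""
    title = current_title
    # Walk the rename history from newest to oldest.
    for move in sorted(moves, key=lambda m: m.timestamp, reverse=True):
        if move.timestamp <= when:
            break  # this rename and all older ones had already happened
        if move.new_title != title:
            # The chain is broken (missing or inconsistent log entries).
            raise ValueError(f"rename chain broken at {move.timestamp}")
        title = move.old_title  # undo the rename
    return title


moves = [
    Move("20140101000000", "A", "B"),
    Move("20150101000000", "B", "C"),
]
print(title_at("C", moves, "20140601000000"))
```

Even this toy version has to bail out when the chain is inconsistent, and
real log data adds deletions, restores and history merges on top of that.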

Not saying that's a problem - I personally don't see much use for them.
Just don't expect pageID to be very useful for "normalizing" logs.
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics