Yep, but bytecounts as an approximation of information density or content size are themselves not terribly useful (mobile web and desktop give two different bytecounts for the same content). The goal with this is more, I think, to provide a reference point for working out "okay, what version of the article were these people probably looking at, and what did it look like?"

On 20 March 2015 at 00:18, Gergo Tisza <[email protected]> wrote:
> On Mon, Mar 16, 2015 at 3:14 PM, Oliver Keyes <[email protected]> wrote:
>>
>> Kevin: I'm not sure what value there'd be. I mean, there's page-size,
>> maybe? But pageID gives us that (or should).
>
>
> Time-traveling with MediaWiki is very hard. Calculating the length of
> wikitext for a given pageID at a given time is cumbersome (instead of simple
> text processing, you are now dealing with DB queries, need to set up a local
> DB mirror etc). Finding out what title it had at the time is prohibitively
> hard (you have to parse semi-structured objects which are serialized into
> strings in the log table and follow the chain of renames). Finding out the
> byte size of the rendered HTML is practically impossible (templates and
> interface messages change; flagged revisions/pending changes might result in
> older versions of articles being shown). If you omit the bytecounts, there
> is no way people will be able to reconstruct them from the logs.
>
> Not saying that's a problem - I personally don't see much use for them. Just
> don't expect pageID to be very useful for "normalizing" logs.
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
