The custom statistics data structure is not that complicated. I've already the SQL query that does the initial version (based on xwikircs for now)
create table l10n_stats_author as select year(xwr_date)*100+ month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs group by 1,2; delete from l10n_stats_author where l10n_month=201301 or l10n_month=201302; insert into l10n_stats_author (select year(xwr_date)*100+ month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs where xwr_date >= '2012-12-01' group by 1,2); select * from l10n_stats_author where l10n_month=201301 or l10n_month=201302; Putting this in a scheduler job should not be complicated Ludovic 2013/1/24 Sergiu Dumitriu <[email protected]> > On 01/24/2013 12:03 AM, Ludovic Dubost wrote: > > Working on l10n change to custom mapping showed up an issue with the l10n > > history. > > We have a huge history table with 2,3M lines (as well as an activity > > stream) and we use the history table for extracting contributor > statistics. > > > > However looking at it more closely shows that the source of most data in > > the history table is useless. > > See here a grouping by dates showing that localisation of already close > to > > 2M lines in 6 month in 2012. > > > > | 201201 | 318346 | > > | 201202 | 311403 | > > | 201203 | 728703 | > > | 201204 | 271657 | > > | 201205 | 296384 | > > | 201206 | 120463 | > > > > Here is an example of history for an entry > > > > mysql> select > > xwr_author,xwr_version1,xwr_version2,xwr_date,xwr_comment,cast(xwr_isdiff > > as unsigned), length(xwr_patch) from xwikircs where xwr_docid=596494851 > ; > > > +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+ > > | xwr_author | xwr_version1 | xwr_version2 | xwr_date > > | xwr_comment | cast(xwr_isdiff as unsigned) | length(xwr_patch) | > > > +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+ > > | XWiki.XWikiTranslator | 1 | 1 | 2010-02-23 > 11:38:27 > > | | 1 | 163 | > > | XWiki.XWikiTranslator | 2 | 1 | 2010-02-23 > 11:40:26 > > | | 1 | 330 | > > | XWiki.rbuj | 3 | 1 | 2010-03-04 > 00:24:24 > > | | 1 | 189 | > > | XWiki.rbuj | 4 | 1 | 2010-03-04 > 01:02:52 > > | | 1 | 223 | > > | XWiki.rbuj | 5 | 1 | 2010-07-30 > 01:12:58 > > | | 0 | 5026 | > > | XWiki.XWikiTranslator | 6 | 1 | 2012-01-23 > 11:40:25 > > | | 1 | 115 | > > | XWiki.XWikiTranslator | 7 | 1 | 2012-01-23 > 11:58:25 > > | | 1 | 115 | > > | XWiki.XWikiTranslator | 8 | 1 | 2012-01-23 > 13:31:52 > > | | 1 | 234 | > > | XWiki.XWikiTranslator | 8 | 2 | 2012-01-23 > 15:36:02 > > | Prepared | 1 | 115 | > > | XWiki.XWikiTranslator | 8 | 3 | 2012-01-24 > 02:01:12 > > | Prepared | 0 | 5670 | > > | XWiki.XWikiTranslator | 8 | 4 | 2012-01-25 > 02:04:00 > > | Prepared | 1 | 115 | > > | XWiki.XWikiTranslator | 8 | 5 | 2012-01-26 > 02:02:29 > > | Prepared | 1 | 115 | > > | XWiki.XWikiTranslator | 8 | 6 | 2012-01-27 > 02:01:18 > > | Prepared | 1 | 115 | > > | XWiki.XWikiTranslator | 8 | 7 | 2012-02-04 > 02:01:08 > > | Prepared | 1 | 115 | > > | XWiki.XWikiTranslator | 8 | 8 | 2012-02-07 > 02:02:03 > > | Prepared | 0 | 5670 | > > | XWiki.XWikiTranslator | 8 | 9 | 2012-02-08 > 02:01:20 > > | Prepared | 1 | 115 | > > | XWiki.XWikiTranslator | 8 | 10 | 2012-02-14 > 02:01:07 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 11 | 2012-02-28 > 02:01:32 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 12 | 2012-03-03 > 02:01:20 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 13 | 2012-03-14 > 02:01:50 > > | Prepared | 0 | 5671 | > > | XWiki.XWikiTranslator | 8 | 14 | 2012-03-14 > 18:23:06 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 15 | 2012-03-14 > 18:32:38 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 16 | 2012-03-14 > 18:37:17 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 17 | 2012-03-24 > 02:01:34 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 18 | 2012-03-26 > 16:40:35 > > | Prepared | 0 | 5671 | > > | XWiki.XWikiTranslator | 8 | 19 | 2012-03-28 > 02:01:23 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 20 | 2012-03-29 > 02:01:45 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 21 | 2012-04-11 > 02:01:36 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 22 | 2012-04-19 > 15:20:10 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 24 | 2012-04-29 > 02:01:45 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 25 | 2012-05-04 > 18:55:30 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 26 | 2012-05-04 > 19:02:09 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 27 | 2012-05-10 > 02:04:36 > > | Prepared | 1 | 116 | > > | XWiki.XWikiTranslator | 8 | 28 | 2012-05-30 > 02:02:08 > > | Prepared | 0 | 5671 | > > | XWiki.XWikiTranslator | 8 | 29 | 2012-06-07 > 10:44:28 > > | Prepared | 0 | 5782 | > > > +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+ > > > > > http://l10n.xwiki.org/xwiki/bin/view/XE/XEXWikiCoreResources_2107067965_core-menu-watchlist-add-page_nl?viewer=history&showminor=true > > > > Most entries are "no changes" and have been cause by the L10NUpdater > which > > wrongefully saved the document with no changes. I believe this must have > > been fixed (by Thomas M.?) mid 2012. > > > > Now the 2M lines impact performance significantly and loads the DB for > > nothing (and in the activity stream as well). > > > > I suggest we clean up the history and activity stream. We have 2 > > possibilities: > > > > For xwikircs: > > > > 1/ Clean up only the bad data from XWikiTranslator when there are no > > changes: > > > > This is complicated as you need to verify if the change is actually a > > change and you cannot do that just with sql queries. It could be very > long > > > > 2/ Clean up old data from pre-201206 from XWikiTranslator > > > > Simpler if it is safe to delete by date in the DB. After discussion with > > sergui this is a bit complicated because you need to make sure you don't > > delete the latest full version before the versions you keep. So you would > > have to do it by API which will take ages. > > > > 3/ Clean up up old data from pre-201206 from all users > > > > This is simpler as you can safely delete from the database everything > older > > than a certain versions. Cleans-up even more but would loose contributor > > statistics unless we store 2012 contributor counts in an alternate table > > which would then be regularly updated > > +1 for 3/, without the extra stats table yet. See below. > > Technically, we could also use /2, since we'll still have history > summary (who changed when), but some actual versions won't be > retrievable. Is the actual revision important? IMHO, only if we want to > investigate some foul play. > > > In any case we should probably create this intermediary table for > > statistics as it would be much faster anyway. > > > > For activitystream: > > > > 1/ Clean up old data from XWikiTranslator 201206 or earlier > > > > 2/ Clean up old date from everybody 201206 or earlier > > +1 for 1/ > > > What value do we see in the l10n history and actvity stream and which > > solutions do the other commiters suggest ? > > > > I would say it's interesting for contributor statistics (counting number > of > > contribution by translators) but beyond that we can delete the data. > > So we would fix that by storing monthly statistics in a table and > updating > > the latest 2 month through a scheduler job. This means that we can also > > delete history over 2 month. > > We can get contributor activity from the activity stream, independently > from the RCS table. Since the activity stream doesn't have inter-version > dependencies like the RCS does, we can freely discard irrelevant rows > from it, and use the remaining valid data. The same information is > present in the activity stream and in the RCS table: who did what, when, > and where. > > I mean, this is a good fast solution for the moment. We could still make > a custom statistics data structure for translation contributors, but > that would take longer, and thus it delays the migration of the l10n > wiki to a newer version, and delays the performance improvement that > we'd gain by dropping data. > > -- > Sergiu Dumitriu > http://purl.org/net/sergiu > _______________________________________________ > devs mailing list > [email protected] > http://lists.xwiki.org/mailman/listinfo/devs > -- Ludovic Dubost Founder and CEO Blog: http://blog.ludovic.org/ XWiki: http://www.xwiki.com Skype: ldubost GTalk: ldubost _______________________________________________ devs mailing list [email protected] http://lists.xwiki.org/mailman/listinfo/devs

