FYI, here are the running times of these queries: mysql> create table l10n_stats_author as select year(xwr_date)*100+ month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs group by 1,2; Query OK, 763 rows affected (1 min 28.36 sec) Records: 763 Duplicates: 0 Warnings: 0
mysql> delete from l10n_stats_author where l10n_month=201301 or l10n_month=201302; Query OK, 29 rows affected (0.00 sec) mysql> insert into l10n_stats_author (select year(xwr_date)*100+ month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs where xwr_date >= '2012-12-01' group by 1,2); Query OK, 35 rows affected (0.07 sec) Records: 35 Duplicates: 0 Warnings: 0 Update is very fast Ludovic 2013/1/24 Ludovic Dubost <[email protected]> > The custom statistics data structure is not that complicated. > I've already the SQL query that does the initial version (based on > xwikircs for now) > > create table l10n_stats_author as select year(xwr_date)*100+ > month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct > XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs group by > 1,2; > delete from l10n_stats_author where l10n_month=201301 or > l10n_month=201302; > insert into l10n_stats_author (select year(xwr_date)*100+ > month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct > XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs where > xwr_date >= '2012-12-01' group by 1,2); > select * from l10n_stats_author where l10n_month=201301 or > l10n_month=201302; > > > Putting this in a scheduler job should not be complicated > > Ludovic > > > > 2013/1/24 Sergiu Dumitriu <[email protected]> > >> On 01/24/2013 12:03 AM, Ludovic Dubost wrote: >> > Working on l10n change to custom mapping showed up an issue with the >> l10n >> > history. >> > We have a huge history table with 2,3M lines (as well as an activity >> > stream) and we use the history table for extracting contributor >> statistics. >> > >> > However looking at it more closely shows that the source of most data in >> > the history table is useless. >> > See here a grouping by dates showing that localisation of already close >> to >> > 2M lines in 6 month in 2012. >> > >> > | 201201 | 318346 | >> > | 201202 | 311403 | >> > | 201203 | 728703 | >> > | 201204 | 271657 | >> > | 201205 | 296384 | >> > | 201206 | 120463 | >> > >> > Here is an example of history for an entry >> > >> > mysql> select >> > >> xwr_author,xwr_version1,xwr_version2,xwr_date,xwr_comment,cast(xwr_isdiff >> > as unsigned), length(xwr_patch) from xwikircs where xwr_docid=596494851 >> ; >> > >> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+ >> > | xwr_author | xwr_version1 | xwr_version2 | xwr_date >> > | xwr_comment | cast(xwr_isdiff as unsigned) | length(xwr_patch) | >> > >> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+ >> > | XWiki.XWikiTranslator | 1 | 1 | 2010-02-23 >> 11:38:27 >> > | | 1 | 163 | >> > | XWiki.XWikiTranslator | 2 | 1 | 2010-02-23 >> 11:40:26 >> > | | 1 | 330 | >> > | XWiki.rbuj | 3 | 1 | 2010-03-04 >> 00:24:24 >> > | | 1 | 189 | >> > | XWiki.rbuj | 4 | 1 | 2010-03-04 >> 01:02:52 >> > | | 1 | 223 | >> > | XWiki.rbuj | 5 | 1 | 2010-07-30 >> 01:12:58 >> > | | 0 | 5026 | >> > | XWiki.XWikiTranslator | 6 | 1 | 2012-01-23 >> 11:40:25 >> > | | 1 | 115 | >> > | XWiki.XWikiTranslator | 7 | 1 | 2012-01-23 >> 11:58:25 >> > | | 1 | 115 | >> > | XWiki.XWikiTranslator | 8 | 1 | 2012-01-23 >> 13:31:52 >> > | | 1 | 234 | >> > | XWiki.XWikiTranslator | 8 | 2 | 2012-01-23 >> 15:36:02 >> > | Prepared | 1 | 115 | >> > | XWiki.XWikiTranslator | 8 | 3 | 2012-01-24 >> 02:01:12 >> > | Prepared | 0 | 5670 | >> > | XWiki.XWikiTranslator | 8 | 4 | 2012-01-25 >> 02:04:00 >> > | Prepared | 1 | 115 | >> > | XWiki.XWikiTranslator | 8 | 5 | 2012-01-26 >> 02:02:29 >> > | Prepared | 1 | 115 | >> > | XWiki.XWikiTranslator | 8 | 6 | 2012-01-27 >> 02:01:18 >> > | Prepared | 1 | 115 | >> > | XWiki.XWikiTranslator | 8 | 7 | 2012-02-04 >> 02:01:08 >> > | Prepared | 1 | 115 | >> > | XWiki.XWikiTranslator | 8 | 8 | 2012-02-07 >> 02:02:03 >> > | Prepared | 0 | 5670 | >> > | XWiki.XWikiTranslator | 8 | 9 | 2012-02-08 >> 02:01:20 >> > | Prepared | 1 | 115 | >> > | XWiki.XWikiTranslator | 8 | 10 | 2012-02-14 >> 02:01:07 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 11 | 2012-02-28 >> 02:01:32 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 12 | 2012-03-03 >> 02:01:20 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 13 | 2012-03-14 >> 02:01:50 >> > | Prepared | 0 | 5671 | >> > | XWiki.XWikiTranslator | 8 | 14 | 2012-03-14 >> 18:23:06 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 15 | 2012-03-14 >> 18:32:38 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 16 | 2012-03-14 >> 18:37:17 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 17 | 2012-03-24 >> 02:01:34 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 18 | 2012-03-26 >> 16:40:35 >> > | Prepared | 0 | 5671 | >> > | XWiki.XWikiTranslator | 8 | 19 | 2012-03-28 >> 02:01:23 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 20 | 2012-03-29 >> 02:01:45 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 21 | 2012-04-11 >> 02:01:36 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 22 | 2012-04-19 >> 15:20:10 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 24 | 2012-04-29 >> 02:01:45 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 25 | 2012-05-04 >> 18:55:30 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 26 | 2012-05-04 >> 19:02:09 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 27 | 2012-05-10 >> 02:04:36 >> > | Prepared | 1 | 116 | >> > | XWiki.XWikiTranslator | 8 | 28 | 2012-05-30 >> 02:02:08 >> > | Prepared | 0 | 5671 | >> > | XWiki.XWikiTranslator | 8 | 29 | 2012-06-07 >> 10:44:28 >> > | Prepared | 0 | 5782 | >> > >> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+ >> > >> > >> http://l10n.xwiki.org/xwiki/bin/view/XE/XEXWikiCoreResources_2107067965_core-menu-watchlist-add-page_nl?viewer=history&showminor=true >> > >> > Most entries are "no changes" and have been cause by the L10NUpdater >> which >> > wrongefully saved the document with no changes. I believe this must have >> > been fixed (by Thomas M.?) mid 2012. >> > >> > Now the 2M lines impact performance significantly and loads the DB for >> > nothing (and in the activity stream as well). >> > >> > I suggest we clean up the history and activity stream. We have 2 >> > possibilities: >> > >> > For xwikircs: >> > >> > 1/ Clean up only the bad data from XWikiTranslator when there are no >> > changes: >> > >> > This is complicated as you need to verify if the change is actually a >> > change and you cannot do that just with sql queries. It could be very >> long >> > >> > 2/ Clean up old data from pre-201206 from XWikiTranslator >> > >> > Simpler if it is safe to delete by date in the DB. After discussion with >> > sergui this is a bit complicated because you need to make sure you don't >> > delete the latest full version before the versions you keep. So you >> would >> > have to do it by API which will take ages. >> > >> > 3/ Clean up up old data from pre-201206 from all users >> > >> > This is simpler as you can safely delete from the database everything >> older >> > than a certain versions. Cleans-up even more but would loose contributor >> > statistics unless we store 2012 contributor counts in an alternate table >> > which would then be regularly updated >> >> +1 for 3/, without the extra stats table yet. See below. >> >> Technically, we could also use /2, since we'll still have history >> summary (who changed when), but some actual versions won't be >> retrievable. Is the actual revision important? IMHO, only if we want to >> investigate some foul play. >> >> > In any case we should probably create this intermediary table for >> > statistics as it would be much faster anyway. >> > >> > For activitystream: >> > >> > 1/ Clean up old data from XWikiTranslator 201206 or earlier >> > >> > 2/ Clean up old date from everybody 201206 or earlier >> >> +1 for 1/ >> >> > What value do we see in the l10n history and actvity stream and which >> > solutions do the other commiters suggest ? >> > >> > I would say it's interesting for contributor statistics (counting >> number of >> > contribution by translators) but beyond that we can delete the data. >> > So we would fix that by storing monthly statistics in a table and >> updating >> > the latest 2 month through a scheduler job. This means that we can also >> > delete history over 2 month. >> >> We can get contributor activity from the activity stream, independently >> from the RCS table. Since the activity stream doesn't have inter-version >> dependencies like the RCS does, we can freely discard irrelevant rows >> from it, and use the remaining valid data. The same information is >> present in the activity stream and in the RCS table: who did what, when, >> and where. >> >> I mean, this is a good fast solution for the moment. We could still make >> a custom statistics data structure for translation contributors, but >> that would take longer, and thus it delays the migration of the l10n >> wiki to a newer version, and delays the performance improvement that >> we'd gain by dropping data. >> >> -- >> Sergiu Dumitriu >> http://purl.org/net/sergiu >> _______________________________________________ >> devs mailing list >> [email protected] >> http://lists.xwiki.org/mailman/listinfo/devs >> > > > > -- > Ludovic Dubost > Founder and CEO > Blog: http://blog.ludovic.org/ > XWiki: http://www.xwiki.com > Skype: ldubost GTalk: ldubost > -- Ludovic Dubost Founder and CEO Blog: http://blog.ludovic.org/ XWiki: http://www.xwiki.com Skype: ldubost GTalk: ldubost _______________________________________________ devs mailing list [email protected] http://lists.xwiki.org/mailman/listinfo/devs

