Hi, I've published a migration plan:
http://l10n.xwiki.org/xwiki/bin/view/L10N/L10NMigrationPlan Review welcome Ludovic 2013/1/24 Ludovic Dubost <[email protected]> > FYI, here are the running times of these queries: > > mysql> create table l10n_stats_author as select year(xwr_date)*100+ > month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct > XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs group by > 1,2; > Query OK, 763 rows affected (1 min 28.36 sec) > Records: 763 Duplicates: 0 Warnings: 0 > > mysql> delete from l10n_stats_author where l10n_month=201301 or > l10n_month=201302; > Query OK, 29 rows affected (0.00 sec) > > mysql> insert into l10n_stats_author (select year(xwr_date)*100+ > month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct > XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs where > xwr_date >= '2012-12-01' group by 1,2); > Query OK, 35 rows affected (0.07 sec) > Records: 35 Duplicates: 0 Warnings: 0 > > Update is very fast > > Ludovic > > > 2013/1/24 Ludovic Dubost <[email protected]> > >> The custom statistics data structure is not that complicated. >> I've already the SQL query that does the initial version (based on >> xwikircs for now) >> >> create table l10n_stats_author as select year(xwr_date)*100+ >> month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct >> XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs group by >> 1,2; >> delete from l10n_stats_author where l10n_month=201301 or >> l10n_month=201302; >> insert into l10n_stats_author (select year(xwr_date)*100+ >> month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct >> XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs where >> xwr_date >= '2012-12-01' group by 1,2); >> select * from l10n_stats_author where l10n_month=201301 or >> l10n_month=201302; >> >> >> Putting this in a scheduler job should not be complicated >> >> Ludovic >> >> >> >> 2013/1/24 Sergiu Dumitriu <[email protected]> >> >>> On 01/24/2013 12:03 AM, Ludovic Dubost wrote: >>> > Working on l10n change to custom mapping showed up an issue with the >>> l10n >>> > history. >>> > We have a huge history table with 2,3M lines (as well as an activity >>> > stream) and we use the history table for extracting contributor >>> statistics. >>> > >>> > However looking at it more closely shows that the source of most data >>> in >>> > the history table is useless. >>> > See here a grouping by dates showing that localisation of already >>> close to >>> > 2M lines in 6 month in 2012. >>> > >>> > | 201201 | 318346 | >>> > | 201202 | 311403 | >>> > | 201203 | 728703 | >>> > | 201204 | 271657 | >>> > | 201205 | 296384 | >>> > | 201206 | 120463 | >>> > >>> > Here is an example of history for an entry >>> > >>> > mysql> select >>> > >>> xwr_author,xwr_version1,xwr_version2,xwr_date,xwr_comment,cast(xwr_isdiff >>> > as unsigned), length(xwr_patch) from xwikircs where >>> xwr_docid=596494851 ; >>> > >>> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+ >>> > | xwr_author | xwr_version1 | xwr_version2 | xwr_date >>> > | xwr_comment | cast(xwr_isdiff as unsigned) | length(xwr_patch) | >>> > >>> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+ >>> > | XWiki.XWikiTranslator | 1 | 1 | 2010-02-23 >>> 11:38:27 >>> > | | 1 | 163 | >>> > | XWiki.XWikiTranslator | 2 | 1 | 2010-02-23 >>> 11:40:26 >>> > | | 1 | 330 | >>> > | XWiki.rbuj | 3 | 1 | 2010-03-04 >>> 00:24:24 >>> > | | 1 | 189 | >>> > | XWiki.rbuj | 4 | 1 | 2010-03-04 >>> 01:02:52 >>> > | | 1 | 223 | >>> > | XWiki.rbuj | 5 | 1 | 2010-07-30 >>> 01:12:58 >>> > | | 0 | 5026 | >>> > | XWiki.XWikiTranslator | 6 | 1 | 2012-01-23 >>> 11:40:25 >>> > | | 1 | 115 | >>> > | XWiki.XWikiTranslator | 7 | 1 | 2012-01-23 >>> 11:58:25 >>> > | | 1 | 115 | >>> > | XWiki.XWikiTranslator | 8 | 1 | 2012-01-23 >>> 13:31:52 >>> > | | 1 | 234 | >>> > | XWiki.XWikiTranslator | 8 | 2 | 2012-01-23 >>> 15:36:02 >>> > | Prepared | 1 | 115 | >>> > | XWiki.XWikiTranslator | 8 | 3 | 2012-01-24 >>> 02:01:12 >>> > | Prepared | 0 | 5670 | >>> > | XWiki.XWikiTranslator | 8 | 4 | 2012-01-25 >>> 02:04:00 >>> > | Prepared | 1 | 115 | >>> > | XWiki.XWikiTranslator | 8 | 5 | 2012-01-26 >>> 02:02:29 >>> > | Prepared | 1 | 115 | >>> > | XWiki.XWikiTranslator | 8 | 6 | 2012-01-27 >>> 02:01:18 >>> > | Prepared | 1 | 115 | >>> > | XWiki.XWikiTranslator | 8 | 7 | 2012-02-04 >>> 02:01:08 >>> > | Prepared | 1 | 115 | >>> > | XWiki.XWikiTranslator | 8 | 8 | 2012-02-07 >>> 02:02:03 >>> > | Prepared | 0 | 5670 | >>> > | XWiki.XWikiTranslator | 8 | 9 | 2012-02-08 >>> 02:01:20 >>> > | Prepared | 1 | 115 | >>> > | XWiki.XWikiTranslator | 8 | 10 | 2012-02-14 >>> 02:01:07 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 11 | 2012-02-28 >>> 02:01:32 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 12 | 2012-03-03 >>> 02:01:20 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 13 | 2012-03-14 >>> 02:01:50 >>> > | Prepared | 0 | 5671 | >>> > | XWiki.XWikiTranslator | 8 | 14 | 2012-03-14 >>> 18:23:06 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 15 | 2012-03-14 >>> 18:32:38 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 16 | 2012-03-14 >>> 18:37:17 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 17 | 2012-03-24 >>> 02:01:34 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 18 | 2012-03-26 >>> 16:40:35 >>> > | Prepared | 0 | 5671 | >>> > | XWiki.XWikiTranslator | 8 | 19 | 2012-03-28 >>> 02:01:23 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 20 | 2012-03-29 >>> 02:01:45 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 21 | 2012-04-11 >>> 02:01:36 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 22 | 2012-04-19 >>> 15:20:10 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 24 | 2012-04-29 >>> 02:01:45 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 25 | 2012-05-04 >>> 18:55:30 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 26 | 2012-05-04 >>> 19:02:09 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 27 | 2012-05-10 >>> 02:04:36 >>> > | Prepared | 1 | 116 | >>> > | XWiki.XWikiTranslator | 8 | 28 | 2012-05-30 >>> 02:02:08 >>> > | Prepared | 0 | 5671 | >>> > | XWiki.XWikiTranslator | 8 | 29 | 2012-06-07 >>> 10:44:28 >>> > | Prepared | 0 | 5782 | >>> > >>> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+ >>> > >>> > >>> http://l10n.xwiki.org/xwiki/bin/view/XE/XEXWikiCoreResources_2107067965_core-menu-watchlist-add-page_nl?viewer=history&showminor=true >>> > >>> > Most entries are "no changes" and have been cause by the L10NUpdater >>> which >>> > wrongefully saved the document with no changes. I believe this must >>> have >>> > been fixed (by Thomas M.?) mid 2012. >>> > >>> > Now the 2M lines impact performance significantly and loads the DB for >>> > nothing (and in the activity stream as well). >>> > >>> > I suggest we clean up the history and activity stream. We have 2 >>> > possibilities: >>> > >>> > For xwikircs: >>> > >>> > 1/ Clean up only the bad data from XWikiTranslator when there are no >>> > changes: >>> > >>> > This is complicated as you need to verify if the change is actually a >>> > change and you cannot do that just with sql queries. It could be very >>> long >>> > >>> > 2/ Clean up old data from pre-201206 from XWikiTranslator >>> > >>> > Simpler if it is safe to delete by date in the DB. After discussion >>> with >>> > sergui this is a bit complicated because you need to make sure you >>> don't >>> > delete the latest full version before the versions you keep. So you >>> would >>> > have to do it by API which will take ages. >>> > >>> > 3/ Clean up up old data from pre-201206 from all users >>> > >>> > This is simpler as you can safely delete from the database everything >>> older >>> > than a certain versions. Cleans-up even more but would loose >>> contributor >>> > statistics unless we store 2012 contributor counts in an alternate >>> table >>> > which would then be regularly updated >>> >>> +1 for 3/, without the extra stats table yet. See below. >>> >>> Technically, we could also use /2, since we'll still have history >>> summary (who changed when), but some actual versions won't be >>> retrievable. Is the actual revision important? IMHO, only if we want to >>> investigate some foul play. >>> >>> > In any case we should probably create this intermediary table for >>> > statistics as it would be much faster anyway. >>> > >>> > For activitystream: >>> > >>> > 1/ Clean up old data from XWikiTranslator 201206 or earlier >>> > >>> > 2/ Clean up old date from everybody 201206 or earlier >>> >>> +1 for 1/ >>> >>> > What value do we see in the l10n history and actvity stream and which >>> > solutions do the other commiters suggest ? >>> > >>> > I would say it's interesting for contributor statistics (counting >>> number of >>> > contribution by translators) but beyond that we can delete the data. >>> > So we would fix that by storing monthly statistics in a table and >>> updating >>> > the latest 2 month through a scheduler job. This means that we can also >>> > delete history over 2 month. >>> >>> We can get contributor activity from the activity stream, independently >>> from the RCS table. Since the activity stream doesn't have inter-version >>> dependencies like the RCS does, we can freely discard irrelevant rows >>> from it, and use the remaining valid data. The same information is >>> present in the activity stream and in the RCS table: who did what, when, >>> and where. >>> >>> I mean, this is a good fast solution for the moment. We could still make >>> a custom statistics data structure for translation contributors, but >>> that would take longer, and thus it delays the migration of the l10n >>> wiki to a newer version, and delays the performance improvement that >>> we'd gain by dropping data. >>> >>> -- >>> Sergiu Dumitriu >>> http://purl.org/net/sergiu >>> _______________________________________________ >>> devs mailing list >>> [email protected] >>> http://lists.xwiki.org/mailman/listinfo/devs >>> >> >> >> >> -- >> Ludovic Dubost >> Founder and CEO >> Blog: http://blog.ludovic.org/ >> XWiki: http://www.xwiki.com >> Skype: ldubost GTalk: ldubost >> > > > > -- > Ludovic Dubost > Founder and CEO > Blog: http://blog.ludovic.org/ > XWiki: http://www.xwiki.com > Skype: ldubost GTalk: ldubost > -- Ludovic Dubost Founder and CEO Blog: http://blog.ludovic.org/ XWiki: http://www.xwiki.com Skype: ldubost GTalk: ldubost _______________________________________________ devs mailing list [email protected] http://lists.xwiki.org/mailman/listinfo/devs

