On Thu, Jan 24, 2013 at 8:15 AM, Sergiu Dumitriu <[email protected]> wrote:
> On 01/24/2013 12:03 AM, Ludovic Dubost wrote:
>> Working on the l10n change to custom mapping showed up an issue with the
>> l10n history.
>> We have a huge history table with 2.3M lines (as well as an activity
>> stream) and we use the history table for extracting contributor statistics.
>>
>> However, looking at it more closely shows that most of the data in the
>> history table is useless.
>> Here is a grouping by month showing that close to 2M lines date from just
>> 6 months of 2012:
>>
>> | 201201 | 318346 |
>> | 201202 | 311403 |
>> | 201203 | 728703 |
>> | 201204 | 271657 |
>> | 201205 | 296384 |
>> | 201206 | 120463 |
>>
>> Here is an example of the history for one entry:
>>
>> mysql> select xwr_author, xwr_version1, xwr_version2, xwr_date, xwr_comment,
>>     cast(xwr_isdiff as unsigned), length(xwr_patch)
>>     from xwikircs where xwr_docid=596494851;
>> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
>> | xwr_author            | xwr_version1 | xwr_version2 | xwr_date            | xwr_comment | cast(xwr_isdiff as unsigned) | length(xwr_patch) |
>> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
>> | XWiki.XWikiTranslator |            1 |            1 | 2010-02-23 11:38:27 |             |                            1 |               163 |
>> | XWiki.XWikiTranslator |            2 |            1 | 2010-02-23 11:40:26 |             |                            1 |               330 |
>> | XWiki.rbuj            |            3 |            1 | 2010-03-04 00:24:24 |             |                            1 |               189 |
>> | XWiki.rbuj            |            4 |            1 | 2010-03-04 01:02:52 |             |                            1 |               223 |
>> | XWiki.rbuj            |            5 |            1 | 2010-07-30 01:12:58 |             |                            0 |              5026 |
>> | XWiki.XWikiTranslator |            6 |            1 | 2012-01-23 11:40:25 |             |                            1 |               115 |
>> | XWiki.XWikiTranslator |            7 |            1 | 2012-01-23 11:58:25 |             |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |            1 | 2012-01-23 13:31:52 |             |                            1 |               234 |
>> | XWiki.XWikiTranslator |            8 |            2 | 2012-01-23 15:36:02 | Prepared    |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |            3 | 2012-01-24 02:01:12 | Prepared    |                            0 |              5670 |
>> | XWiki.XWikiTranslator |            8 |            4 | 2012-01-25 02:04:00 | Prepared    |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |            5 | 2012-01-26 02:02:29 | Prepared    |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |            6 | 2012-01-27 02:01:18 | Prepared    |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |            7 | 2012-02-04 02:01:08 | Prepared    |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |            8 | 2012-02-07 02:02:03 | Prepared    |                            0 |              5670 |
>> | XWiki.XWikiTranslator |            8 |            9 | 2012-02-08 02:01:20 | Prepared    |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |           10 | 2012-02-14 02:01:07 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           11 | 2012-02-28 02:01:32 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           12 | 2012-03-03 02:01:20 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           13 | 2012-03-14 02:01:50 | Prepared    |                            0 |              5671 |
>> | XWiki.XWikiTranslator |            8 |           14 | 2012-03-14 18:23:06 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           15 | 2012-03-14 18:32:38 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           16 | 2012-03-14 18:37:17 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           17 | 2012-03-24 02:01:34 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           18 | 2012-03-26 16:40:35 | Prepared    |                            0 |              5671 |
>> | XWiki.XWikiTranslator |            8 |           19 | 2012-03-28 02:01:23 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           20 | 2012-03-29 02:01:45 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           21 | 2012-04-11 02:01:36 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           22 | 2012-04-19 15:20:10 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           24 | 2012-04-29 02:01:45 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           25 | 2012-05-04 18:55:30 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           26 | 2012-05-04 19:02:09 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           27 | 2012-05-10 02:04:36 | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           28 | 2012-05-30 02:02:08 | Prepared    |                            0 |              5671 |
>> | XWiki.XWikiTranslator |            8 |           29 | 2012-06-07 10:44:28 | Prepared    |                            0 |              5782 |
>> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
>>
>> http://l10n.xwiki.org/xwiki/bin/view/XE/XEXWikiCoreResources_2107067965_core-menu-watchlist-add-page_nl?viewer=history&showminor=true
>>
>> Most entries are "no changes" and were caused by the L10NUpdater, which
>> wrongfully saved the document with no changes. I believe this must have
>> been fixed (by Thomas M.?) mid 2012.

Yep, save is supposed to be done only when necessary now.

>>
>> Now the 2M lines impact performance significantly and load the DB for
>> nothing (and in the activity stream as well).
>>
>> I suggest we clean up the history and activity stream. We have 2
>> possibilities:
>>
>> For xwikircs:
>>
>> 1/ Clean up only the bad data from XWikiTranslator when there are no
>> changes:
>>
>> This is complicated as you need to verify whether the change is actually a
>> change, and you cannot do that with SQL queries alone. It could take very
>> long.
>>
>> 2/ Clean up old data from pre-201206 from XWikiTranslator
>>
>> Simpler if it is safe to delete by date in the DB. After discussion with
>> Sergiu this is a bit complicated because you need to make sure you don't
>> delete the latest full version before the versions you keep. So you would
>> have to do it by API, which will take ages.
>>
>> 3/ Clean up old data from pre-201206 from all users
>>
>> This is simpler as you can safely delete from the database everything older
>> than a certain version. It cleans up even more but would lose contributor
>> statistics unless we store 2012 contributor counts in an alternate table
>> which would then be regularly updated
>
> +1 for 3/, without the extra stats table yet. See below.
>
> Technically, we could also use 2/, since we'll still have history
> summary (who changed when), but some actual versions won't be
> retrievable. Is the actual revision important? IMHO, only if we want to
> investigate some foul play.
>
>> In any case we should probably create this intermediary table for
>> statistics as it would be much faster anyway.
>>
>> For activitystream:
>>
>> 1/ Clean up old data from XWikiTranslator 201206 or earlier
>>
>> 2/ Clean up old data from everybody 201206 or earlier
>
> +1 for 1/

+1 for 1/ plus some very old history associated with the ludovic user, which
is kind of the ancestor of XWikiTranslator.
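
The xwikircs constraint mentioned for option 2/ (don't delete the latest full version before the versions you keep) could be sketched roughly as follows. This is only an illustration under stated assumptions: it takes the thread's description at face value (a diff row, xwr_isdiff = 1, needs the latest full row, xwr_isdiff = 0, before it), it runs against an in-memory SQLite table holding only a few of the columns from the query above, and the helper name safe_delete_before is hypothetical, not part of any XWiki API.

```python
import sqlite3


def safe_delete_before(conn, cutoff):
    """Delete old history rows without orphaning any kept diff.

    Assumption (taken from the thread, not verified against XWiki):
    a diff row depends on the latest full row (xwr_isdiff = 0) before it.
    Per document we therefore keep the latest full row dated before the
    cutoff, plus everything after it, and delete only what is older.
    """
    docs = conn.execute("SELECT DISTINCT xwr_docid FROM xwikircs").fetchall()
    deleted = 0
    for (docid,) in docs:
        anchor = conn.execute(
            "SELECT xwr_version1, xwr_version2 FROM xwikircs "
            "WHERE xwr_docid = ? AND xwr_isdiff = 0 AND xwr_date < ? "
            "ORDER BY xwr_version1 DESC, xwr_version2 DESC LIMIT 1",
            (docid, cutoff),
        ).fetchone()
        if anchor is None:
            continue  # no full version before the cutoff: delete nothing
        v1, v2 = anchor
        cur = conn.execute(
            "DELETE FROM xwikircs WHERE xwr_docid = ? AND "
            "(xwr_version1 < ? OR (xwr_version1 = ? AND xwr_version2 < ?))",
            (docid, v1, v1, v2),
        )
        deleted += cur.rowcount
    return deleted


# Toy data: a few rows copied from the example history in the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xwikircs (xwr_docid INTEGER, xwr_version1 INTEGER, "
    "xwr_version2 INTEGER, xwr_date TEXT, xwr_isdiff INTEGER)"
)
conn.executemany(
    "INSERT INTO xwikircs VALUES (?, ?, ?, ?, ?)",
    [
        (596494851, 5, 1, "2010-07-30 01:12:58", 0),
        (596494851, 6, 1, "2012-01-23 11:40:25", 1),
        (596494851, 8, 27, "2012-05-10 02:04:36", 1),
        (596494851, 8, 28, "2012-05-30 02:02:08", 0),
        (596494851, 8, 29, "2012-06-07 10:44:28", 0),
    ],
)
deleted = safe_delete_before(conn, "2012-06-01 00:00:00")
remaining = conn.execute(
    "SELECT xwr_version1, xwr_version2 FROM xwikircs "
    "ORDER BY xwr_version1, xwr_version2"
).fetchall()
print(deleted, remaining)
```

With these sample rows, versions 8.28 and 8.29 are kept and the three older rows are deleted. Whether a plain SQL pass like this is actually safe on the real table depends on how the patches chain together, which is the open question in the discussion.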
>
>> What value do we see in the l10n history and activity stream, and which
>> solutions do the other committers suggest?
>>
>> I would say it's interesting for contributor statistics (counting the
>> number of contributions by translators) but beyond that we can delete the
>> data.
>> So we would fix that by storing monthly statistics in a table and updating
>> the latest 2 months through a scheduler job. This means that we can also
>> delete history older than 2 months.
>
> We can get contributor activity from the activity stream, independently
> from the RCS table. Since the activity stream doesn't have inter-version
> dependencies like the RCS does, we can freely discard irrelevant rows
> from it, and use the remaining valid data. The same information is
> present in the activity stream and in the RCS table: who did what, when,
> and where.
>
> I mean, this is a good fast solution for the moment. We could still make
> a custom statistics data structure for translation contributors, but
> that would take longer, and thus it delays the migration of the l10n
> wiki to a newer version, and delays the performance improvement that
> we'd gain by dropping data.
>
> --
> Sergiu Dumitriu
> http://purl.org/net/sergiu
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs

--
Thomas Mortagne
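
For reference, the "monthly statistics in a table, refreshed for the latest 2 months" idea from the thread might look something like the sketch below. It is not an actual XWiki job: the table name l10n_monthly_stats and the function refresh_months are made up for illustration, the source here is the history table (the activity stream would work the same way, but its columns are not shown in the thread), and it uses SQLite's strftime in place of MySQL's DATE_FORMAT(xwr_date, '%Y%m').

```python
import sqlite3


def refresh_months(conn, months):
    """Recompute per-author contribution counts for the given 'YYYYMM' months.

    Only the listed months are touched, so counts for older months survive
    even after the underlying history rows have been deleted.
    """
    marks = ",".join("?" for _ in months)
    conn.execute(
        f"DELETE FROM l10n_monthly_stats WHERE month IN ({marks})", months
    )
    conn.execute(
        "INSERT INTO l10n_monthly_stats (month, author, contributions) "
        "SELECT strftime('%Y%m', xwr_date), xwr_author, COUNT(*) "
        "FROM xwikircs "
        f"WHERE strftime('%Y%m', xwr_date) IN ({marks}) "
        "GROUP BY strftime('%Y%m', xwr_date), xwr_author",
        months,
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xwikircs (xwr_author TEXT, xwr_date TEXT)")
conn.execute(
    "CREATE TABLE l10n_monthly_stats "
    "(month TEXT, author TEXT, contributions INTEGER)"
)
# A few (author, date) pairs copied from the example history in the thread.
conn.executemany(
    "INSERT INTO xwikircs VALUES (?, ?)",
    [
        ("XWiki.rbuj", "2010-03-04 00:24:24"),
        ("XWiki.rbuj", "2010-03-04 01:02:52"),
        ("XWiki.XWikiTranslator", "2012-05-10 02:04:36"),
        ("XWiki.XWikiTranslator", "2012-05-30 02:02:08"),
        ("XWiki.XWikiTranslator", "2012-06-07 10:44:28"),
    ],
)

# Initial fill, then drop old history and refresh only the latest 2 months.
refresh_months(conn, ["201003", "201205", "201206"])
conn.execute("DELETE FROM xwikircs WHERE xwr_date < '2012-05-01 00:00:00'")
refresh_months(conn, ["201205", "201206"])

stats = conn.execute(
    "SELECT month, author, contributions FROM l10n_monthly_stats "
    "ORDER BY month, author"
).fetchall()
print(stats)
```

The point of the sketch is that the 201003 counts remain available after the 2010 history rows are gone, which is what would allow history older than the refresh window to be deleted.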

