On Thu, Jan 24, 2013 at 8:15 AM, Sergiu Dumitriu <[email protected]> wrote:
> On 01/24/2013 12:03 AM, Ludovic Dubost wrote:
>> Working on l10n change to custom mapping showed up an issue with the l10n
>> history.
>> We have a huge history table with 2,3M lines (as well as an activity
>> stream) and we use the history table for extracting contributor statistics.
>>
>> However looking at it more closely shows that the source of most data in
>> the history table is useless.
>> See here a grouping by dates showing that localisation of already close to
>> 2M lines in 6 month in 2012.
>>
>> |                             201201 |   318346 |
>> |                             201202 |   311403 |
>> |                             201203 |   728703 |
>> |                             201204 |   271657 |
>> |                             201205 |   296384 |
>> |                             201206 |   120463 |
>>
>> Here is an example of history for an entry
>>
>> mysql> select
>> xwr_author,xwr_version1,xwr_version2,xwr_date,xwr_comment,cast(xwr_isdiff
>> as unsigned), length(xwr_patch) from xwikircs where xwr_docid=596494851  ;
>> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
>> | xwr_author            | xwr_version1 | xwr_version2 | xwr_date
>>  | xwr_comment | cast(xwr_isdiff as unsigned) | length(xwr_patch) |
>> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
>> | XWiki.XWikiTranslator |            1 |            1 | 2010-02-23 11:38:27
>> |             |                            1 |               163 |
>> | XWiki.XWikiTranslator |            2 |            1 | 2010-02-23 11:40:26
>> |             |                            1 |               330 |
>> | XWiki.rbuj            |            3 |            1 | 2010-03-04 00:24:24
>> |             |                            1 |               189 |
>> | XWiki.rbuj            |            4 |            1 | 2010-03-04 01:02:52
>> |             |                            1 |               223 |
>> | XWiki.rbuj            |            5 |            1 | 2010-07-30 01:12:58
>> |             |                            0 |              5026 |
>> | XWiki.XWikiTranslator |            6 |            1 | 2012-01-23 11:40:25
>> |             |                            1 |               115 |
>> | XWiki.XWikiTranslator |            7 |            1 | 2012-01-23 11:58:25
>> |             |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |            1 | 2012-01-23 13:31:52
>> |             |                            1 |               234 |
>> | XWiki.XWikiTranslator |            8 |            2 | 2012-01-23 15:36:02
>> | Prepared    |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |            3 | 2012-01-24 02:01:12
>> | Prepared    |                            0 |              5670 |
>> | XWiki.XWikiTranslator |            8 |            4 | 2012-01-25 02:04:00
>> | Prepared    |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |            5 | 2012-01-26 02:02:29
>> | Prepared    |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |            6 | 2012-01-27 02:01:18
>> | Prepared    |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |            7 | 2012-02-04 02:01:08
>> | Prepared    |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |            8 | 2012-02-07 02:02:03
>> | Prepared    |                            0 |              5670 |
>> | XWiki.XWikiTranslator |            8 |            9 | 2012-02-08 02:01:20
>> | Prepared    |                            1 |               115 |
>> | XWiki.XWikiTranslator |            8 |           10 | 2012-02-14 02:01:07
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           11 | 2012-02-28 02:01:32
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           12 | 2012-03-03 02:01:20
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           13 | 2012-03-14 02:01:50
>> | Prepared    |                            0 |              5671 |
>> | XWiki.XWikiTranslator |            8 |           14 | 2012-03-14 18:23:06
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           15 | 2012-03-14 18:32:38
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           16 | 2012-03-14 18:37:17
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           17 | 2012-03-24 02:01:34
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           18 | 2012-03-26 16:40:35
>> | Prepared    |                            0 |              5671 |
>> | XWiki.XWikiTranslator |            8 |           19 | 2012-03-28 02:01:23
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           20 | 2012-03-29 02:01:45
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           21 | 2012-04-11 02:01:36
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           22 | 2012-04-19 15:20:10
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           24 | 2012-04-29 02:01:45
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           25 | 2012-05-04 18:55:30
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           26 | 2012-05-04 19:02:09
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           27 | 2012-05-10 02:04:36
>> | Prepared    |                            1 |               116 |
>> | XWiki.XWikiTranslator |            8 |           28 | 2012-05-30 02:02:08
>> | Prepared    |                            0 |              5671 |
>> | XWiki.XWikiTranslator |            8 |           29 | 2012-06-07 10:44:28
>> | Prepared    |                            0 |              5782 |
>> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
>>
>> http://l10n.xwiki.org/xwiki/bin/view/XE/XEXWikiCoreResources_2107067965_core-menu-watchlist-add-page_nl?viewer=history&showminor=true
>>
>> Most entries are "no changes" and have been cause by the L10NUpdater which
>> wrongefully saved the document with no changes. I believe this must have
>> been fixed (by Thomas M.?)  mid 2012.

Yep, save is supposed to be done only when necessary now.

>>
>> Now the 2M lines impact performance significantly and loads the DB for
>> nothing (and in the activity stream as well).
>>
>> I suggest we clean up the history and activity stream. We have 2
>> possibilities:
>>
>> For xwikircs:
>>
>> 1/ Clean up only the bad data from XWikiTranslator when there are no
>> changes:
>>
>> This is complicated as you need to verify if the change is actually a
>> change and you cannot do that just with sql queries. It could be very long
>>
>> 2/ Clean up old data from pre-201206 from XWikiTranslator
>>
>> Simpler if it is safe to delete by date in the DB. After discussion with
>> sergui this is a bit complicated because you need to make sure you don't
>> delete the latest full version before the versions you keep. So you would
>> have to do it by API which will take ages.
>>
>> 3/ Clean up up old data from pre-201206 from all users
>>
>> This is simpler as you can safely delete from the database everything older
>> than a certain versions. Cleans-up even more but would loose contributor
>> statistics unless we store 2012 contributor counts in an alternate table
>> which would then be regularly updated
>
> +1 for 3/, without the extra stats table yet. See below.
>
> Technically, we could also use /2, since we'll still have history
> summary (who changed when), but some actual versions won't be
> retrievable. Is the actual revision important? IMHO, only if we want to
> investigate some foul play.
>
>> In any case we should probably create this intermediary table for
>> statistics as it would be much faster anyway.
>>
>> For activitystream:
>>
>> 1/ Clean up old data from XWikiTranslator 201206 or earlier
>>
>> 2/ Clean up old date from everybody 201206 or earlier
>
> +1 for 1/

+1 for 1/ plus some very old history associated to ludovic user which
is kind of the ancestor of XWikiTranslator.

>
>> What value do we see in the l10n history and actvity stream and which
>> solutions do the other commiters suggest ?
>>
>> I would say it's interesting for contributor statistics (counting number of
>> contribution by translators) but beyond that we can delete the data.
>> So we would fix that by storing monthly statistics in a table and updating
>> the latest 2 month through a scheduler job. This means that we can also
>> delete history over 2 month.
>
> We can get contributor activity from the activity stream, independently
> from the RCS table. Since the activity stream doesn't have inter-version
> dependencies like the RCS does, we can freely discard irrelevant rows
> from it, and use the remaining valid data. The same information is
> present in the activity stream and in the RCS table: who did what, when,
> and where.
>
> I mean, this is a good fast solution for the moment. We could still make
> a custom statistics data structure for translation contributors, but
> that would take longer, and thus it delays the migration of the l10n
> wiki to a newer version, and delays the performance improvement that
> we'd gain by dropping data.
>
> --
> Sergiu Dumitriu
> http://purl.org/net/sergiu
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs



--
Thomas Mortagne
_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs

Reply via email to