I've updated the migration plan with in particular the clean-up queries
that would delete the old history data and the old XWikiTranslator activity
stream data.
The volumes are much more reasonable now (100K activity stream and 30k
history) so I don't think we need to delete more.

I'm finalizing the custom mapping migration locally (this will add
significant history as every document will be resaved).

Can the queries be reviewed ?

http://l10n.xwiki.org/xwiki/bin/view/L10N/L10NMigrationPlan

Ludovic


2013/1/24 Ludovic Dubost <[email protected]>

> Hi,
>
> I've published a migration plan:
>
> http://l10n.xwiki.org/xwiki/bin/view/L10N/L10NMigrationPlan
>
> Review welcome
>
> Ludovic
>
>
> 2013/1/24 Ludovic Dubost <[email protected]>
>
>> FYI, here are the running times of these queries:
>>
>> mysql> create table l10n_stats_author as select year(xwr_date)*100+
>> month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct
>> XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs group by
>> 1,2;
>> Query OK, 763 rows affected (1 min 28.36 sec)
>> Records: 763  Duplicates: 0  Warnings: 0
>>
>> mysql>   delete from l10n_stats_author where l10n_month=201301 or
>> l10n_month=201302;
>>  Query OK, 29 rows affected (0.00 sec)
>>
>> mysql>   insert into l10n_stats_author (select year(xwr_date)*100+
>> month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct
>> XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs where
>> xwr_date >= '2012-12-01' group by 1,2);
>> Query OK, 35 rows affected (0.07 sec)
>> Records: 35  Duplicates: 0  Warnings: 0
>>
>> Update is very fast
>>
>> Ludovic
>>
>>
>> 2013/1/24 Ludovic Dubost <[email protected]>
>>
>>> The custom statistics data structure is not that complicated.
>>> I've already the SQL query that does the initial version (based on
>>> xwikircs for now)
>>>
>>>   create table l10n_stats_author as select year(xwr_date)*100+
>>> month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct
>>> XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs group by
>>> 1,2;
>>>   delete from l10n_stats_author where l10n_month=201301 or
>>> l10n_month=201302;
>>>   insert into l10n_stats_author (select year(xwr_date)*100+
>>> month(xwr_date) as l10n_month, XWR_AUTHOR as l10n_author, count(distinct
>>> XWR_DOCID, XWR_VERSION1, XWR_VERSION2) as l10n_count from xwikircs where
>>> xwr_date >= '2012-12-01' group by 1,2);
>>>   select * from l10n_stats_author where l10n_month=201301 or
>>> l10n_month=201302;
>>>
>>>
>>> Putting this in a scheduler job should not be complicated
>>>
>>> Ludovic
>>>
>>>
>>>
>>> 2013/1/24 Sergiu Dumitriu <[email protected]>
>>>
>>>> On 01/24/2013 12:03 AM, Ludovic Dubost wrote:
>>>> > Working on l10n change to custom mapping showed up an issue with the
>>>> l10n
>>>> > history.
>>>> > We have a huge history table with 2,3M lines (as well as an activity
>>>> > stream) and we use the history table for extracting contributor
>>>> statistics.
>>>> >
>>>> > However looking at it more closely shows that the source of most data
>>>> in
>>>> > the history table is useless.
>>>> > See here a grouping by dates showing that localisation of already
>>>> close to
>>>> > 2M lines in 6 month in 2012.
>>>> >
>>>> > |                             201201 |   318346 |
>>>> > |                             201202 |   311403 |
>>>> > |                             201203 |   728703 |
>>>> > |                             201204 |   271657 |
>>>> > |                             201205 |   296384 |
>>>> > |                             201206 |   120463 |
>>>> >
>>>> > Here is an example of history for an entry
>>>> >
>>>> > mysql> select
>>>> >
>>>> xwr_author,xwr_version1,xwr_version2,xwr_date,xwr_comment,cast(xwr_isdiff
>>>> > as unsigned), length(xwr_patch) from xwikircs where
>>>> xwr_docid=596494851  ;
>>>> >
>>>> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
>>>> > | xwr_author            | xwr_version1 | xwr_version2 | xwr_date
>>>> >  | xwr_comment | cast(xwr_isdiff as unsigned) | length(xwr_patch) |
>>>> >
>>>> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
>>>> > | XWiki.XWikiTranslator |            1 |            1 | 2010-02-23
>>>> 11:38:27
>>>> > |             |                            1 |               163 |
>>>> > | XWiki.XWikiTranslator |            2 |            1 | 2010-02-23
>>>> 11:40:26
>>>> > |             |                            1 |               330 |
>>>> > | XWiki.rbuj            |            3 |            1 | 2010-03-04
>>>> 00:24:24
>>>> > |             |                            1 |               189 |
>>>> > | XWiki.rbuj            |            4 |            1 | 2010-03-04
>>>> 01:02:52
>>>> > |             |                            1 |               223 |
>>>> > | XWiki.rbuj            |            5 |            1 | 2010-07-30
>>>> 01:12:58
>>>> > |             |                            0 |              5026 |
>>>> > | XWiki.XWikiTranslator |            6 |            1 | 2012-01-23
>>>> 11:40:25
>>>> > |             |                            1 |               115 |
>>>> > | XWiki.XWikiTranslator |            7 |            1 | 2012-01-23
>>>> 11:58:25
>>>> > |             |                            1 |               115 |
>>>> > | XWiki.XWikiTranslator |            8 |            1 | 2012-01-23
>>>> 13:31:52
>>>> > |             |                            1 |               234 |
>>>> > | XWiki.XWikiTranslator |            8 |            2 | 2012-01-23
>>>> 15:36:02
>>>> > | Prepared    |                            1 |               115 |
>>>> > | XWiki.XWikiTranslator |            8 |            3 | 2012-01-24
>>>> 02:01:12
>>>> > | Prepared    |                            0 |              5670 |
>>>> > | XWiki.XWikiTranslator |            8 |            4 | 2012-01-25
>>>> 02:04:00
>>>> > | Prepared    |                            1 |               115 |
>>>> > | XWiki.XWikiTranslator |            8 |            5 | 2012-01-26
>>>> 02:02:29
>>>> > | Prepared    |                            1 |               115 |
>>>> > | XWiki.XWikiTranslator |            8 |            6 | 2012-01-27
>>>> 02:01:18
>>>> > | Prepared    |                            1 |               115 |
>>>> > | XWiki.XWikiTranslator |            8 |            7 | 2012-02-04
>>>> 02:01:08
>>>> > | Prepared    |                            1 |               115 |
>>>> > | XWiki.XWikiTranslator |            8 |            8 | 2012-02-07
>>>> 02:02:03
>>>> > | Prepared    |                            0 |              5670 |
>>>> > | XWiki.XWikiTranslator |            8 |            9 | 2012-02-08
>>>> 02:01:20
>>>> > | Prepared    |                            1 |               115 |
>>>> > | XWiki.XWikiTranslator |            8 |           10 | 2012-02-14
>>>> 02:01:07
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           11 | 2012-02-28
>>>> 02:01:32
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           12 | 2012-03-03
>>>> 02:01:20
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           13 | 2012-03-14
>>>> 02:01:50
>>>> > | Prepared    |                            0 |              5671 |
>>>> > | XWiki.XWikiTranslator |            8 |           14 | 2012-03-14
>>>> 18:23:06
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           15 | 2012-03-14
>>>> 18:32:38
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           16 | 2012-03-14
>>>> 18:37:17
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           17 | 2012-03-24
>>>> 02:01:34
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           18 | 2012-03-26
>>>> 16:40:35
>>>> > | Prepared    |                            0 |              5671 |
>>>> > | XWiki.XWikiTranslator |            8 |           19 | 2012-03-28
>>>> 02:01:23
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           20 | 2012-03-29
>>>> 02:01:45
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           21 | 2012-04-11
>>>> 02:01:36
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           22 | 2012-04-19
>>>> 15:20:10
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           24 | 2012-04-29
>>>> 02:01:45
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           25 | 2012-05-04
>>>> 18:55:30
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           26 | 2012-05-04
>>>> 19:02:09
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           27 | 2012-05-10
>>>> 02:04:36
>>>> > | Prepared    |                            1 |               116 |
>>>> > | XWiki.XWikiTranslator |            8 |           28 | 2012-05-30
>>>> 02:02:08
>>>> > | Prepared    |                            0 |              5671 |
>>>> > | XWiki.XWikiTranslator |            8 |           29 | 2012-06-07
>>>> 10:44:28
>>>> > | Prepared    |                            0 |              5782 |
>>>> >
>>>> +-----------------------+--------------+--------------+---------------------+-------------+------------------------------+-------------------+
>>>> >
>>>> >
>>>> http://l10n.xwiki.org/xwiki/bin/view/XE/XEXWikiCoreResources_2107067965_core-menu-watchlist-add-page_nl?viewer=history&showminor=true
>>>> >
>>>> > Most entries are "no changes" and have been cause by the L10NUpdater
>>>> which
>>>> > wrongefully saved the document with no changes. I believe this must
>>>> have
>>>> > been fixed (by Thomas M.?)  mid 2012.
>>>> >
>>>> > Now the 2M lines impact performance significantly and loads the DB for
>>>> > nothing (and in the activity stream as well).
>>>> >
>>>> > I suggest we clean up the history and activity stream. We have 2
>>>> > possibilities:
>>>> >
>>>> > For xwikircs:
>>>> >
>>>> > 1/ Clean up only the bad data from XWikiTranslator when there are no
>>>> > changes:
>>>> >
>>>> > This is complicated as you need to verify if the change is actually a
>>>> > change and you cannot do that just with sql queries. It could be very
>>>> long
>>>> >
>>>> > 2/ Clean up old data from pre-201206 from XWikiTranslator
>>>> >
>>>> > Simpler if it is safe to delete by date in the DB. After discussion
>>>> with
>>>> > sergui this is a bit complicated because you need to make sure you
>>>> don't
>>>> > delete the latest full version before the versions you keep. So you
>>>> would
>>>> > have to do it by API which will take ages.
>>>> >
>>>> > 3/ Clean up up old data from pre-201206 from all users
>>>> >
>>>> > This is simpler as you can safely delete from the database everything
>>>> older
>>>> > than a certain versions. Cleans-up even more but would loose
>>>> contributor
>>>> > statistics unless we store 2012 contributor counts in an alternate
>>>> table
>>>> > which would then be regularly updated
>>>>
>>>> +1 for 3/, without the extra stats table yet. See below.
>>>>
>>>> Technically, we could also use /2, since we'll still have history
>>>> summary (who changed when), but some actual versions won't be
>>>> retrievable. Is the actual revision important? IMHO, only if we want to
>>>> investigate some foul play.
>>>>
>>>> > In any case we should probably create this intermediary table for
>>>> > statistics as it would be much faster anyway.
>>>> >
>>>> > For activitystream:
>>>> >
>>>> > 1/ Clean up old data from XWikiTranslator 201206 or earlier
>>>> >
>>>> > 2/ Clean up old date from everybody 201206 or earlier
>>>>
>>>> +1 for 1/
>>>>
>>>> > What value do we see in the l10n history and actvity stream and which
>>>> > solutions do the other commiters suggest ?
>>>> >
>>>> > I would say it's interesting for contributor statistics (counting
>>>> number of
>>>> > contribution by translators) but beyond that we can delete the data.
>>>> > So we would fix that by storing monthly statistics in a table and
>>>> updating
>>>> > the latest 2 month through a scheduler job. This means that we can
>>>> also
>>>> > delete history over 2 month.
>>>>
>>>> We can get contributor activity from the activity stream, independently
>>>> from the RCS table. Since the activity stream doesn't have inter-version
>>>> dependencies like the RCS does, we can freely discard irrelevant rows
>>>> from it, and use the remaining valid data. The same information is
>>>> present in the activity stream and in the RCS table: who did what, when,
>>>> and where.
>>>>
>>>> I mean, this is a good fast solution for the moment. We could still make
>>>> a custom statistics data structure for translation contributors, but
>>>> that would take longer, and thus it delays the migration of the l10n
>>>> wiki to a newer version, and delays the performance improvement that
>>>> we'd gain by dropping data.
>>>>
>>>> --
>>>> Sergiu Dumitriu
>>>> http://purl.org/net/sergiu
>>>> _______________________________________________
>>>> devs mailing list
>>>> [email protected]
>>>> http://lists.xwiki.org/mailman/listinfo/devs
>>>>
>>>
>>>
>>>
>>> --
>>> Ludovic Dubost
>>> Founder and CEO
>>> Blog: http://blog.ludovic.org/
>>> XWiki: http://www.xwiki.com
>>> Skype: ldubost GTalk: ldubost
>>>
>>
>>
>>
>> --
>> Ludovic Dubost
>> Founder and CEO
>> Blog: http://blog.ludovic.org/
>> XWiki: http://www.xwiki.com
>> Skype: ldubost GTalk: ldubost
>>
>
>
>
> --
> Ludovic Dubost
> Founder and CEO
> Blog: http://blog.ludovic.org/
> XWiki: http://www.xwiki.com
> Skype: ldubost GTalk: ldubost
>



-- 
Ludovic Dubost
Founder and CEO
Blog: http://blog.ludovic.org/
XWiki: http://www.xwiki.com
Skype: ldubost GTalk: ldubost
_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs

Reply via email to