On Jan 9, 2012, at 11:36 AM, Denis Gervalle wrote:

> On Mon, Jan 9, 2012 at 11:23, Vincent Massol <[email protected]> wrote:
>
>> On Jan 9, 2012, at 11:09 AM, Denis Gervalle wrote:
>>
>>> On Mon, Jan 9, 2012 at 10:07, Vincent Massol <[email protected]> wrote:
>>>
>>>> +1 with the following caveats:
>>>>
>>>> * We need to guarantee that a migration cannot corrupt the DB.
>>>
>>> The evolution of the migration procedure was the first step in that
>>> direction, since accessing a DB with an inappropriate XWiki core could
>>> have corrupted it.
>>>
>>>> For example, imagine that we change a document id but this id is also
>>>> used in some other tables, and the migration stops before it's changed
>>>> in the other tables. The change needs to be done in transactions for
>>>> each doc being changed across all tables.
>>>
>>> That would be nice, but MySQL does not support transactions on ISAM
>>> tables. I use a single transaction for the whole migration process,
>>
>> I think we should have one transaction per document update instead.
>> We've had this problem in the past when upgrading very large systems.
>> The migration was never going through in one go, for some reason which
>> I have forgotten, so we needed to use several transactions so that the
>> migration could be restarted when it failed and could then complete.
>
> This could be done easily if you want it. Just note that all the other
> migrations are single-transaction based, AFAICS.
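The per-document transaction granularity discussed above can be sketched as follows. This is a minimal illustration in Python with sqlite3, not the actual XWiki/Hibernate code; the table and column names (`xwikidoc`, `xwikiobjects`, `XWD_ID`, `XWO_DOC_ID`) are simplified assumptions. The point is that each document's updates across all tables commit or roll back together, so an interrupted run can simply be restarted.

```python
import sqlite3

def migrate_ids(conn, compute_new_id):
    """Migrate document ids one transaction per document, so an
    interrupted run can be restarted and simply skips documents
    that were already converted."""
    rows = conn.execute("SELECT XWD_ID, XWD_FULLNAME FROM xwikidoc").fetchall()
    for old_id, fullname in rows:
        new_id = compute_new_id(fullname)
        if new_id == old_id:
            continue  # already converted on a previous run
        with conn:  # commits on success, rolls back this document on error
            # update the document row and every table referencing its id
            conn.execute("UPDATE xwikidoc SET XWD_ID=? WHERE XWD_ID=?",
                         (new_id, old_id))
            conn.execute("UPDATE xwikiobjects SET XWO_DOC_ID=? WHERE XWO_DOC_ID=?",
                         (new_id, old_id))
```

A failure mid-run leaves earlier documents committed and the failing one untouched; rerunning the migrator skips the converted documents, which is the restartability property being argued for.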
I'm pretty sure this isn't the case. See R4359XWIKI1459DataMigration and
R6079XWIKI1878DataMigration for example.

Thanks
-Vincent

>>> so on systems that support it (Oracle?), the migration will either be
>>> applied fully or not at all. But I could not secure MySQL any better
>>> than it is possible to.
>>
>> It should work fine on MySQL with InnoDB, which we recommend (see
>> http://platform.xwiki.org/xwiki/bin/view/AdminGuide/InstallationMySQL).
>
> I have been using MyISAM myself for a long time, since there are other
> drawbacks to using InnoDB. I have not experienced many corruption issues
> so far, so you can expect others to have a similar setup.
>
>> Thanks
>> -Vincent
>>
>>>> Said differently, the migrator should be allowed to be ctrl-c-ed at
>>>> any time; you safely restart XWiki and the migrator will just carry
>>>> on from where it was.
>>>
>>> The migrator will restart where it left off, but the granularity is
>>> the document. I proceed with the updates document by document,
>>> updating all tables for each one. If there is some issue during the
>>> migration, let's say on MySQL, and it is restarted, it will start
>>> again, skipping the documents that were converted previously. So any
>>> corruption would be limited to a single document.
>>>
>>>> * OR we need to have a configuration parameter for deciding whether
>>>> to run this migration, so that users run it only when they decide,
>>>> thus ensuring that they've done the proper backups and saving of DBs.
>>>
>>> This is true using the new migration procedure, but not as flexible
>>> as you seem to expect. Supporting two hashing algorithms is not a
>>> feature to me, but an increased risk of causing corruption.
>>> Now, if you use a recent core that uses the new ids while you have
>>> not activated migrations, and you access an old DB, you will simply
>>> be unable to access the database: you will receive a "db requires
>>> migration" exception.
>>> Anyway, migrations are disabled by default and should be enabled by
>>> an administrator in xwiki.cfg. The release notes will mention the
>>> need to proceed with the migration and, of course, to make a backup
>>> before. And you are always supposed to have a backup when you
>>> upgrade, or you are not a system admin ;)
>>>
>>>> I prefer the first option but we need to guarantee it.
>>>
>>> We will never be able to guarantee it, but I have done my best to
>>> make it as secure as possible.
>>>
>>>> Thanks
>>>> -Vincent
>>>>
>>>> On Jan 7, 2012, at 10:39 PM, Denis Gervalle wrote:
>>>>
>>>>> Now that the database migration mechanism has been improved, I
>>>>> would like to go ahead with my patch to improve document ids.
>>>>>
>>>>> Currently, ids are simple string hashcodes of a locally serialized
>>>>> document reference, including the language for translated
>>>>> documents. The likelihood of having duplicates with the string
>>>>> hashing algorithm of Java is really high.
>>>>>
>>>>> What I propose is:
>>>>>
>>>>> 1) use MD5 hashing, which is particularly good at distributing.
>>>>> 2) truncate the hash to its first 64 bits, since the XWD_ID column
>>>>> is a 64-bit long.
>>>>> 3) use a better string representation as the source of the hash.
>>>>>
>>>>> Based on previous discussions, points 1) and 2) have already been
>>>>> agreed on, and this vote is in particular about the string used
>>>>> for 3).
>>>>> I propose it in 2 steps:
>>>>>
>>>>> 1) before locales are fully supported in document references, use
>>>>> this format:
>>>>>
>>>>> <lengthOfLastSpaceName>:<lastSpaceName><lengthOfDocumentName>:<documentName><lengthOfLanguage>:<language>
>>>>>
>>>>> where language would be an empty string for the default document,
>>>>> so it would look like 7:mySpace5:myDoc0: and its French translation
>>>>> would be 7:mySpace5:myDoc2:fr
>>>>>
>>>>> 2) when locales are included in references, we will replace the
>>>>> implementation by a reference serializer that produces the same
>>>>> kind of representation, but that includes all spaces (not only the
>>>>> last one), to be prepared for the future.
>>>>>
>>>>> While doing so, I also propose to fix the cache key issue by using
>>>>> the same representation, but prefixed by
>>>>> <lengthOfWikiName>:<wikiName>, so the previous examples will have
>>>>> the following keys in the document cache:
>>>>> 5:xwiki7:mySpace5:myDoc0: and 5:xwiki7:mySpace5:myDoc2:fr
>>>>>
>>>>> Using such a key (compared to the usual serialization) has the
>>>>> following advantages:
>>>>> - it ensures uniqueness of the reference without requiring a
>>>>> complex escaping algorithm, which is unneeded here.
>>>>> - it is potentially reversible.
>>>>> - it is faster than the usual serialization.
>>>>> - it supports languages.
>>>>> - it is independent of the current serialization, which may evolve
>>>>> separately, so it will be stable over time. That stability is
>>>>> really important when the key is used as the base for the hashing
>>>>> algorithm producing the document ids stored in the database.
>>>>>
>>>>> I would like to introduce this as early as possible, which means as
>>>>> soon as we are confident with the migration mechanism recently
>>>>> introduced.
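The proposed key format and the agreed MD5-truncated-to-64-bits id can be sketched as follows. This is an illustrative Python sketch, not the actual (Java) implementation; in particular, the UTF-8 encoding and big-endian byte order used to turn the first 8 digest bytes into a signed long are my assumptions, not details fixed by the proposal.

```python
import hashlib
import struct

def local_key(space, doc, language=""):
    """Length-prefixed local reference, e.g. '7:mySpace5:myDoc0:'.
    Only the last space name is used, per step 1) of the proposal."""
    return "%d:%s%d:%s%d:%s" % (len(space), space, len(doc), doc,
                                len(language), language)

def cache_key(wiki, space, doc, language=""):
    """Same key prefixed with the wiki name, e.g. '5:xwiki7:mySpace5:myDoc0:'."""
    return "%d:%s%s" % (len(wiki), wiki, local_key(space, doc, language))

def document_id(space, doc, language=""):
    """First 64 bits of the MD5 of the local key, as a signed 64-bit value
    (to fit the XWD_ID long column). Encoding and byte order are assumed."""
    digest = hashlib.md5(local_key(space, doc, language).encode("utf-8")).digest()
    return struct.unpack(">q", digest[:8])[0]
```

Note how the length prefixes give uniqueness without any escaping: `7:mySpace5:myDoc0:` cannot be produced by any other (space, doc, language) triple, which is the property the proposal relies on.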
>>>>> Since the migration of ids will convert 32-bit hashes into 64-bit
>>>>> ones, the risk of collision is really low, and to be careful, I
>>>>> have written a migration algorithm that supports such collisions
>>>>> (unless they cause a circular reference collision, but this is
>>>>> really unexpected). However, changing ids again later, if we change
>>>>> our minds, would be much riskier and the migration difficult to
>>>>> implement, so it is really important that we agree on the way we
>>>>> compute these ids, once and for all.
>>>>>
>>>>> Here is my +1,
>>>>>
>>>>> --
>>>>> Denis Gervalle

_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs
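The collision tolerance mentioned above can be illustrated with a small sketch. This is a hypothetical helper, not Denis's actual migration algorithm: it only shows the planning step of detecting when two documents map to the same new id, so they can be handled specially; the real algorithm also has to cope with ordering issues such as a new id equalling another document's not-yet-migrated old id (the "circular reference collision" case).

```python
def plan_id_migration(docs, new_id):
    """Plan the old-id -> new-id remapping, detecting collisions.
    `docs` maps old_id -> document key string; `new_id` computes the
    new id from a key. Returns (mapping old_id -> new_id, list of
    document keys whose new id collided with an earlier one)."""
    taken = {}       # new_id -> old_id of the document that claimed it
    collisions = []  # documents needing special handling
    for old_id, key in docs.items():
        nid = new_id(key)
        if nid in taken:
            collisions.append(key)
        else:
            taken[nid] = old_id
    mapping = {taken[nid]: nid for nid in taken}
    return mapping, collisions
```

With a well-distributed 64-bit hash the `collisions` list should almost always be empty, which is the proposal's point: going from 32-bit string hashcodes to truncated MD5 makes this path a rarely exercised safety net.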

