On Mon, Jan 9, 2012 at 11:23, Vincent Massol <[email protected]> wrote:

>
> On Jan 9, 2012, at 11:09 AM, Denis Gervalle wrote:
>
> > On Mon, Jan 9, 2012 at 10:07, Vincent Massol <[email protected]> wrote:
> >
> >> +1 with the following caveats:
> >>
> >> * We need to guarantee that a migration cannot corrupt the DB.
> >
> >
> > The evolution of the migration mechanism was the first step in that
> > direction, since accessing a DB with an inappropriate XWiki core could
> > have corrupted it.
> >
> >
> >> For example, imagine that we change a document id but this id is also
> >> used in some other tables, and the migration stops before it's changed
> >> in the other tables. The change needs to be done in a transaction for
> >> each doc being changed, across all tables.
> >
> >
> > That would be nice, but MySQL does not support transactions on MyISAM
> > tables.
> > I use a single transaction for the whole migration process,
>
> I think we should have one transaction per document update instead. We've
> had this problem in the past when upgrading very large systems. The
> migration never went through in one go, for some reason I have forgotten,
> so we had to use several transactions so that the migration could be
> restarted when it failed and eventually complete.
>

This could easily be done if you want it. Just note that all the other
migrations are single-transaction based AFAICS.


>
> > so on systems
> > that support it (Oracle?), the migration will either happen completely
> > or not at all. But I could not secure MySQL any better than is possible.
>
> It should work fine on MySQL with InnoDB, which we recommend (see
> http://platform.xwiki.org/xwiki/bin/view/AdminGuide/InstallationMySQL).
>

I have been on MyISAM for a long time, since there are other drawbacks to
using InnoDB. I have not experienced many corruption issues so far, so you
can expect others to have a similar setup.


>
> Thanks
> -Vincent
>
> >> Said differently, the migrator should be able to be ctrl-c-ed at any
> >> time; you can then safely restart XWiki and the migrator will just
> >> carry on from where it was.
> >>
> >
> > The migrator will restart where it left off, but the granularity is the
> > document. I process the updates document by document, updating all
> > tables for each one. If some issue occurs during the migration, let's
> > say on MySQL, and it is restarted, it will start again, skipping
> > documents that have already been converted. So any corruption would be
> > limited to a single document.
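The document-by-document granularity described above can be sketched as follows (a toy model of my own, not XWiki's actual migrator code; all names are hypothetical). The key point is that a restarted run skips ids that are already in the new form, so an interruption loses at most the document in flight:

```java
import java.util.*;
import java.util.function.LongUnaryOperator;

// Toy model of a restartable, per-document id migration: each document's
// id is converted independently, and a restarted run simply skips ids
// that no longer exist in their old form.
public class RestartableMigration {
    // Migrate every old id still present in the table; return how many
    // documents were actually converted in this run.
    static int migrate(Map<Long, String> docTable, Set<Long> oldIds,
                       LongUnaryOperator newId) {
        int converted = 0;
        for (long oldId : oldIds) {
            if (!docTable.containsKey(oldId)) {
                continue; // already converted by a previous, interrupted run
            }
            docTable.put(newId.applyAsLong(oldId), docTable.remove(oldId));
            converted++;
        }
        return converted;
    }
}
```

Running `migrate` a second time after an interruption converts only the remaining documents, which is the restart behavior described above.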
> >
> >
> >> * OR we need a configuration parameter for deciding whether to run
> >> this migration, so that users run it only when they decide to, ensuring
> >> that they've made proper backups of their DBs.
> >>
> >
> > This is true using the new migration procedure, but it is not as
> > flexible as you seem to expect. Supporting two hashing algorithms is
> > not a feature to me, but an increased risk of causing corruption.
> > Now, if you use a recent core that uses the new ids and, on the other
> > side, you have not activated migrations and access an old DB, you will
> > simply be unable to access the database: you will receive a "db
> > requires migration" exception.
> >
> > Anyway, migrations are disabled by default and should be enabled by an
> > administrator in xwiki.cfg. The release notes will mention the need to
> > run them and, of course, to make a backup first. And you are always
> > supposed to have a backup when you upgrade, or you are not a system
> > admin ;)
> >
> >
> >> I prefer the first option but we need to guarantee it.
> >>
> >
> > We will never be able to guarantee it, but I have done my best to make
> > it as safe as possible.
> >
> >
> >>
> >> Thanks
> >> -Vincent
> >>
> >> On Jan 7, 2012, at 10:39 PM, Denis Gervalle wrote:
> >>
> >>> Now that the database migration mechanism has been improved, I would
> >>> like to go ahead with my patch to improve document ids.
> >>>
> >>> Currently, ids are the simple string hashcode of a locally serialized
> >>> document reference, including the language for translated documents.
> >>> The likelihood of duplicates with Java's string hashing algorithm is
> >>> really high.
> >>>
> >>> What I propose is:
> >>>
> >>> 1) use MD5 hashing, which is particularly good at distributing ids.
> >>> 2) truncate the hash to its first 64 bits, since the XWD_ID column is
> >>> a 64-bit long.
> >>> 3) use a better string representation as the source of hashing
> >>>
> >>> Based on previous discussions, points 1) and 2) have already been
> >>> agreed on, and this vote is in particular about the string used for
> >>> 3). I propose to do it in 2 steps:
> >>>
> >>> 1) before locales are fully supported in document references, use
> >>> this format:
> >>>
> >>> <lengthOfLastSpaceName>:<lastSpaceName><lengthOfDocumentName>:<documentName><lengthOfLanguage>:<language>
> >>>
> >>> where language would be an empty string for the default document, so
> >>> it would look like 7:mySpace5:myDoc0: and its French translation would
> >>> be 7:mySpace5:myDoc2:fr
> >>> 2) when locales are included in references, we will replace the
> >>> implementation with a reference serializer that produces the same
> >>> kind of representation, but includes all spaces (not only the last
> >>> one), to be prepared for the future.
> >>>
> >>> While doing so, I also propose to fix the cache key issue by using
> >>> the same reference, but prefixed with <lengthOfWikiName>:<wikiName>,
> >>> so the previous examples will have the following keys in the document
> >>> cache: 5:xwiki7:mySpace5:myDoc0: and 5:xwiki7:mySpace5:myDoc2:fr
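To make the proposal above concrete, here is a minimal sketch of the serialization, the cache key, and the 64-bit truncated MD5 id (my own illustration, not the actual patch; the class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DocumentIdSketch {
    // Length-prefixed serialization:
    // <len>:<lastSpaceName><len>:<documentName><len>:<language>
    static String serialize(String space, String doc, String language) {
        return space.length() + ":" + space
            + doc.length() + ":" + doc
            + language.length() + ":" + language;
    }

    // Cache key: the same string, prefixed with <len>:<wikiName>.
    static String cacheKey(String wiki, String space, String doc, String language) {
        return wiki.length() + ":" + wiki + serialize(space, doc, language);
    }

    // Document id: the first 64 bits of the MD5 digest of the serialized
    // reference, packed big-endian into a long (the XWD_ID column type).
    static long documentId(String serialized) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                .digest(serialized.getBytes(StandardCharsets.UTF_8));
            long id = 0;
            for (int i = 0; i < 8; i++) {
                id = (id << 8) | (md5[i] & 0xFF);
            }
            return id;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("mySpace", "myDoc", ""));   // 7:mySpace5:myDoc0:
        System.out.println(serialize("mySpace", "myDoc", "fr")); // 7:mySpace5:myDoc2:fr
        System.out.println(cacheKey("xwiki", "mySpace", "myDoc", "")); // 5:xwiki7:mySpace5:myDoc0:
        System.out.println(Long.toHexString(documentId(serialize("mySpace", "myDoc", ""))));
    }
}
```

Note how the length prefixes make escaping unnecessary: `7:mySpace5:myDoc0:` can only be parsed one way, which is what makes the representation unambiguous and potentially reversible.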
> >>>
> >>> Using such a key (compared to the usual serialization) has the
> >>> following advantages:
> >>> - it ensures uniqueness of the reference without requiring a complex
> >>> escaping algorithm, which is unneeded here.
> >>> - it is potentially reversible.
> >>> - it is faster than the usual serialization.
> >>> - it supports the language.
> >>> - it is independent of the current serialization, which may evolve
> >>> independently; so it will be stable over time, which is really
> >>> important since it is used as the base for the hashing algorithm that
> >>> produces the document ids stored in the database.
> >>>
> >>> I would like to introduce this as early as possible, which means as
> >>> soon as we are confident with the recently introduced migration
> >>> mechanism. Since the migration of ids will convert 32-bit hashes into
> >>> 64-bit ones, the risk of collision is really low, and, to be careful,
> >>> I have written a migration algorithm that supports such collisions
> >>> (unless they cause a circular reference collision, which is really
> >>> unexpected). However, changing the ids again later, if we change our
> >>> mind, would be much riskier and the migration difficult to implement,
> >>> so it is really important that we agree on the way we compute these
> >>> ids once and for all.
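To give a sense of why the collision risk drops so sharply when moving from 32-bit to 64-bit ids, here is a back-of-the-envelope birthday-bound estimate (my own illustration, not a figure from the thread): among n random k-bit ids, the probability of at least one collision is roughly n(n-1)/2^(k+1).

```java
public class CollisionOdds {
    public static void main(String[] args) {
        double n = 1_000_000; // a hypothetical wiki with one million documents
        // Birthday bound: P(collision) ~= n*(n-1) / 2^(bits+1)
        double p64 = n * (n - 1) / Math.pow(2, 65);
        double p32 = n * (n - 1) / Math.pow(2, 33);
        System.out.println(p64); // about 2.7e-8: negligible with 64-bit ids
        System.out.println(p32); // the estimate exceeds 1: with 32-bit ids,
                                 // collisions are virtually certain at this size
    }
}
```

This is why the migration only needs to handle the rare pre-existing 32-bit collisions, while new 64-bit ids can be considered effectively unique.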
> >>>
> >>> Here is my +1,
> >>>
> >>> --
> >>> Denis Gervalle
> >> _______________________________________________
> >> devs mailing list
> >> [email protected]
> >> http://lists.xwiki.org/mailman/listinfo/devs
> >>
> >
> >
> >
> > --
> > Denis Gervalle
> > SOFTEC sa - CEO
> > eGuilde sarl - CTO
>



-- 
Denis Gervalle
SOFTEC sa - CEO
eGuilde sarl - CTO
