On Thu, Sep 22, 2011 at 09:19, Denis Gervalle <[email protected]> wrote:
> Hi Sergiu,
>
> Many thanks for your feedback, I am no longer alone in the dark ;)
>
> On Thu, Sep 22, 2011 at 02:02, Sergiu Dumitriu <[email protected]> wrote:
>> On 09/20/2011 01:32 PM, Denis Gervalle wrote:
>>> Since nobody seems to have any remark or opinion, I have started to do
>>> A), but it was still insufficient for my requirements. So I have now
>>> done B) without fallback, using MD5, and if no one stops me, I will
>>> probably do C) soon.
>>
>> Could you easily switch to SHA-1 instead?
>
> I can, but I do not feel this would be a huge improvement, since we keep
> only the lower 64 bits.
> Since oldcore already uses MD5 for other things (login persistence and
> hashing of document content for comparison), I would prefer not to
> introduce a dependency on a different algorithm without a real need.
>
>>> Since the storage is entirely broken by this change, I need the startup
>>> to be stopped on migration failure, and migration to be mandatory.
>>> Migration errors were ignored until now, but this is wrong IMO. It
>>> would mean that the code was both backward compatible with old data and
>>> failsafe against any migration failure. Moreover, migration can also be
>>> disabled by config, but this is not really safe IMO.
>>
>> I agree that migration failure should stop the wiki from starting,
>> provided the failure is very clear about the error, not just a dumb
>> "database bad, good bye, and good luck!".
>
> The failure is reported in two steps:
> 1) a "migration failure on database x", with the exception returned by
>    the migrator attached
> 2) a "could not continue due to x migration failures"
>
>> I don't fully agree that they should be mandatory, though. If the
>> sysadmin is competent enough to disable migrations, since they are
>> enabled by default, he should be able to decide when to run them and
>> when to disable them.
>> It should be known that during an upgrade
>> migrations should be run, but it's better for performance to disable
>> this extra check that could take a while on large farms.
>
> Probably I should have been clearer, but for me it means a simple step
> which ensures that the database is at a high enough version to be
> supported by the running core. This means that even if migrations are
> disabled (meaning I do not want my database to be migrated without my
> explicit authorization), we still need to check them, and stop if any
> have not been applied. I agree that this could be a performance issue at
> startup of large farms; maybe we could delay the check until a real wiki
> is assigned to a given database (I have not checked if this is complex
> or not). What frightens me is the following:

(I have checked now, and I think that this could be done quite easily in
XWiki#updateDatabase.)

> 1) admin has disabled migrations
> 2) admin upgrades XWiki, and reuses his old config
> 3) admin starts the new version, which uses the changed ids, and the
>    wiki starts creating many duplicate documents, like in an empty DB
> 4) the only option for the admin, if he understands what happened, is
>    to restore all wikis that have been accessed during 3)
>
>>> Currently I have added code that throws if a migration fails,
>>> preventing the wiki from being started. Should I improve this further?
>>> Could we expect users to take care of migration requirements, or
>>> should we have mandatory migrations and optional ones?
>>
>> I don't think that there are any optional migrations currently...
>> They're all important.
>
> That was my feeling; simply, some have fewer consequences than others
> when not applied.
>
>>> I am also wondering if I should target the commit for release 3.2 or
>>> 3.3, since the name of the migrator and the db version will depend on
>>> this.
>>
>> We're getting close to the 3.2 release, so the codebase should be moving
>> towards a lower entropy point.
>> If you fully trust your changes and can
>> commit them during the next couple of days, you have my +0.
>
> The time to commit depends mainly on this discussion.
> I trust my code, but a review would be welcome.
>
>>> Hoping to have some comments this time.
>>>
>>> On Mon, Sep 19, 2011 at 12:54, Denis Gervalle <[email protected]> wrote:
>>>> Hi Devs,
>>>> Since I fell into this well-known XWiki caveat recently, I would like
>>>> to improve this.
>>>> Currently, XWikiDocument.getId() is almost equivalent to
>>>> String.hashCode() of fullName:lang. Since this is a really poor
>>>> hashing method for small changes, the risk that two documents with
>>>> similar names of the same length collide is really high. This id is
>>>> used by the Hibernate store for document unicity, so it really needs
>>>> to be unique, and at most a 64-bit numeric at the same time. Currently
>>>> we use only 32 bits, since Java hashcodes are limited to 32-bit
>>>> integers.
>>>> The ideal would be not to have this link between ids and documents'
>>>> fullName:lang, but converting the current implementation is not really
>>>> easy. This is probably why XWIKI-4396 has never been fixed. Therefore,
>>>> my current goal is to reduce the likelihood of a collision by choosing
>>>> a better hashing method, taking into account the fact that document
>>>> fullnames are short strings and that the number of unique ids required
>>>> is very limited (since unicity is only expected within a given XWiki
>>>> database) compared to the 64-bit integer range.
>>>> So we need to choose a better algorithm, and here are IMHO the
>>>> potential options:
>>>> A) use a simple but more efficient non-cryptographic hashing function
>>>> that runs on 64 bits. I was thinking about using the algorithm
>>>> published by Professor Daniel J. Bernstein (DJB), since it is a
>>>> well-known, widely used, easy to implement algorithm with a good
>>>> distribution on small strings.
>>>> Pro: no dependency; fast; 64 bits, better than hashcode
>>>> Cons: probably more risk of collision compared to MD5 or SHA, but far
>>>> less than now; requires db migration of all document keys
>>>> B) use MD5, or an even stronger SHA-1 or SHA-256 algorithm, from JCA,
>>>> truncating to the lower 64 bits. Note that oldcore already uses MD5
>>>> for hashing a whole XWikiDocument to provide the API with
>>>> getVersionHashcode(), and for the validation hash used by the
>>>> persistent login manager. The first uses Object.hashCode() as a
>>>> fallback, which is really bad and defeats the purpose. The second does
>>>> not provide any fallback and may fail unexpectedly. In our case, if we
>>>> really want a fallback, we need to store the hashing algorithm used in
>>>> the database at creation time, and fail anyway when it is not
>>>> available.
>>>> Pro: already used in oldcore; probably fewer collisions; with a
>>>> fallback, really flexible, since it would be possible to choose the
>>>> algorithm at creation time, and it does not require a full migration
>>>> of existing databases.
>>>> Cons: requires at least a DB schema change to add the hashing
>>>> algorithm, probably as a column of xwikidbversion; if this config
>>>> value is broken, the whole DB is broken
>>>> C) use our own MD5 implementation when the JCA provider is missing it.
>>>> I was thinking about integrating something like
>>>> http://twmacinta.com/myjava/fast_md5.php (non-native version), which
>>>> is LGPL. This will ensure the availability of the hashing algorithm
>>>> while having a rather strong one.
>>>> Pro: no dependency; could also provide MD5 to getVersionHashcode() and
>>>> the persistent login manager
>>
>> I don't think C is needed, we depend on cryptography in many other
>> places, like the password hashes, not to mention the whole crypto module.
>>>> Cons: requires db migration of all document keys
>>>> A) is really quick to implement, simple, and the least risky, but
>>>> some may find it insufficient. Caleb?
>>>> Obviously, B) with fallback is really nice, but I wonder if it is not
>>>> overkill?
>>>> I am worried about B) without fallback, but maybe I want it too
>>>> flawless.
>>>> C) is rather solid, while staying simple, but maybe overkill too.
>>>> I am really undecided.
>>>> WDYT?
>>
>> --
>> Sergiu Dumitriu
>> http://purl.org/net/sergiu/
>> _______________________________________________
>> devs mailing list
>> [email protected]
>> http://lists.xwiki.org/mailman/listinfo/devs
>
> --
> Denis Gervalle
> SOFTEC sa - CEO
> eGuilde sarl - CTO

--
Denis Gervalle
SOFTEC sa - CEO
eGuilde sarl - CTO
_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs
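As an illustration of options A) and B) discussed in the thread, the two candidate 64-bit id hashes might be sketched as below. This is a hedged sketch, not the actual XWiki code: the class and method names are hypothetical, and for B) the "lower 64 bits" are arbitrarily taken from the last eight bytes of the 16-byte MD5 digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DocumentIdHash {

    // Option A: the classic djb2 hash (Daniel J. Bernstein), widened to 64
    // bits by accumulating into a long instead of an int.
    public static long djb2_64(String key) {
        long hash = 5381L;
        for (int i = 0; i < key.length(); i++) {
            hash = hash * 33 + key.charAt(i); // hash = hash * 33 + c
        }
        return hash;
    }

    // Option B: an MD5 digest from JCA, truncated to 64 bits.
    public static long md5Lower64(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            long result = 0L;
            // Fold the last 8 of the 16 digest bytes into one long.
            for (int i = 8; i < 16; i++) {
                result = (result << 8) | (digest[i] & 0xFF);
            }
            return result;
        } catch (NoSuchAlgorithmException e) {
            // Every compliant JCA provider ships MD5, but the checked
            // exception must still be handled; this is the "no fallback,
            // fail unexpectedly" case mentioned in the thread.
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // "Main.WebHome" is just a sample fullName, not taken from a real DB.
        System.out.printf("djb2-64:   %016x%n", djb2_64("Main.WebHome:"));
        System.out.printf("md5-low64: %016x%n", md5Lower64("Main.WebHome:"));
    }
}
```

Note how djb2 already avoids the classic `String.hashCode()` failure on short, similar strings such as "Aa" and "BB", which collide as 32-bit Java hashcodes.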

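The mandatory startup check argued for above (refuse to serve a database whose schema version is behind the running core, even when automatic migrations are disabled) could be sketched as follows. This is loosely modelled on the behaviour proposed for XWiki#updateDatabase; every class, method, and version number here is hypothetical, not the actual XWiki API.

```java
public class MigrationGate {

    /** Schema version the running core requires (illustrative value). */
    private final int requiredVersion;

    /** Whether the admin allowed automatic migrations in the config. */
    private final boolean migrationsEnabled;

    public MigrationGate(int requiredVersion, boolean migrationsEnabled) {
        this.requiredVersion = requiredVersion;
        this.migrationsEnabled = migrationsEnabled;
    }

    /**
     * Called before a wiki is served from the given database. Even when
     * migrations are disabled, an out-of-date schema must abort startup
     * instead of letting the core silently create duplicate documents
     * under the new id scheme.
     */
    public void check(String database, int currentVersion) {
        if (currentVersion >= requiredVersion) {
            return; // schema is up to date, nothing to do
        }
        if (!migrationsEnabled) {
            throw new IllegalStateException("Database [" + database
                + "] is at version " + currentVersion
                + " but this core requires " + requiredVersion
                + "; enable migrations or revert to a matching core");
        }
        // ...otherwise run the pending migrators here, rethrowing a clear
        // "migration failure on database x" error if any of them fails...
    }
}
```

The point of the sketch is the second branch: disabling migrations only disables the automatic *execution*, never the version *check*, so scenario 1)-4) from the thread ends with a refused startup rather than a corrupted wiki.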
