On 09/20/2011 01:32 PM, Denis Gervalle wrote:
> Since nobody seems to have remarks or opinions, I have started to do A),
> but it was still insufficient for my requirements. So I have now done B)
> without a fallback, using MD5, and if no one stops me, I will probably do
> C) soon.

Could you easily switch to SHA-1 instead?

> Since the storage is entirely broken by this change, I need the startup
> to be stopped on migration failure, and migration to be mandatory.
> Migration errors were ignored until now, but this is wrong IMO: it would
> mean that the code is both backward compatible with old data and failsafe
> against any migration failure. Moreover, migration can also be disabled
> by config, but this is not really safe IMO.

I agree that a migration failure should stop the wiki from starting, 
provided the failure message is very clear about the error, not just a 
dumb "database bad, good bye, and good luck!".

I don't fully agree that they should be mandatory, though. Since 
migrations are enabled by default, a sysadmin competent enough to disable 
them should also be able to decide when to run them and when to skip 
them. It should be documented that migrations must be run during an 
upgrade, but afterwards it's better for performance to disable this extra 
check, which could take a while on large farms.
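
For the record, that switch is just a flag in xwiki.cfg. From memory (the
exact property name should be double-checked against the xwiki.cfg
shipped with the release), it looks like:

    # Whether to run database migrations at startup (1 = on, 0 = off)
    xwiki.store.migration=1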

> Currently I have added code that throws if a migration fails, preventing
> the wiki from starting. Should I improve this further? Can we expect
> users to take care of the migration requirement, or should we have
> mandatory migrations and optional ones?

I don't think that there are any optional migrations currently... 
They're all important.

> I am also wondering whether I should target the commit for release 3.2
> or 3.3, since the name of the migrator and the db version will depend on
> this.

We're getting close to the 3.2 release, so the codebase should be moving 
towards a lower entropy point. If you fully trust your changes and can 
commit them during the next couple of days, you have my +0.

> Hoping to have some comments this time.
>
> On Mon, Sep 19, 2011 at 12:54, Denis Gervalle <[email protected]> wrote:
>>
>> Hi Devs,
>> Since I fell into this well-known XWiki caveat recently, I would like
>> to improve it.
>> Currently, XWikiDocument.getId() is almost equivalent to
>> String.hashCode(fullName:lang). Since this is a really poor hashing
>> method for small changes, the risk that two documents with similar
>> names of the same length collide is really high. This id is used by the
>> Hibernate store for document unicity, so it really needs to be unique,
>> while fitting in a 64-bit numeric at the same time. Currently we use
>> only 32 bits, since Java hash codes are limited to 32-bit integers.
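
Just to illustrate how fragile String.hashCode() is on short names, here
is a quick standalone demo (not XWiki code):

    public class HashCollisionDemo {
        public static void main(String[] args) {
            // "Aa" and "BB" are different names with the same 32-bit
            // hash, and so are all their concatenations: "AaAa",
            // "AaBB", "BBAa", "BBBB"...
            System.out.println("Aa".hashCode()); // 2112
            System.out.println("BB".hashCode()); // 2112
        }
    }

Document keys ("fullName:lang") are exactly this kind of short, similar
string.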
>> The ideal would be not to have this link between ids and the documents'
>> fullName:lang at all, but converting the current implementation is not
>> really easy. This is probably why XWIKI-4396 has never been fixed.
>> Therefore, my current goal is to reduce the likelihood of a collision
>> by choosing a better hashing method, taking into account the fact that
>> document fullnames are short strings and that the number of unique ids
>> required is very limited (since unicity is only expected within a given
>> XWiki database) compared to the 64-bit integer range.
>> So we need to choose a better algorithm, and here are IMHO the
>> potential options:
>> A) use a simple but more efficient non-cryptographic hashing function
>> that runs on 64 bits. I was thinking about using the algorithm produced
>> by Professor Daniel J. Bernstein (DJB), since it is a well-known,
>> widely used, easy-to-implement algorithm with a good distribution on
>> small strings.
>> Pro: no dependency; fast; 64 bits is better than hashCode
>> Cons: probably more risk of collision compared to MD5 or SHA, but far
>> less than now; requires a db migration of all document keys
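
For reference, djb2 is only a few lines; a minimal 64-bit sketch (the
method name is mine, and the actual patch may differ):

    // 64-bit djb2: hash = hash * 33 + c, seeded with 5381, applied to
    // the same "fullName:lang" key that getId() hashes today.
    public static long djb2Hash64(String key) {
        long hash = 5381L;
        for (int i = 0; i < key.length(); i++) {
            hash = 33 * hash + key.charAt(i);
        }
        return hash;
    }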
>> B) use MD5, or an even stronger SHA-1 or SHA-256 algorithm, from JCA,
>> truncating to the lower 64 bits. Note that oldcore already uses MD5 for
>> hashing a whole XWikiDocument to provide the API with
>> getVersionHashcode(), and for the validation hash used by the
>> persistent login manager. The first uses Object.hashCode() as a
>> fallback, which is really bad and defeats the purpose. The second does
>> not provide any fallback and may fail unexpectedly. For our case, if we
>> really want a fallback, we need to store the hashing algorithm used in
>> the database at creation time, and fail anyway when it is not
>> available.
>> Pro: already used in oldcore; probably fewer collisions; with the
>> fallback, really flexible, since it would be possible to choose the
>> algorithm at creation time, and it does not require a full migration of
>> existing databases.
>> Cons: requires at least a DB schema change to add the hashing
>> algorithm, probably as a column of xwikidbversion; if this config value
>> is broken, the whole DB is broken
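
The JCA side is also tiny. A sketch of the truncation, assuming SHA-256
and keeping the first 8 digest bytes (class and method names are
illustrative, not the actual patch):

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public final class DocumentIdHasher {
        // Digest the "fullName:lang" key and keep 64 bits as the id.
        public static long truncatedId(String key)
                throws NoSuchAlgorithmException {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            // First 8 digest bytes read as a big-endian long; for a
            // good digest it does not matter which 64 bits are kept.
            return ByteBuffer.wrap(digest).getLong();
        }
    }

Without a registered provider, getInstance() throws, which is exactly the
no-fallback failure mode you describe.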
>> C) use our own MD5 implementation when the JCA provider is missing it.
>> I was thinking about integrating something like
>> http://twmacinta.com/myjava/fast_md5.php (the non-native version),
>> which is LGPL. This would ensure the availability of the hashing
>> algorithm while having a rather strong one.
>> Pro: no dependency; could also provide MD5 to getVersionHashcode() and
>> the persistent login manager

I don't think C is needed; we depend on cryptography in many other 
places, like the password hashes, not to mention the whole crypto module.

>> Cons: requires a db migration of all document keys
>> A) is really quick to implement, simple, and the least risky, but some
>> may find it insufficient. Caleb?
>> Obviously, B) with the fallback is really nice, but I wonder if it is
>> not overkill?
>> I am worried about B) without the fallback, but maybe I am asking for
>> something too flawless.
>> C) is rather solid while staying simple, but maybe overkill too.
>> I am really undecided.
>> WDYT?
>>


-- 
Sergiu Dumitriu
http://purl.org/net/sergiu/