On Thu, Sep 22, 2011 at 09:19, Denis Gervalle <[email protected]> wrote:

> Hi Sergiu,
>
> Many thanks for your feedback, I am no longer alone in the dark ;)
>
> On Thu, Sep 22, 2011 at 02:02, Sergiu Dumitriu <[email protected]> wrote:
>
>  On 09/20/2011 01:32 PM, Denis Gervalle wrote:
>> > Since nobody seems to have remarks or opinions, I have started to do A),
>> > but it was still insufficient for my requirements. So I have now done B)
>> > without fallback, using MD5, and if no one stops me, I will probably do C)
>> > soon.
>>
>> Could you easily switch to SHA-1 instead?
>>
>
> I can, but I do not feel this would be a huge improvement, since we keep
> only the lower 64 bits.
> Since oldcore already uses MD5 for other things (login persistence and
> hashcoding of document content for comparison), I would prefer not to
> introduce a dependency on a different algorithm without a real need.
>
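To be concrete about what keeping the lower 64 bits means, here is a minimal
sketch of hashing fullName:lang with MD5 and folding the last 8 bytes of the
digest into a long (class and method names are illustrative only, this is not
the actual oldcore code):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DocumentIdHash {
    // Hash "fullName:lang" with MD5 and keep only the lower 64 bits
    // of the 128-bit digest. Illustrative sketch, not the real code.
    static long hash64(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
            long id = 0L;
            // Fold the last 8 bytes of the 16-byte digest into a long.
            for (int i = 8; i < 16; i++) {
                id = (id << 8) | (digest[i] & 0xFF);
            }
            return id;
        } catch (NoSuchAlgorithmException e) {
            // Without a fallback, failing hard is the only safe option.
            throw new IllegalStateException("MD5 provider missing", e);
        }
    }
}
```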
>
>>
>> > Since the storage is entirely broken by this change, I need the startup
>> > to be stopped on migration failure, and migration to be mandatory.
>> > Migration errors were ignored until now, but this is wrong IMO. It would
>> > mean that the code was both backward compatible with old data and failsafe
>> > against any migration failure. Moreover, migration could also be disabled
>> > by config, but this is not really safe IMO.
>>
>> I agree that migration failure should stop the wiki from starting,
>> provided the failure is very clear about the error, not just a dumb
>> "database bad, good bye, and good luck!".
>>
>
> The failure is in two steps:
>  1) a "migration failure on database x" with the exception returned by the
> migrator attached
>  2) a "could not continue due to x migration failures"
>
>
>>
>> I don't fully agree that they should be mandatory, though. If the
>> sysadmin is competent enough to disable migrations, since they are
>> enabled by default, he should be able to decide when to run them and
>> when to disable them. It should be known that during an upgrade
>> migrations should be run, but it's better for performance to disable
>> this extra check that could take a while on large farms.
>>
>
> I should probably have been clearer, but for me it means a simple step
> which ensures that the database is at a version high enough to be supported
> by the running core. This means that even if migrations are disabled
> (meaning: I do not want my database to be migrated without my explicit
> authorization), we still need to check the migration level, and stop if any
> migration has not been applied. I agree that this could have a performance
> impact at the startup of large farms; maybe we could delay the check until
> a real wiki is assigned to a given database (I have not checked whether
> this is complex or not). What frightens me is the following:
>

(I have checked now, and I think that this could be done quite easily in
XWiki#updateDatabase)
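Concretely, the guard I have in mind could look roughly like the sketch
below; every name and version number in it is hypothetical and only meant to
illustrate the check, it is not the real XWiki API:

```java
class MigrationGuard {
    private final int requiredVersion;

    MigrationGuard(int requiredVersion) {
        this.requiredVersion = requiredVersion;
    }

    // Even when automatic migration is disabled, refuse to serve a
    // database whose stored version is older than the running core needs.
    void checkDatabase(String dbName, int storedVersion) {
        if (storedVersion < requiredVersion) {
            throw new IllegalStateException(
                "Database [" + dbName + "] is at version " + storedVersion
                + " but this core requires at least " + requiredVersion
                + "; run the migrations before starting this wiki.");
        }
    }
}
```

This way, scenario 3) below would abort with a clear message instead of
silently duplicating documents.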


>
>  1) the admin has disabled migrations
>  2) the admin upgrades XWiki, and reuses his old config
>  3) the admin starts the new version, which uses the changed ids, and the
> wiki starts creating many duplicate documents as in an empty DB.
>  4) the only option for the admin, if he understands what happened, is to
> restore all wikis that have been accessed during 3)
>
>
>> > Currently I have added code that throws if a migration fails, preventing
>> > the wiki from starting. Should I improve this further? Can we expect
>> > users to take care of migration requirements, or should we have mandatory
>> > migrations and optional ones?
>>
>> I don't think that there are any optional migrations currently...
>> They're all important.
>>
>
> That was my feeling; simply, some have fewer consequences than others when
> not applied.
>
>
>>
>> > I am also wondering if I should target the commit for release 3.2 or
>> > 3.3, since the name of the migrator and the db version will depend on
>> > this.
>>
>> We're getting close to the 3.2 release, so the codebase should be moving
>> towards a lower entropy point. If you fully trust your changes and can
>> commit them during the next couple of days, you have my +0.
>>
>
> The time to commit depends mainly on this discussion.
> I trust my code, but a review would be welcome.
>
>
>> > Hoping to have some comments this time.
>> >
>> > On Mon, Sep 19, 2011 at 12:54, Denis Gervalle<[email protected]>  wrote:
>> >>
>> >> Hi Devs,
>> >> Since I fell over this well-known XWiki caveat recently, I would like to
>> >> improve this.
>> >> Currently, XWikiDocument.getId() is almost equivalent to
>> >> String.hashCode(fullName:lang). Since this is a really poor hashing
>> >> method for small changes, the risk that two documents with similar names
>> >> of the same length collide is really high. This id is used by the
>> >> Hibernate store for document unicity and really needs to be unique, and
>> >> at most a 64-bit numeric at the same time. Currently we use only 32
>> >> bits, since Java hash codes are limited to 32-bit integers.
>> >> The ideal would be not to have this link between ids and documents'
>> >> fullName:lang at all, but converting the current implementation is not
>> >> really easy. This is probably why XWIKI-4396 has never been fixed.
>> >> Therefore, my current goal is to reduce the likelihood of a collision by
>> >> choosing a better hashing method, taking into account the fact that
>> >> document full names are short strings and that the number of unique ids
>> >> required is very limited (since unicity is only expected within a given
>> >> XWiki database) compared to the 64-bit integer range.
>> >> So we need to choose a better algorithm, and here are IMHO the
>> >> potential options:
>> >> A) use a simple but more efficient non-cryptographic hashing function
>> >> that runs on 64 bits. I was thinking about using the algorithm produced
>> >> by Professor Daniel J. Bernstein (DJB), since it is a well-known, widely
>> >> used, easy to implement algorithm with a good distribution on small
>> >> strings.
>> >> Pro: no dependency; fast; 64 bits, better than hashcode
>> >> Cons: probably more risk of collision compared to MD5 or SHA, but far
>> >> less than now; requires db migration of all document keys
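For reference, option A) essentially amounts to Bernstein's djb2 hash
widened from 32 to 64 bits; a sketch of the general algorithm follows (not
actual XWiki code):

```java
class Djb2Hash64 {
    // Bernstein's djb2 hash, widened from 32 to 64 bits by accumulating
    // in a long. General algorithm only, not the XWiki implementation.
    static long hash(String key) {
        long h = 5381L;
        for (int i = 0; i < key.length(); i++) {
            h = (h << 5) + h + key.charAt(i); // h * 33 + c
        }
        return h;
    }
}
```

The multiply-by-33-and-add step spreads small differences in short strings
across the whole accumulator, which is exactly the weakness of
String.hashCode() on similar names of the same length.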
>> >> B) use MD5 or an even stronger SHA-1 or SHA-256 algorithm from the JCA,
>> >> truncating to the lower 64 bits. Note that oldcore already uses MD5 for
>> >> hashcoding a whole XWikiDocument to provide the API with a
>> >> getVersionHashcode(), and for the validation hash used by the persistent
>> >> login manager. The first uses Object.hashCode() as a fallback, which is
>> >> really bad and defeats the purpose. The second does not provide any
>> >> fallback and may fail unexpectedly. For our case, if we really want a
>> >> fallback, we need to store the hashing algorithm used in the database at
>> >> creation time, and fail anyway when it is not available.
>> >> Pro: already used in oldcore; probably fewer collisions; with a
>> >> fallback, really flexible, since it would be possible to choose the
>> >> algorithm at creation time, and it does not require a full migration of
>> >> existing databases.
>> >> Cons: requires at least a DB schema change to add the hashing algorithm,
>> >> probably as a column of xwikidbversion; if this config value is broken,
>> >> the whole DB is broken
>> >> C) use our own MD5 implementation when the JCA provider is missing it.
>> >> I was thinking about integrating something like
>> >> http://twmacinta.com/myjava/fast_md5.php (non-native version), which is
>> >> LGPL. This would ensure the availability of the hashing algorithm while
>> >> having a rather strong one.
>> >> Pro: no dependency; could also provide MD5 to getVersionHashcode and the
>> >> persistent login manager
>>
>> I don't think C is needed, we depend on cryptography in many other
>> places, like the password hashes, not to mention the whole crypto module.
>>
>> >> Cons: requires db migration of all document keys
>> >> A) is really quick to implement, simple, and the least risky, but some
>> >> may find it insufficient. Caleb?
>> >> Obviously, B) with fallback is really nice, but I wonder if it is not
>> >> overkill?
>> >> I am worried about B) without fallback, but maybe I want it too
>> >> flawless.
>> >> C) is rather solid, while staying simple, but maybe overkill too.
>> >> I am really undecided.
>> >> WDYT?
>> >>
>>
>>
>> --
>> Sergiu Dumitriu
>> http://purl.org/net/sergiu/
>> _______________________________________________
>> devs mailing list
>> [email protected]
>> http://lists.xwiki.org/mailman/listinfo/devs
>>
>
>
>
> --
>
> Denis Gervalle
> SOFTEC sa - CEO
> eGuilde sarl - CTO
>



-- 
Denis Gervalle
SOFTEC sa - CEO
eGuilde sarl - CTO
_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs
