With "most of the code", do you also mean when new documents are being created and stored (the scenario where collisions happen)?
Imho there is nothing preventing collisions when doing: * new XWikiDocument(docRef) * modify instance * exists check and save On 2 February 2018 at 15:28, Thomas Mortagne <[email protected]> wrote: > On Fri, Feb 2, 2018 at 3:07 PM, Marc Sladek <[email protected]> > wrote: > > Introducing this certainly doesn't hurt, but 'm not sure how useful it > is. > > I never said it was the solution to all collisions but it will cover > most of the (very rare and never reported) overwrites at 0 cost. > > > Firstly, it can show collisions only after a document is already > > overwritten, thus the damage is already done. > > Again keep in mind that most of the code does: > * getDocument(DocumentReference) > * modify the XWikiDocument instance > * saveDocument() > > so if getDocument() fail you are not going to overwrite anything. > > > Secondly, loadXWikiDoc has to > > be called for the document which doesn't exist anymore, I guess this > > doesn't happen so often since the system won't list it anymore. > > Not sure which use case you are referring to here. Are you talking > about document deleted the a document with a different reference but > same hash is saved ? > > > > > On 2 February 2018 at 13:36, Thomas Mortagne <[email protected]> > > wrote: > > > >> For document what could help a lot already without any performance > >> penalty is to compare the loaded document reference and the passed one > >> in XWikiHibernateStore#loadXWikiDoc. That's because most of the code > >> in XWiki apply the following logic: getDocument(), modify it, > >> saveDocument(). > >> > >> On Fri, Feb 2, 2018 at 12:19 PM, Marc Sladek <[email protected] > > > >> wrote: > >> > Hi Denis, > >> > > >> > Thanks a lot for your answer. I know it's been a while, but I'd still > >> like > >> > to follow up on it since it's quite the fundamental issue. > >> > > >> >> Therefore, by improving the hash algorithm, the size of the ids, and > the > >> > quality of the hashed key, we have considered ourselves to be saved > >> enough > >> > for a normal usage. > >> > > >> > Still, with enough bad luck, documents and objects may be overwritten > >> > without a trace. This is not a stable implementation. And even worse, > if > >> on > >> > any XWiki installation hash collisions will happen in the future (or > have > >> > already happened since 4.x), they probably won't be easily associated > >> with > >> > this issue because it's nearly impossible to debug. > >> > > >> > While I do now understand the motivation to stick with hashes, I'm > still > >> > not sure why a collision detection would be difficult to introduce and > >> why > >> > it's even "impossible for some API". Let me briefly outline an idea: > >> > > >> > In XWikiHibernateStore#saveXWikiDoc on L615 > >> > <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/ > >> xwiki-platform-core/xwiki-platform-oldcore/src/main/java/ > >> com/xpn/xwiki/store/XWikiHibernateStore.java#L615> > >> > an exists check on the doc id is already performed. If now > >> > xwikidoc.fullName is also selected in the HQL, a comparison to > >> > doc.getDocumentReference() can expose an imminent collision before > data > >> is > >> > overwritten. At least an XWikiException should be thrown in this > case. A > >> > similar thing could be done before saving BaseObjects on L1203 > >> > <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/ > >> xwiki-platform-core/xwiki-platform-oldcore/src/main/java/ > >> com/xpn/xwiki/store/XWikiHibernateStore.java#L1203> > >> > to avoid collisions on Object IDs. > >> > > >> > I don't think a change like this would be difficult to implement, I > could > >> > provide a PR of that sort. The performance penalty has to be tested > for > >> > your systems though, since the full name isn't indexed afaik. > >> > > >> > Regards > >> > > >> > Marc Sladek > >> > synventis gmbh > >> > > >> > On 30 November 2017 at 15:21, Denis Gervalle <[email protected]> wrote: > >> > > >> >> Hi Marc, > >> >> > >> >> Here are some answers: > >> >> > >> >> 1) MD5 was already a dependency of our oldcore and using SHA1 would > have > >> >> added a dependency without bringing much benefit. Since we only used > 64 > >> >> bits of the MD5 anyway, I doubt using SHA1 would have provided a > better > >> >> distribution. > >> >> > >> >> 2) Such a collision detection is difficult to be introduced in the > >> >> existing code base, for some API it is even impossible. What you > >> experience > >> >> with the 32-bit ids had been my motivation to the changes in 4.x and > I > >> >> could say, based on my long XWiki experience, that even with the poor > >> >> 32 bit ids, very few users had been affected. Therefore, by improving > >> the > >> >> hash algorithm, the size of the ids, and the quality of the hashed > key, > >> we > >> >> have considered ourselves to be saved enough for a normal usage. > >> >> > >> >> 3) That’s the worst point. I cannot answer about the first decision, > I > >> >> wasn’t yet involve, but regarding the changes introduced in 4.0, a > >> change > >> >> had been considered. The ids are only there to satisfy Hibernate and > its > >> >> loading mechanism. If we had used a counter, we had to manage a > >> conversion > >> >> table between ids and entity references with all the additional > >> complexity > >> >> (consistency issues, caching, ...). This is so because we use entity > >> >> reference to point directly to document (or even objects) everywhere > in > >> >> XWiki. This would have been a huge work to introduce that behaviour > and > >> at > >> >> the same time keeping all the existing API unchanged. It would > probably > >> >> have introduced a performance penalty as well. This is why we > resigned > >> and > >> >> go for an improved hash solution. IMO, if we had to make such a > change, > >> we > >> >> are even better rewriting the storage service completely, and even > stop > >> >> using Hibernate, which, to be honest, does not bring much benefit to > >> >> XWiki with its ORM aspects. > >> >> > >> >> But if you really want the complete answers, you can look at those > >> threads: > >> >> http://xwiki.markmail.org/thread/fuprtrnupz2uy37f > >> >> http://xwiki.markmail.org/thread/fsd25bvft74xwgcx > >> >> > >> >> Regards, > >> >> > >> >> -- > >> >> Denis Gervalle > >> >> SOFTEC sa - CEO > >> >> > >> >> On 30 Nov 2017, 14:14 +0100, Marc Sladek <[email protected] > >, > >> >> wrote: > >> >> > Dear XWiki devs > >> >> > > >> >> > We are using the XWiki platform for our applications but sadly are > >> still > >> >> > stuck with 2.7.2. Lately we ran into issues on a large database and > >> >> noticed > >> >> > "disappearing" BaseObjects. We were able to link it to XWIKI-6990 > >> >> > <http://jira.xwiki.org/browse/XWIKI-6990>, where hibernate IDs > >> collided > >> >> > (hash collisions) and overwrote other objects without any trace - > >> neither > >> >> > visible in the history nor in a log file. > >> >> > > >> >> > We analysed your implemented solution from 4.0+ in XWikiDocument > >> >> > <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/x > >> >> wiki-platform-core/xwiki-platform-oldcore/src/main/java/com/ > >> >> xpn/xwiki/doc/XWikiDocument.java#L841 > >> >> > and BaseElement > >> >> > <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/x > >> >> wiki-platform-core/xwiki-platform-oldcore/src/main/java/com/ > >> >> xpn/xwiki/objects/BaseElement.java#L237 > >> >> > and > >> >> > noticed that you changed the 32bit String#hashCode to 64bit MD5, > which > >> >> > makes a collision less likely. I have a few questions regarding > your > >> >> > solution: > >> >> > > >> >> > 1) Is there any specific reason why you have chosen MD5 over SHA-1 > or > >> 2? > >> >> > > >> >> > 2) Collisions are still possible and would be extremely hard to > notice > >> >> > since they are completely silent. Have you considered to implement > a > >> >> > collision detection to at least log occurring collisions - or even > >> better > >> >> > reserve 1-2bits of the 64bit to be used as collision counter in the > >> case > >> >> of > >> >> > it happening? > >> >> > > >> >> > 3) To question the concept of generating a hash for an ID in > general: > >> >> > Wouldn't a database defined "auto increment" be a much more robust > >> >> solution > >> >> > for the hibernate IDs? A collision would be impossible and > >> >> > clustering/scalability is still possible with e.g. the InnoDB > >> >> “interleaved” > >> >> > autoincrement lock mode. Why have you chosen a hash based solution > in > >> the > >> >> > first place? > >> >> > > >> >> > I'm sorry if these questions were already answered in the dev > mailing > >> >> list > >> >> > or on issues, please link me to them since I couldn't find any > >> concrete > >> >> > answers. > >> >> > > >> >> > Thanks for your time and regards > >> >> > > >> >> > Marc Sladek > >> >> > synventis gmbh > >> >> > >> > >> > >> > >> -- > >> Thomas Mortagne > >> > > > > -- > Thomas Mortagne >

