Re: [xwiki-devs] [proposal] Introduce a new persistence engine.

Caleb James DeLisle Thu, 20 Jan 2011 09:51:06 -0800


On 01/20/2011 12:33 PM, Denis Gervalle wrote:
> On Thu, Jan 20, 2011 at 17:15, Caleb James DeLisle <[email protected]
>> wrote:
> 
>>
>>
>> On 01/20/2011 02:42 AM, Denis Gervalle wrote:
>>> On Wed, Jan 19, 2011 at 19:54, Caleb James DeLisle <
>> [email protected]
>>>> wrote:
>>>
>>>>
>>>>
>>>> On 01/19/2011 12:13 PM, Vincent Massol wrote:
>>>>> Hi Caleb,
>>>>>
>>>>> I see you're excited, that's good! :)
>>>>>
>>>>> Some general comments:
>>>>> * This looks more like a design for a transaction module than for a
>>>> persistence engine. I don't see anything related to persistence in your
>>>> proposal below. Your proposal could work on stuff others than storage,
>>>> right?
>>>> Yes, this proposal only covers the transaction sub-module of the
>>>> persistence engine. The so far
>>>> un-proposed modules include xwiki-store-serialization,
>>>> xwiki-store-filesystem and a legacy
>>>> attachment storage module: xwiki-store-filesystem-attachments.
>>>>
>>>> Anything which requires transactions could use the TransactionRunnable
>>>> although I'm at a loss to
>>>> think of anything other than storage which would require transactions.
>>>>
>>>>> * I was expecting to see some Store/Storage/Persistence interfaces with
>>>> proposed APIs and explanation on how they could be implemented both with
>>>> Hibernate and JCR for example. And the relationship with the proposed
>> new
>>>> Model defined.
>>>> I don't like to propose an interface until I have tried to implement it.
>>>> Also I do not like to
>>>> propose an implementation until I have tried to use it. At this point
>> it's
>>>> far enough off that I
>>>> would rather wait than propose APIs blind.
>>>>
>>>> My experience with attachment store has shown that what we want is a set
>> of
>>>> functions which provide
>>>> TransactionRunnables to do various things:
>>>> aka:
>>>> TransactionRunnable<T> getDocumentSaveTransactionRunnable(XWikiDocument
>>>> toSave);
>>>> In a hibernate implementation it would return
>>>> TransactionRunnable<HibernateTransaction> and in a JCR
>>>> it would return TransactionRunnable<JCRTransaction>.
>>>>
>>>> We cannot have APIs like this until TransactionRunnable is agreed upon
>>>> these will return instances
>>>> of it.
>>>>
>>>>> * I was also expecting a strategy defined to migrate users from the
>>>> current implementation of the storage to the new one
>>>> IMO we should change the persistence engine and implement the same
>> schema,
>>>> once the persistence
>>>> engine is rebuilt, then we can consider modifying the schema. The schema
>> is
>>>> a specification, it may
>>>> not be perfect but it is something to comply with. It is important to me
>>>> that a we prove that a new
>>>> persistence engine is able comply with existing specifications before we
>>>> start designing new ones
>>>> around it.
>>>>
>>>>>
>>>>> I noticed some discussions between Denis and you on IRC about all this.
>>>> Does you latest findings change the proposal below?
>>>> Everything proposed still holds true but I did add 2 new features.
>>>>
>>>> 1. There is a way for a TransactionRunnable<DBTransaction> to be passed
>> an
>>>> instance of DBTransaction
>>>> using a new method called getContext().
>>>>
>>>> 2. There is a new class which serves what I believe is an edge use case.
>>>> Suppose you want to define a TransactionRunnable (we will call it
>>>> YourTransactionRunnable) which
>>>> must run inside of a DBTransaction but it must _also_ run after an
>> instance
>>>> of MyTransactionRunnable.
>>>> You can make MyTransactionRunnable a
>>>> "ProvidingTransactionRunnable<DBTransaction, MyInterface>" and
>>>> then MyTransactionRunnable must run inside of a DBTransaction and we
>> define
>>>> YourTransactionRunnable
>>>> as a TransactionRunnable<MyInterface>. This also allows
>>>> MyTransactionRunnable to share information
>>>> since YourTransactionRunnable.getContext() will provide an
>> implementation
>>>> of MyInterface. Of course
>>>> this feature must be used with care as it provides the tools to write
>>>> horrible constructs but IMO it
>>>> is the type of feature which when you need it, there is no other way
>>>> around.
>>>>
>>>
>>> I just need to add that we have not been able to apply TR and PTR in
>>> particular to a simple store with a JDBC connected database. From our
>>> experience, the PTR cause more issue than it solve and we have to find a
>>> better way to convey datas between transactions of a given transaction
>>> chain. Data dependencies between transaction could not be easily solved
>> at
>>> compile time since there is many combinatory situation in real life.
>>>
>>> My suggestion would be to provide access to previous transaction using a
>>> hash of Interfaces exposed by previous transaction. Checking availability
>> of
>>> needed interfaces could be done by transactions à preRun time which could
>>> avoid the need of uselessly running the whole chain, if there is a
>>> dependency problem.
>>
>> What you can do is make DBTransaction (the context) extend Map and then put
>> things in that map. I
>> really don't like that path though since is means that TR3 uses things
>> provided by TR1 and doesn't
>> declare that need. It breaks atomicity by sharing data between
>> transactions, TR3 can't just be
>> plugged in somewhere else where TR1 is not used. It also breaks breaks
>> compile time safety, plugging
>> TR3 in somewhere else (or reordering them) will compile but fail to run.
>> That pattern doesn't really afford you any of the benefits of TR other than
>> the try, catch safety
>> so, to me, it makes sense in that case for the code for TR1, TR2, and TR3
>> to all be inside of the
>> same TransactionRunnable.
>>
>>>
>>> I also doubt that the way of multiple transactions are executed in a same
>>> single one is correct or useful. Would really prefer to see all
>> transactions
>>> run at a given level before going down the chain.
>>
>> I don't see this as making sense. My goal is to make sure when something
>> goes wrong it fails as
>> early as possible so there is the least amount to rollback.
>>
>> SaveDocumentRunnable
>>  |
>>  +--SaveObjectRunnable
>>  |   |
>>  |   +--SavePropertyRunnable EXCEPTION
>>  |   +--SavePropertyRunnable <-- Doesn't run.
>>  |
>>  +--SaveObjectRunnable <-- doesn't run
>>  |
>>  etc. Doesn't run.
>>
>> I'm interested to know what is the rationale for wanting them to run one
>> level then the next?
>> IE: save document, then all objects then all properties.
>>
>>> This would provide the way
>>> to bundle transaction (and even transaction chain) together by running
>> them
>>> in a single transaction. This would also helps not mixing dependencies
>> since
>>> currently there is an implicit availability of earlier sibling chain of
>>> transactions that does not fit the idea of a context in evolution checked
>> at
>>> compile time and could provide unchecked implicit dependencies.
>>
>> I still don't quite follow, can you give a real world example?
>>
> 
> Well, let me base my real world example on your sample above.
> Imagine that documents are linked to their objects using their primary key,
> PK which is generated by the underlying database when you records the
> document for the first time, and that we needs creating a new documents with
> Objects in a single transaction. Your SaveObjectRunnable will need that PK
> as a FK, how does it get it? Or said in another way, it needs a context that
> provide the PK of the document for which it is saving the objects.


If the object knows what document it belongs to couldn't you use:
object.getDocument().getKey() ?
I think allowing the document to know it's primary key is less ugly than 
passing everything around
in a map.
Alternatively you could use:
UPDATE object SET object.foreignKey=(SELECT doc.id where doc.name = 
<object.getDoc().getName()>) or
the like. It's harder on the DB but it will make each TR modular and atomic.

> Well, lets insert a CreateDocumentRunnable before it, that may have some
> lower TR to complete, like saving document and retrieving the generated PK.
> How does this CreateDocumentRunnable is maybe a sibling
> of SaveObjectRunnable provide the PK to it ? 

I don't understand why the SaveObjectRunnable would be a sibling and of the 
SaveDocumentRunnable. If
the Object is part of the document then shouldn't it be a child?

> How do you check
> that SaveObjectRunnable requirement to have that PK is fulfilled ? Have you
> an idea of the structure needed for this example ?

IMO siblings should never depend on one another, if the TR needs something from 
the last TR then it
really needs to be a child of it. this also has the benefit that you can pass 
around a TR for saving
a document and it will have all TRs for saving each object and each property 
inside of it.


Thank you, I really appreciate your review.

Caleb

> 
> Denis
> 
> 
>>>
>>> To conclude, this is a very interesting proposal, that needs more
>>> refinements before being used wildly in all situation requiring
>>> transactional processing.
>>
>>
>>> Nice idea and good job Caleb !
>>
>> Thanks :)
>>
>> Caleb
>>
>>>
>>> Denis
>>>
>>>
>>>
>>>>
>>>> Caleb
>>>>
>>>>>
>>>>> Thanks
>>>>> -Vincent
>>>>>
>>>>> On Jan 10, 2011, at 2:15 PM, Caleb James DeLisle wrote:
>>>>>
>>>>>> Hi,
>>>>>> I have been working hard on filesystem attachments and I found that
>>>> synchronizing manual filesystem
>>>>>> transactions with automatic database transactions was a difficult job
>>>> and I needed a new tool to do
>>>>>> it. So I wrote what I am proposing to be the next XWiki Persistence
>>>> Engine.
>>>>>>
>>>>>> I'll start off with the fun part of the proposal, I have been calling
>> it
>>>> xwiki-store so far but I am
>>>>>> so excited about the capabilities of this engine that I don't think it
>>>> does it justice to name it
>>>>>> "store" after the place on the corner with milk and eggs. I am
>> proposing
>>>> it be named "XWiki
>>>>>> Persistence Engine", the directory will be renamed xwiki-persistence,
>>>> the artifact name
>>>>>> xwiki-core-persistence, and the package name org.xwiki.persistence.
>>>> Persistence is an attribute of
>>>>>> castles, mountains and redwood trees which I think is fitting for a
>>>> conservatively designed storage
>>>>>> engine.
>>>>>>
>>>>>> Now a little explanation of what I'm so excited about:
>>>>>> The common and error prone way of saving things in the database is to
>>>> open a transaction, enter a
>>>>>> try clause, do something then commit. If we catch an exception, then
>> we
>>>> rollback.
>>>>>> something like this:
>>>>>>
>>>>>> begin transaction;
>>>>>> try {
>>>>>>   do something;
>>>>>>   do something else;
>>>>>>   commit;
>>>>>> } catch (Any exception which may occur) {
>>>>>>   rollback;
>>>>>> }
>>>>>>
>>>>>> There are 3 things which can go wrong. 1 we forget to begin the
>>>> transaction, 2 we forget to commit
>>>>>> and 3 we do not rollback properly. What makes things worse is often
>> the
>>>> database will "assume we
>>>>>> meant to..." and things will work ok most of the time which makes
>> things
>>>> much worse because bugs
>>>>>> will hide very well.
>>>>>>
>>>>>> My answer to this problem is a class called TransactionRunnable. It
>>>> provides a set of 5 empty
>>>>>> methods to override: onPreRun(), onRun(), onCommit(), onRollback(),
>> and
>>>> onComplete(). the exact
>>>>>> circumstances under which each are called is documented in the javadoc
>>>> comments here:
>>>>>>
>>>>
>> http://svn.xwiki.org/svnroot/xwiki/contrib/sandbox/xwiki-store/xwiki-store-transaction/src/main/java/org/xwiki/store/TransactionRunnable.java
>>>>>> I wrote TransactionRunnable twice, I wrote it, used it for
>> attachments,
>>>> then after having real
>>>>>> experience as a user, I wrote it again.
>>>>>>
>>>>>> To repeat our original example with TransactionRunnable you might say
>>>> this:
>>>>>>
>>>>>> public class DoSomethingTransactionRunnable extends
>> TransactionRunnable
>>>>>> {
>>>>>>   public void onRun()
>>>>>>   {
>>>>>>     do something;
>>>>>>     do something else;
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> Now we can use another TransactionRunnable which opens and closes the
>>>> transaction for us.
>>>>>>
>>>>>> StartableTransactionRunnable transaction = new
>>>> HibernateTransactionRunnable();
>>>>>> new DoSomethingTransactionRunnable().runIn(transaction);
>>>>>> transaction.start();
>>>>>>
>>>>>> the runIn() function allows us to run one TransactionRunnable inside
>> of
>>>> another. Supposing we wanted
>>>>>> to reuse "do something else" in other places, we can make it a
>> separate
>>>> TransactionRunnable and use
>>>>>> the runIn() function to hook it to our DoSomethingTransactionRunnable
>>>> ie:
>>>>>>
>>>>>> public class DoSomethingTransactionRunnable extends
>> TransactionRunnable
>>>>>> {
>>>>>>   public DoSomethingTransactionRunnable()
>>>>>>   {
>>>>>>     new DoSomethingElseTransactionRunnable().runIn(this);
>>>>>>   }
>>>>>> ..
>>>>>>
>>>>>> The only limitations on running TransactionRunnables inside of one
>>>> another are they cannot run more
>>>>>> than once and they cannot call themselves (this would be an infinite
>>>> loop).
>>>>>>
>>>>>> This pattern makes each job which is done on storage easily isolated
>>>> and, as I have so far
>>>>>> experienced, trivial to test. However, it still leaves the possibility
>>>> that we might forget that
>>>>>> DoSomethingTransactionRunnable must be run inside of a hibernate
>>>> transaction. I have devised a
>>>>>> solution for this too. Using generics, I offered a means for the
>> author
>>>> of a TransactionRunnable to
>>>>>> communicate to the compiler what other TransactionRunnable their
>>>> runnable must be run in and without
>>>>>> explicit casting or defining of an intermediary runnable, this
>>>> requirement cannot be violated or
>>>>>> else it wouldn't compile!
>>>>>>
>>>>>> Finally we have the issue of starting the runnable. Who's to say I
>> won't
>>>> be tired one day and just
>>>>>> write new DoSomethingTransactionRunnable().start() without opening a
>>>> transaction first? If
>>>>>> DoSomethingTransactionRunnable cannot be safely run outside of a
>>>> transaction all it needs to do is
>>>>>> not extend StartableTransactionRunnable and it won't have any start
>>>> function.
>>>>>>
>>>>>> I have taken a multitude of very easy mistakes and given the author of
>> a
>>>> TransactionRunnable the
>>>>>> tools to make it very hard for the user to make them. Also, since a
>>>> TransactionRunnable has no
>>>>>> reason to be very long (it can just branch off into another runnable)
>>>> this will make testing and
>>>>>> code review easy in the place where it is most important. This part of
>>>> the code is entirely generic
>>>>>> and has no dependence on hibernate or anything else.
>>>>>>
>>>>>> I propose we move:
>>>>>> contrib/sandbox/xwiki-store/xwiki-store-transaction/
>>>>>> to:
>>>>>> platform/core/xwiki-persistence/xwiki-persistence-transaction
>>>>>>
>>>>>> And I will propose moving each additional piece in the coming days.
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> Caleb
>>>>>
>>>>> _______________________________________________
>>>>> devs mailing list
>>>>> [email protected]
>>>>> http://lists.xwiki.org/mailman/listinfo/devs
>>>>>
>>>>
>>>> _______________________________________________
>>>> devs mailing list
>>>> [email protected]
>>>> http://lists.xwiki.org/mailman/listinfo/devs
>>>>
>>>
>>>
>>>
>>
>> _______________________________________________
>> devs mailing list
>> [email protected]
>> http://lists.xwiki.org/mailman/listinfo/devs
>>
> 
> 
> 

_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs

Re: [xwiki-devs] [proposal] Introduce a new persistence engine.

Reply via email to