Re: Discussion: Entity Engine Redesign

Adrian Crum Wed, 27 Aug 2014 04:21:13 -0700

Yes, I am referring to separate transactions using suspend/resume. Ithought I mentioned that. I apologize for the confusion.


Behavioral Changes
------------------


Current (simplified for clarity):

Transaction suspendedTransaction = null;
try {
  suspendedTransaction = TransactionUtil.suspend();
  boolean beganTransaction = false;
  try {
    beganTransaction = TransactionUtil.begin();
    ...
    TransactionUtil.commit(beganTransaction);
  } catch (Exception e) {
    TransactionUtil.rollback(beganTransaction, "Exception thrown", e);
    ...
  }
} finally {
  if (suspendedTransaction != null) {
    TransactionUtil.resume(suspendedTransaction);
  }
}
...
TransactionUtil.commit(); // Commit previously suspended Tx


Proposed (simplified for clarity):

Transaction newTransaction = TransactionFactory.newTransaction();
try {

Delegator newDelegator =newTransaction.getDelegator(delegator.getDelegatorName());

  // Do stuff with newDelegator
  ...
  newTransaction.commit();
} catch (Exception e) {
  newTransaction.rollback("Exception thrown", e);
  ...
}
...
Transaction originalTransaction = delegator.getTransaction();
originalTransaction.commit();

As you can see, there is no need to "suspend" or "resume" a transaction.If you want a new transaction, you just create one. All of thatcomplicated suspend and resume code is due to the awful ThreadLocaltransaction stack implementation.



I will get back to you on the solution to the dirty cache.


Adrian Crum
Sandglass Software
www.sandglass-software.com

On 8/27/2014 11:17 AM, Scott Gray wrote:

I don't have a problem with multi-threaded transactions, I'm not sure under 
what situations I would use it but I'm not against it.  I have a better 
understanding now of why you don't want to use ThreadLocal, thanks.

Regarding your code simplification, I'm not sure I understand the concept your 
proposing.  The first confusion for me is that you mention nested transactions 
but we don't currently support those in OFBiz (because Geronimo doesn't).  Are 
you referring to separate transactions using suspend/resume rather than truly 
nested transactions?  I'm going to be nervous about agreeing to anything that 
fundamentally changes the transaction lifecycle as it is now, I'm familiar with 
it and for the most part it works well.

And I still have no idea how any of this would solve the dirty cache issues.  I 
think we're in completely different spaces about the solution to the problem 
and I need you to expand on it further and perhaps tie that explanation into 
the proposed behavioural changes to the transaction lifecycle.

Where we stand:
- I get the desire for multi-threaded transactions
- I don't get the behavioural changes to the transaction lifecycle
- I don't get your solution to the dirty cache

Regards
Scott

On 27/08/2014, at 10:28 am, Adrian Crum <[email protected]> 
wrote:

Okay, let's start over from the beginning. This will be lengthy...

Around 2010 I was trying various data modelling scenarios and the "ant 
load-demo" task was taking to long to execute, slowing down progress. So, I started 
experimenting with a multi-threaded data loader. I was able to get the ant task to 
execute in 1/10th the time using multi-threading. I mentioned my experiment on the dev 
mailing list and Adam Heath got interested. He said my design was similar to SEDA.

Anyone wanting more information on SEDA can look it up, but the main point of 
that project that needs to be mentioned here is the concept of making 
multi-threading SMARTER. That confirmed a discovery I made while working on my 
multi-threaded data loader - more threads does not equal better performance. 
There was a cutoff point where more threads slowed things down.

In that dev list conversation with Adam, I mentioned the possibility of 
bringing the SEDA design concepts into OFBiz. He replied it can't be done 
because of the current transaction API (specifically, ThreadLocal variables 
tying a transaction to a thread).

Eventually, Adam brought my multi-threaded data loader idea into the project, and thanks 
to his work, my laptop creates the 800+ tables in less than a second, and "ant 
load-demo" takes a little over a minute. I think that is pretty cool and it was 
worth the effort.

But we still can't multi-thread other parts of the project - because of the use 
of ThreadLocal variables.

Since that discussion in 2010, others in the Java community are beginning to 
discover the insight SEDA delivered. Log4J 2 uses the Disruptor design pattern 
- an evolution of SEDA.

Okay, why does OFBiz use ThreadLocal variables? It's a quick-and-dirty way to 
implement an execution context - you don't have to pass a Transaction instance 
to every method call.

The funny thing is, we use the Delegator - and THAT needs to be passed to every 
method call. So why not have the Delegator reference the transaction it is 
running in and eliminate the ThreadLocal variables?

If we can pass a Delegator instance off to other threads, then we are one step 
closer to a SEDA or Disruptor design pattern for OFBiz.

Even if those design patterns aren't attractive to you, then the code 
simplification I demonstrated earlier should be.

Solving the invalid entity cache problem is a side effect, not the main 
motivator.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 8/27/2014 9:43 AM, Scott Gray wrote:

Okay whatever, thanks for wasting my time with this thread.

On 27/08/2014, at 9:39 am, Adrian Crum <[email protected]> 
wrote:

I have, but no one is paying attention. Perhaps if I worked for Hotwax, I could 
get someone to pay attention.

I'm done.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 8/27/2014 9:30 AM, Scott Gray wrote:

You're yet to describe the problem with using ThreadLocal variables.

If you read the Jira issue, I point out another problem with this clunky implementation - calling 
"commit" doesn't really commit the transaction. That is why we end up with invalid data 
in the entity cache - because developers are fooled into thinking the "commit" calls in 
Delegator code actually commit the data, but they don't.


This is the only problem you've described so far and I say it has nothing to do 
with our transaction management, it's because the cache isn't transaction 
aware.  I can't understand what changes you could be suggesting that would 
solve that problem for the entity cache?

I've read the ticket now and realize I'm just playing Adam's role.  Please 
reconsider using Synchronization or XAResource.

Regards
Scott

On 27/08/2014, at 2:01 am, Adrian Crum <[email protected]> 
wrote:

That doesn't solve the problem with using ThreadLocal variables.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 8/26/2014 11:06 PM, Scott Gray wrote:

I don't think that's an issue with our transaction handling, it's simply a 
problem that the cache isn't transaction aware.

If the cache were to be transaction aware it would need to implement the 
XAResource interface or perhaps even the simpler Synchronization interface and 
push cache entries to the global cache only upon commit or discard them on 
rollback.  I'm loathe to suggest XAResource because we don't implement it 
properly in GenericXaResource/ServiceXaWrapper/DebugXaWrapper and it breaks 
some transaction managers (Atomikos is the only one I've tried).  I have a 
strong feeling it could be implemented using the Synchronization interface 
without too much trouble though.

Regards
Scott

On 26/08/2014, at 9:46 pm, Adrian Crum <[email protected]> 
wrote:

The concepts of "suspend" and "resume" are implemented by a ThreadLocal stack. A
"suspend" pushes the current transaction on the stack, and a "resume" pops a transaction off the
stack.

If you read the Jira issue, I point out another problem with this clunky implementation - calling
"commit" doesn't really commit the transaction. That is why we end up with invalid data
in the entity cache - because developers are fooled into thinking the "commit" calls in
Delegator code actually commit the data, but they don't. The transaction is committed by the first
bit of code that began the transaction - either a request event or a service invocation.

This is an arcane problem and it is difficult to describe, but I will try to
diagram it:

Request Event
Service Dispatcher
Begin Transaction (actual begin)
Begin Service
Some service logic
Delegator calls "commit" - nothing happens
Delegator puts uncommitted values in cache
More service logic
Delegator calls "commit" - nothing happens
Delegator puts uncommitted values in cache
End Service
Commit Transaction (actual commit)
Return service results to event handler

If something goes wrong in the service and the transaction is rolled back, the
uncommitted values in the cache are still there!

You really have to spend time in Entity Engine code to fully appreciate how
awful the transaction implementation really is.

My approach keeps a Transaction reference in the Delegator. Instead of calling the fake
"commit", the Delegator notifies the Transaction about changed values. The
Transaction saves the changed values locally. After the transaction is committed, the
Transaction instance copies the saved values to the cache.

If you look at my previous code fragment, there will be no more "suspend" or
"resume" - if you want a new transaction, you just get another instance and use it.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 8/26/2014 9:02 PM, Scott Gray wrote:

Okay so I guess I don't really understand what you're suggesting, or how it 
really differs much from what we have now.  It's also not clear what your 
suggested API changes have to do with the ThreadLocal usages?

Thanks
Scott

On 26/08/2014, at 3:22 pm, Adrian Crum <[email protected]> 
wrote:

Just use the Delegator factory.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 8/26/2014 2:43 PM, Scott Gray wrote:

Hi Adrian,

I'll probably have plenty of questions, but the first that comes to mind is: 
how would you use a delegator outside of a transaction with this approach?

Thanks
Scott

On 25/08/2014, at 10:51 am, Adrian Crum <[email protected]> 
wrote:

One persistent problem with the current Entity Engine implementation is the use 
of ThreadLocal variables in the Delegator and Transactions. Their use makes it 
difficult (and sometimes impossible) to fix Entity Engine bugs. They also make 
it impossible to multi-thread a Delegator instance.

Here is what I have had percolating in my head the last few months:

Transaction tx = TransactionFactory.newTransaction();
Delegator delegator = tx.getDelegator("default");
// Do stuff with delegator
Transaction nestedTx = TransactionFactory.newTransaction();
Delegator nestedDelegator = nestedTx.getDelegator("default");
// Do stuff with nestedDelegator
nestedTx.commit();
tx.commit();

A Delegator instance always references the transaction it is running in.

The advantage to this approach is we gain the ability to hand off Delegator 
instances to other threads. Other threads can even commit/rollback a 
transaction:

Transaction tx = delegator.getTransaction();
tx.commit();

After a commit, the Delegator instance is discarded. Any attempt to use it 
after a commit throws an exception (the same is true with the Transaction 
instance).

Another problem is Delegator localization - which also uses ThreadLocal 
variables. We can localize Delegator instances like this:

Transaction tx = TransactionFactory.newTransaction();
Delegator delegator = tx.getDelegator("default", locale);

Finally, the current implementation has a caching problem: 
https://issues.apache.org/jira/browse/OFBIZ-5534

With the new design, the Delegator instance, Transaction instance, and entity 
cache are tightly coupled - so that problem is easy to solve.

What do you think?

--
Adrian Crum
Sandglass Software
www.sandglass-software.com

Re: Discussion: Entity Engine Redesign

Reply via email to