Re: Ode Performance: Round I

Maciej Szefler Fri, 08 Jun 2007 12:00:17 -0700

That strikes me as addressing the issue at the wrong level in the
code---if we want things to happen in one thread, then the engine
should just do them in one thread, i.e. not call the scheduler until it
has given up on the thread. Introducing a new concept (work queue)
that is shared between the engine and integration layer would be
confusing... it's bad enough that the IL uses the scheduler, which it
really should not.


-mbs

On 6/8/07, Alex Boisvert <[EMAIL PROTECTED]> wrote:

As a first step, I was thinking of allowing the composition of work that is
currently done in several unrelated threads into a single thread, by
introducing a WorkQueue.

Right now we have code in the engine, such as
org.apache.ode.axis2.ExternalService.invoke() -> afterCompletion() that uses
ExecutorService.submit(...) and I'd like to convert this into
WorkQueue.submit().

For example, this means that org.apache.ode.axis2.OdeService would first
execute the transaction around odeMex.invoke() and after commit it would
dequeue and execute any pending items in the WorkQueue.  We would also need
to do the same in BpelEngineImpl.onScheduledJob() and other similar engine
entrypoints.

The outcome of this is that we could execute all the "non-blocking" work
related to an external event in a single thread, if desired.   Depending on
the WorkQueue implementation, we could have pure serial processing, parallel
processing (like now), or even a mix in-between (e.g. limiting concurrent
processing to N threads for a given instance).   This would allow for
optimizing response time or throughput based on the engine policy, or if we
want to get sophisticated, by process model.
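
A minimal Java sketch of the WorkQueue idea described above, under stated assumptions. The names WorkQueue, SerialWorkQueue and handleEvent are hypothetical and chosen for illustration; they are not existing ODE classes. The sketch shows only the pure serial policy: follow-up work is queued during the transaction and drained on the same thread after the (simulated) commit. It is not thread-safe and leaves out real transaction handling.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical interface: submitted work is deferred, not run immediately. */
interface WorkQueue {
    void submit(Runnable work);

    /** Runs everything queued so far, including items queued while draining. */
    void drain();
}

/** Pure serial policy: pending items run on the calling thread, in FIFO order. */
class SerialWorkQueue implements WorkQueue {
    private final Queue<Runnable> pending = new ArrayDeque<>();

    @Override
    public void submit(Runnable work) {
        pending.add(work);
    }

    @Override
    public void drain() {
        Runnable next;
        while ((next = pending.poll()) != null) {
            next.run();
        }
    }
}

class WorkQueueSketch {
    /** Stand-in for an engine entry point: run the transactional work, then drain. */
    static void handleEvent(WorkQueue queue, Runnable transactionalWork) {
        transactionalWork.run(); // stands in for: begin tx, invoke, commit
        queue.drain();           // after commit, run follow-up work on the same thread
    }

    public static void main(String[] args) {
        WorkQueue queue = new SerialWorkQueue();
        List<String> log = new ArrayList<>();
        handleEvent(queue, () -> {
            log.add("invoke");
            // Where the code would otherwise call ExecutorService.submit(...)
            queue.submit(() -> log.add("afterCompletion"));
        });
        System.out.println(log);
    }
}
```

A parallel or bounded policy would be a different WorkQueue implementation behind the same interface, for example one that hands items to a thread pool in drain().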

I think this change is straightforward enough that it could happen in
the trunk without disrupting it.

Thoughts?

alex

On 6/8/07, Maciej Szefler <[EMAIL PROTECTED]> wrote:
>
> sure..
>
>
> On 6/7/07, Alex Boisvert <[EMAIL PROTECTED]> wrote:
> > Ok, got it.   Do you want to go ahead and create the "straight-through"
> > branch?
> >
> > alex
> >
> >
> > On 6/7/07, Maciej Szefler <[EMAIL PROTECTED]> wrote:
> > >
> > > If the IL supports ASYNC, then it is used, otherwise BLOCKING would be
> > > used. We want to keep this, because if the IL does indeed use ASYNC
> > > style (for example if this is a JMS ESB), then likely we don't have
> > > much in the way of performance guarantees, i.e. the thread may end up
> > > being blocked for a day, which would quickly lead to resource
> > > problems.
> > >
> > > -mbs
> > >
> > > On 6/6/07, Alex Boisvert <[EMAIL PROTECTED]> wrote:
> > > > Maciej,
> > > >
> > > > > I'm unclear about how the engine would choose between BLOCKING and ASYNC.
> > > > >
> > > > > I tend to think we need only BLOCKING and the IL deals with the fact that it
> > > > > might have to suspend and resume itself if the underlying invocation is
> > > > > async (e.g. JBI).   What's the use-case for ASYNC?
> > > >
> > > > alex
> > > >
> > > > On 6/6/07, Matthieu Riou <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Forwarding on behalf of Maciej (mistakenly replied privately):
> > > > >
> > > > >
> > > > >
> > > > > > ------------------------------------------------------------
> > > > >
> > > > > > ah yes. ok, here's my theory on getting the behavior alex wants; this
> > > > > > i think is a fairly concrete way to get the different use cases we
> > > > > > outlined on the white board.
> > > > >
> > > > > > 1) create the notion of an invocation style: BLOCKING, ASYNC,
> > > > > > RELIABLE, and TRANSACTED.
> > > > > > 2) add MessageExchangeContext.isStyleSupported(PartnerMex, Style)
> > > > > > method
> > > > > > 3) modify the MessageExchangeContext.invokePartner method to take a
> > > > > > style parameter.
> > > > >
> > > > > > In BLOCKING style the IL simply does the invoke, right then and there,
> > > > > > blocking the thread. (our axis IL would support this style)
> > > > > >
> > > > > > In ASYNC style, the IL does not block; instead it sends us a
> > > > > > notification when the response is available. (JBI likes this style the
> > > > > > most).
> > > > >
> > > > > > In RELIABLE, the request would be enrolled in the current TX, with the
> > > > > > response delivered async as above (in a new TX).
> > > > > >
> > > > > > In TRANSACTED, the behavior is like BLOCKING, but the TX context is
> > > > > > propagated with the invocation.
> > > > >
> > > > > > The engine would try to use the best style given the circumstances.
> > > > > > For example, for in-mem processes it would prefer to use the
> > > > > > TRANSACTED style and it could do it "in-line", i.e. as part of the
> > > > > > <invoke> or right after it runs out of reductions.  If the style is
> > > > > > not supported it could 'downgrade' to the BLOCKING style, which would
> > > > > > work in the same way. If BLOCKING were not supported, then ASYNC would
> > > > > > be the last resort, but this would force us to serialize.
> > > > >
> > > > > > For persisted processes, we'd prefer RELIABLE in general, TRANSACTED
> > > > > > when inside an atomic scope, otherwise either BLOCKING or ASYNC.
> > > > > > However, here use of BLOCKING or ASYNC would result in additional
> > > > > > transactions since we'd need to persist the fact that the invocation
> > > > > > was made. Unless of course the operation is marked as "idempotent" in
> > > > > > which case we could use the BLOCKING call without a checkpoint.
> > > > >
> > > > > How does that sound?
> > > > > -mbs
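
The style selection proposed in the message above could be sketched in Java roughly as follows. This is one reading of the proposal, not ODE code: InvocationStyle mirrors the four styles named above, while StyleSupport is a hypothetical, simplified stand-in for MessageExchangeContext.isStyleSupported(PartnerMex, Style), and StyleSelector is an illustrative name. For persisted processes the message does not say whether BLOCKING or ASYNC comes first, so the order used here is an arbitrary assumption, as is falling back to RELIABLE when TRANSACTED is unavailable inside an atomic scope.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

enum InvocationStyle { BLOCKING, ASYNC, RELIABLE, TRANSACTED }

/** Hypothetical, simplified stand-in for MessageExchangeContext.isStyleSupported(...). */
interface StyleSupport {
    boolean isStyleSupported(InvocationStyle style);
}

class StyleSelector {
    /** Preference order, best first, as read from the proposal. */
    static List<InvocationStyle> preferences(boolean inMemory, boolean inAtomicScope) {
        if (inMemory) {
            // In-memory: TRANSACTED, downgrade to BLOCKING, ASYNC as last resort.
            return Arrays.asList(InvocationStyle.TRANSACTED,
                                 InvocationStyle.BLOCKING,
                                 InvocationStyle.ASYNC);
        }
        if (inAtomicScope) {
            // Persisted, inside an atomic scope: TRANSACTED first (fallbacks assumed).
            return Arrays.asList(InvocationStyle.TRANSACTED,
                                 InvocationStyle.RELIABLE,
                                 InvocationStyle.BLOCKING,
                                 InvocationStyle.ASYNC);
        }
        // Persisted, general case: RELIABLE first; BLOCKING/ASYNC order is assumed.
        return Arrays.asList(InvocationStyle.RELIABLE,
                             InvocationStyle.BLOCKING,
                             InvocationStyle.ASYNC);
    }

    /** Picks the first preferred style the integration layer supports, if any. */
    static Optional<InvocationStyle> choose(StyleSupport il, boolean inMemory,
                                            boolean inAtomicScope) {
        return preferences(inMemory, inAtomicScope).stream()
                .filter(il::isStyleSupported)
                .findFirst();
    }

    public static void main(String[] args) {
        // An IL that supports only BLOCKING, as suggested for the axis IL.
        Set<InvocationStyle> blockingOnly = EnumSet.of(InvocationStyle.BLOCKING);
        System.out.println(choose(blockingOnly::contains, true, false));
    }
}
```

The sketch leaves out the checkpointing concern raised above: for persisted processes, choosing BLOCKING or ASYNC would also need a step that persists the fact that the invocation was made, unless the operation is idempotent.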
> > > > >
> > > > >
> > > > > On 6/6/07, Matthieu Riou <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > Actually for in-memory processes, it would save us all reads and writes
> > > > > > > (we should never read or write it in that case). And for persistent
> > > > > > > processes, then it will save a lot of reads (which are still expensive
> > > > > > > because of deserialization).
> > > > > >
> > > > > > On 6/6/07, Matthieu Riou <[EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > > Two things:
> > > > > > >
> > > > > > > > 1. We should also consider caching the Jacob state. Instead of always
> > > > > > > > serializing / writing and reading / deserializing, caching those states
> > > > > > > > could save us a lot of reads.
> > > > > > > >
> > > > > > > > 2. Cutting down the transaction count is a significant refactoring so I
> > > > > > > > would start a new branch for that (maybe ODE 2.0?). And we're going to
> > > > > > > > need a lot of tests to chase regressions :)
> > > > > > > >
> > > > > > > > I think 1 could go without a branch. It's not trivial but I don't think
> > > > > > > > it would take more than a couple of weeks (I would have to get deeper into
> > > > > > > > the code to give a better evaluation).
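
A rough Java sketch of the state caching idea in point 1, under stated assumptions. StateStore, MapStateStore and StateCache are hypothetical names for illustration and do not correspond to existing ODE or Jacob classes. The cache keeps the live (deserialized) state per instance, so a load after a save never touches the store, and in-memory processes never serialize at all. Eviction, rollback handling and sharing across nodes are deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical store of serialized state; stands in for the persistence layer. */
interface StateStore {
    byte[] read(long instanceId);
    void write(long instanceId, byte[] bytes);
}

/** Map-backed store that counts accesses, for demonstration only. */
class MapStateStore implements StateStore {
    final Map<Long, byte[]> rows = new HashMap<>();
    int reads = 0;
    int writes = 0;

    @Override public byte[] read(long instanceId) { reads++; return rows.get(instanceId); }
    @Override public void write(long instanceId, byte[] bytes) { writes++; rows.put(instanceId, bytes); }
}

/** Write-through cache of live state objects, keyed by process instance id. */
class StateCache<S> {
    private final Map<Long, S> live = new ConcurrentHashMap<>();
    private final StateStore store;
    private final Function<S, byte[]> serializer;
    private final Function<byte[], S> deserializer;

    StateCache(StateStore store, Function<S, byte[]> serializer,
               Function<byte[], S> deserializer) {
        this.store = store;
        this.serializer = serializer;
        this.deserializer = deserializer;
    }

    /** Returns the cached state, or reads and deserializes it on a cache miss. */
    S load(long instanceId) {
        S cached = live.get(instanceId);
        if (cached != null) {
            return cached;
        }
        byte[] bytes = store.read(instanceId);
        if (bytes == null) {
            return null;
        }
        S state = deserializer.apply(bytes);
        live.put(instanceId, state);
        return state;
    }

    /** Caches the state; only persistent processes are serialized and written. */
    void save(long instanceId, S state, boolean persistent) {
        live.put(instanceId, state);
        if (persistent) {
            store.write(instanceId, serializer.apply(state));
        }
    }
}

class StateCacheSketch {
    public static void main(String[] args) {
        MapStateStore store = new MapStateStore();
        StateCache<String> cache = new StateCache<String>(store,
                s -> s.getBytes(StandardCharsets.UTF_8),
                b -> new String(b, StandardCharsets.UTF_8));
        cache.save(1L, "state-1", true);
        cache.load(1L); // served from the cache, no read or deserialization
        System.out.println("reads=" + store.reads + " writes=" + store.writes);
    }
}
```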
> > > > > > >
> > > > > > > On 6/6/07, Alex Boisvert < [EMAIL PROTECTED]> wrote:
> > > > > > > >
> > > > > > > > Howza,
> > > > > > > >
> > > > > > > > > I started testing a short-lived process implementing a single
> > > > > > > > > request-response operation.  The process structure is as follows:
> > > > > > > >
> > > > > > > > -Receive Purchase Order
> > > > > > > > -Do some assignments (schema mappings)
> > > > > > > > -Invoke CRM system to record the new PO
> > > > > > > > -Do more assignments (schema mappings)
> > > > > > > > -Invoke ERP system to record a new work order
> > > > > > > > -Send back an acknowledgment
> > > > > > > >
> > > > > > > > Some deployment notes:
> > > > > > > > -All WS operations are SOAP/HTTP
> > > > > > > > -The process is deployed as "in-memory"
> > > > > > > > > -The CRM and ERP systems are mocked as Axis2 services (as dumb as can
> > > > > > > > > be to avoid bottlenecks)
> > > > > > > >
> > > > > > > > > After fixing a few minor issues (to handle the load), and fixing a few
> > > > > > > > > obvious code inefficiencies which gave us roughly a 20% gain, we are now
> > > > > > > > > at near-100% CPU utilization.  (I'm testing on my dual-core system.)  As
> > > > > > > > > it stands, Ode clocks about 70 transactions per second.
> > > > > > > >
> > > > > > > > > Is this good?  I'd say there's room for improvement.  Based on previous
> > > > > > > > > work in the field, I estimate we could get up to 300-400
> > > > > > > > > transactions/second.
> > > > > > > >
> > > > > > > > > How do we improve this?  Well, looking at the end-to-end execution of the
> > > > > > > > > process, I counted 4 thread-switches and 4 JTA transactions.  Those are
> > > > > > > > > not really necessary, if you ask me.  I think significant improvements
> > > > > > > > > could be made if we could run this process straight-through, meaning in a
> > > > > > > > > single thread and a single transaction.  (Not to mention it would make
> > > > > > > > > things easier to monitor and measure ;)
> > > > > > > >
> > > > > > > > > Also, to give you an idea, the top 3 areas where we spend most of our CPU
> > > > > > > > > today are:
> > > > > > > > >
> > > > > > > > > 1) Serialization/deserialization of the Jacob state (I'm estimating about
> > > > > > > > > 40-50%)
> > > > > > > > > 2) XML marshaling/unmarshaling (About 10-20%)
> > > > > > > > > 3) XML processing:  XPath evaluation + assignments (About 10-20%)
> > > > > > > >
> > > > > > > > > (The rest would be about 20%; I need to load up JProbe or DTrace to
> > > > > > > > > provide more accurate measurements.  My current estimates are a mix of
> > > > > > > > > non-scientific statistical sampling of thread dumps and a quick run with
> > > > > > > > > the JVM's built-in profiler)
> > > > > > > >
> > > > > > > > > So my general question is...  how do we get started on the single thread +
> > > > > > > > > single transaction refactoring?    Has anybody already given some thought
> > > > > > > > > to this?  Are there any pending design issues before we start?  How do we
> > > > > > > > > work on this without disrupting other parts of the system?  Do we start a
> > > > > > > > > new branch?
> > > > > > > >
> > > > > > > > alex
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
