Forwarding on behalf of Maciej (mistakenly replied privately):
-----------------------------------------------------------------------------------------------------------------
ah yes. ok, here's my theory on getting the behavior alex wants; I think this is a fairly concrete way to get the different use cases we outlined on the whiteboard.

1) Create the notion of an invocation style: BLOCKING, ASYNC, RELIABLE, and TRANSACTED.
2) Add a MessageExchangeContext.isStyleSupported(PartnerMex, Style) method.
3) Modify the MessageExchangeContext.invokePartner method to take a style parameter.

In BLOCKING style the IL simply does the invoke, right then and there, blocking the thread. (Our Axis IL would support this style.)
In ASYNC style, the IL does not block; instead it sends us a notification when the response is available. (JBI likes this style the most.)
In RELIABLE style, the request would be enrolled in the current TX, with the response delivered async as above (in a new TX).
In TRANSACTED style, the behavior is like BLOCKING, but the TX context is propagated with the invocation.

The engine would try to use the best style given the circumstances. For example, for in-memory processes it would prefer the TRANSACTED style, and it could do it "in-line", i.e. as part of the <invoke> or right after it runs out of reductions. If that style is not supported it could 'downgrade' to the BLOCKING style, which would work in the same way. If BLOCKING were not supported, then ASYNC would be the last resort, but this would force us to serialize.

For persisted processes, we'd prefer RELIABLE in general, TRANSACTED when inside an atomic scope, and otherwise either BLOCKING or ASYNC. However, here the use of BLOCKING or ASYNC would result in additional transactions, since we'd need to persist the fact that the invocation was made. Unless of course the operation is marked as "idempotent", in which case we could use the BLOCKING call without a checkpoint.

How does that sound?

-mbs

On 6/6/07, Matthieu Riou <[EMAIL PROTECTED]> wrote:
Actually for in-memory processes, it would save us all reads and writes (we should never read or write it in that case). And for persistent processes, it would save a lot of reads (which are still expensive because of deserialization).

On 6/6/07, Matthieu Riou <[EMAIL PROTECTED]> wrote:
>
> Two things:
>
> 1. We should also consider caching the Jacob state. Instead of always
> serializing/writing and reading/deserializing, caching those states
> could save us a lot of reads.
>
> 2. Cutting down the transaction count is a significant refactoring, so I
> would start a new branch for that (maybe ODE 2.0?). And we're going to
> need a lot of tests to chase regressions :)
>
> I think 1 could go without a branch. It's not trivial but I don't think
> it would take more than a couple of weeks (I would have to get deeper
> into the code to give a better evaluation).
>
> On 6/6/07, Alex Boisvert <[EMAIL PROTECTED]> wrote:
> >
> > Howza,
> >
> > I started testing a short-lived process implementing a single
> > request-response operation. The process structure is as follows:
> >
> > - Receive purchase order
> > - Do some assignments (schema mappings)
> > - Invoke CRM system to record the new PO
> > - Do more assignments (schema mappings)
> > - Invoke ERP system to record a new work order
> > - Send back an acknowledgment
> >
> > Some deployment notes:
> > - All WS operations are SOAP/HTTP
> > - The process is deployed as "in-memory"
> > - The CRM and ERP systems are mocked as Axis2 services (as dumb as
> > can be to avoid bottlenecks)
> >
> > After fixing a few minor issues (to handle the load), and fixing a few
> > obvious code inefficiencies which gave us roughly a 20% gain, we are
> > now near 100% CPU utilization. (I'm testing on my dual-core system.)
> > As it stands, Ode clocks about 70 transactions per second.
> >
> > Is this good? I'd say there's room for improvement.
> > Based on previous work in the field, I estimate we could get up to
> > 300-400 transactions/second.
> >
> > How do we improve this? Well, looking at the end-to-end execution of
> > the process, I counted 4 thread switches and 4 JTA transactions.
> > Those are not really necessary, if you ask me. I think significant
> > improvements could be made if we could run this process
> > straight-through, meaning in a single thread and a single
> > transaction. (Not to mention it would make things easier to monitor
> > and measure ;)
> >
> > Also, to give you an idea, the top 3 areas where we spend most of our
> > CPU today are:
> >
> > 1) Serialization/deserialization of the Jacob state (I'm estimating
> > about 40-50%)
> > 2) XML marshaling/unmarshaling (about 10-20%)
> > 3) XML processing: XPath evaluation + assignments (about 10-20%)
> >
> > (The rest would be about 20%; I need to load up JProbe or DTrace to
> > provide more accurate measurements. My current estimates are a mix of
> > non-scientific statistical sampling of thread dumps and a quick run
> > with the JVM's built-in profiler.)
> >
> > So my general question is... how do we get started on the single
> > thread + single transaction refactoring? Has anybody already given
> > some thought to this? Are there any pending design issues before we
> > start? How do we work on this without disrupting other parts of the
> > system? Do we start a new branch?
> >
> > alex
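For reference, the API change Maciej proposes in the forwarded mail could be sketched roughly as follows. The Style names and the isStyleSupported/invokePartner methods come from the mail; everything else (class names, the Object-typed exchange parameter, the in-memory downgrade order TRANSACTED -> BLOCKING -> ASYNC) is my reading of it, not ODE's actual API:

```java
// Sketch of the proposed invocation styles and MessageExchangeContext
// additions. Only the style names and the two method names come from the
// mail; the rest is illustrative.
public class InvocationStyles {

    // 1) The four invocation styles from the proposal.
    public enum Style { BLOCKING, ASYNC, RELIABLE, TRANSACTED }

    // 2) + 3) Proposed additions to the integration-layer contract.
    public interface MessageExchangeContext {
        // Can this IL perform the given exchange in the given style?
        boolean isStyleSupported(Object partnerMex, Style style);

        // Invoke the partner using the chosen style.
        void invokePartner(Object partnerMex, Style style);
    }

    // The downgrade order described for in-memory processes:
    // prefer TRANSACTED, fall back to BLOCKING, use ASYNC as a last
    // resort (forcing the engine to serialize state).
    static Style chooseInMemoryStyle(MessageExchangeContext ctx, Object mex) {
        for (Style s : new Style[] { Style.TRANSACTED, Style.BLOCKING, Style.ASYNC }) {
            if (ctx.isStyleSupported(mex, s)) {
                return s;
            }
        }
        throw new IllegalStateException("IL supports no invocation style");
    }
}
```

With an IL that supports only BLOCKING and ASYNC (like the Axis IL described in the mail), chooseInMemoryStyle would skip TRANSACTED and settle on BLOCKING.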

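Matthieu's point 1 (caching the Jacob state so a hot instance never hits the database or the deserializer) could start as simply as a bounded LRU map keyed by process-instance id. This is a sketch under assumptions: JacobStateCache is a made-up name, and real ODE state objects are not plain byte arrays:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative LRU cache for serialized Jacob execution state, keyed by
// process-instance id. A hit skips the database read (and, if the
// deserialized form were cached instead, the deserialization cost too).
public class JacobStateCache {
    private final Map<Long, byte[]> cache;

    public JacobStateCache(final int maxEntries) {
        // accessOrder=true makes LinkedHashMap iterate least-recently-used
        // first, which removeEldestEntry turns into an LRU eviction policy.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public synchronized void put(long instanceId, byte[] state) {
        cache.put(instanceId, state);
    }

    // Returns the cached state, or null on a miss (caller falls back
    // to the database read + deserialization path).
    public synchronized byte[] get(long instanceId) {
        return cache.get(instanceId);
    }
}
```

The hard part, which this sketch ignores, is invalidation: a cached entry must be dropped or refreshed whenever the owning transaction rolls back, otherwise the cache serves stale state.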