sure..
On 6/7/07, Alex Boisvert <[EMAIL PROTECTED]> wrote:
Ok, got it. Do you want to go ahead and create the "straight-through" branch?

alex

On 6/7/07, Maciej Szefler <[EMAIL PROTECTED]> wrote:
>
> If the IL supports ASYNC, then it is used; otherwise BLOCKING would be
> used. We want to keep this, because if the IL does indeed use the ASYNC
> style (for example, if this is a JMS ESB), then we likely don't have
> much in the way of performance guarantees, i.e. the thread may end up
> being blocked for a day, which would quickly lead to resource problems.
>
> -mbs
>
> On 6/6/07, Alex Boisvert <[EMAIL PROTECTED]> wrote:
> > Maciej,
> >
> > I'm unclear about how the engine would choose between BLOCKING and
> > ASYNC.
> >
> > I tend to think we need only BLOCKING, and the IL deals with the fact
> > that it might have to suspend and resume itself if the underlying
> > invocation is async (e.g. JBI). What's the use-case for ASYNC?
> >
> > alex
> >
> > On 6/6/07, Matthieu Riou <[EMAIL PROTECTED]> wrote:
> > >
> > > Forwarding on behalf of Maciej (mistakenly replied privately):
> > >
> > > --------------------------------------------------------------------
> > >
> > > Ah yes. OK, here's my theory on getting the behavior Alex wants;
> > > this, I think, is a fairly concrete way to get the different use
> > > cases we outlined on the whiteboard:
> > >
> > > 1) Create the notion of an invocation style: BLOCKING, ASYNC,
> > >    RELIABLE, and TRANSACTED.
> > > 2) Add a MessageExchangeContext.isStyleSupported(PartnerMex, Style)
> > >    method.
> > > 3) Modify the MessageExchangeContext.invokePartner method to take a
> > >    style parameter.
> > >
> > > In BLOCKING style, the IL simply does the invoke right then and
> > > there, blocking the thread. (Our Axis IL would support this style.)
> > >
> > > In ASYNC style, the IL does not block; instead, it sends us a
> > > notification when the response is available. (JBI likes this style
> > > the most.)
> > > In RELIABLE style, the request would be enrolled in the current TX,
> > > with the response delivered asynchronously as above (in a new TX).
> > >
> > > In TRANSACTED style, the behavior is like BLOCKING, but the TX
> > > context is propagated with the invocation.
> > >
> > > The engine would try to use the best style given the circumstances.
> > > For example, for in-memory processes it would prefer to use the
> > > TRANSACTED style, and it could do it "in-line", i.e. as part of the
> > > <invoke> or right after it runs out of reductions. If that style is
> > > not supported, it could "downgrade" to the BLOCKING style, which
> > > would work in the same way. If BLOCKING were not supported, then
> > > ASYNC would be the last resort, but this would force us to serialize.
> > >
> > > For persisted processes, we'd prefer RELIABLE in general, TRANSACTED
> > > when inside an atomic scope, and otherwise either BLOCKING or ASYNC.
> > > However, here the use of BLOCKING or ASYNC would result in
> > > additional transactions, since we'd need to persist the fact that
> > > the invocation was made. Unless, of course, the operation is marked
> > > as "idempotent", in which case we could use the BLOCKING call
> > > without a checkpoint.
> > >
> > > How does that sound?
> > > -mbs
> > >
> > > On 6/6/07, Matthieu Riou <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Actually, for in-memory processes it would save us all reads and
> > > > writes (we should never read or write it in that case). And for
> > > > persistent processes, it will save a lot of reads (which are
> > > > still expensive because of deserialization).
> > > >
> > > > On 6/6/07, Matthieu Riou <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > Two things:
> > > > >
> > > > > 1. We should also consider caching the Jacob state. Instead of
> > > > >    always serializing/writing and reading/deserializing,
> > > > >    caching those states could save us a lot of reads.
> > > > >
> > > > > 2.
> > > > >    Cutting down the transaction count is a significant
> > > > >    refactoring, so I would start a new branch for that (maybe
> > > > >    ODE 2.0?). And we're going to need a lot of tests to chase
> > > > >    regressions :)
> > > > >
> > > > > I think 1 could go without a branch. It's not trivial, but I
> > > > > don't think it would take more than a couple of weeks (I would
> > > > > have to get deeper into the code to give a better evaluation).
> > > > >
> > > > > On 6/6/07, Alex Boisvert <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Howza,
> > > > > >
> > > > > > I started testing a short-lived process implementing a single
> > > > > > request-response operation. The process structure is as
> > > > > > follows:
> > > > > >
> > > > > > - Receive purchase order
> > > > > > - Do some assignments (schema mappings)
> > > > > > - Invoke CRM system to record the new PO
> > > > > > - Do more assignments (schema mappings)
> > > > > > - Invoke ERP system to record a new work order
> > > > > > - Send back an acknowledgment
> > > > > >
> > > > > > Some deployment notes:
> > > > > > - All WS operations are SOAP/HTTP
> > > > > > - The process is deployed as "in-memory"
> > > > > > - The CRM and ERP systems are mocked as Axis2 services (as
> > > > > >   dumb as can be, to avoid bottlenecks)
> > > > > >
> > > > > > After fixing a few minor issues (to handle the load) and a
> > > > > > few obvious code inefficiencies, which gave us roughly a 20%
> > > > > > gain, we are now near 100% CPU utilization. (I'm testing on
> > > > > > my dual-core system.) As it stands, Ode clocks about 70
> > > > > > transactions per second.
> > > > > >
> > > > > > Is this good? I'd say there's room for improvement. Based on
> > > > > > previous work in the field, I estimate we could get up to
> > > > > > 300-400 transactions/second.
> > > > > >
> > > > > > How do we improve this?
> > > > > > Well, looking at the end-to-end execution of the process, I
> > > > > > counted 4 thread switches and 4 JTA transactions. Those are
> > > > > > not really necessary, if you ask me. I think significant
> > > > > > improvements could be made if we could run this process
> > > > > > straight-through, meaning in a single thread and a single
> > > > > > transaction. (Not to mention it would make things easier to
> > > > > > monitor and measure ;)
> > > > > >
> > > > > > Also, to give you an idea, the top 3 areas where we spend
> > > > > > most of our CPU today are:
> > > > > >
> > > > > > 1) Serialization/deserialization of the Jacob state (I'm
> > > > > >    estimating about 40-50%)
> > > > > > 2) XML marshaling/unmarshaling (about 10-20%)
> > > > > > 3) XML processing: XPath evaluation + assignments (about
> > > > > >    10-20%)
> > > > > >
> > > > > > (The rest would be about 20%; I need to load up JProbe or
> > > > > > DTrace to provide more accurate measurements. My current
> > > > > > estimates are a mix of non-scientific statistical sampling
> > > > > > of thread dumps and a quick run with the JVM's built-in
> > > > > > profiler.)
> > > > > >
> > > > > > So my general question is... how do we get started on the
> > > > > > single-thread + single-transaction refactoring? Has anybody
> > > > > > already given some thought to this? Are there any pending
> > > > > > design issues before we start? How do we work on this
> > > > > > without disrupting other parts of the system? Do we start a
> > > > > > new branch?
> > > > > >
> > > > > > alex
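The invocation-style proposal in the thread (an InvocationStyle parameter plus isStyleSupported/invokePartner on MessageExchangeContext, with the engine downgrading TRANSACTED -> BLOCKING -> ASYNC for in-memory processes) can be sketched as follows. This is a minimal illustration, not actual ODE code; everything beyond the style names, PartnerMex, and the two MessageExchangeContext methods named in the thread is a hypothetical name chosen here.

```java
import java.util.List;

public class StyleSelectionSketch {
    // The four invocation styles proposed in the thread.
    enum InvocationStyle { BLOCKING, ASYNC, RELIABLE, TRANSACTED }

    // Placeholder for the partner message exchange ("PartnerMex" in the thread).
    static class PartnerMex { }

    // Sketch of the proposed MessageExchangeContext additions.
    interface MessageExchangeContext {
        boolean isStyleSupported(PartnerMex mex, InvocationStyle style);
        void invokePartner(PartnerMex mex, InvocationStyle style);
    }

    // Engine-side selection: walk the preference order and "downgrade"
    // until the IL reports support for a style.
    static InvocationStyle pickStyle(MessageExchangeContext ctx, PartnerMex mex,
                                     List<InvocationStyle> preference) {
        for (InvocationStyle style : preference) {
            if (ctx.isStyleSupported(mex, style)) {
                return style;
            }
        }
        throw new IllegalStateException("IL supports no usable invocation style");
    }

    public static void main(String[] args) {
        // A mock IL that, like the Axis IL described above, supports only BLOCKING.
        MessageExchangeContext axisLike = new MessageExchangeContext() {
            @Override
            public boolean isStyleSupported(PartnerMex mex, InvocationStyle style) {
                return style == InvocationStyle.BLOCKING;
            }
            @Override
            public void invokePartner(PartnerMex mex, InvocationStyle style) { }
        };

        // In-memory preference from the thread: TRANSACTED, then BLOCKING, then ASYNC.
        InvocationStyle chosen = pickStyle(axisLike, new PartnerMex(),
                List.of(InvocationStyle.TRANSACTED, InvocationStyle.BLOCKING,
                        InvocationStyle.ASYNC));
        System.out.println(chosen);
    }
}
```

With a persisted process, the preference list passed to pickStyle would simply start with RELIABLE (or TRANSACTED inside an atomic scope) instead.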
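Matthieu's point 1 (caching the Jacob state to avoid rereading and deserializing it) amounts to a write-through cache in front of the state store. The sketch below is only an illustration of that idea; the Store/CachingStore classes are hypothetical stand-ins, not ODE's DAO layer.

```java
import java.util.HashMap;
import java.util.Map;

public class JacobStateCacheSketch {
    // Stand-in for the store holding the serialized Jacob state;
    // counts reads so the saving is visible.
    static class Store {
        private final Map<Long, byte[]> rows = new HashMap<>();
        int reads = 0;

        void write(long instanceId, String state) {
            rows.put(instanceId, state.getBytes());
        }

        String read(long instanceId) {
            reads++; // each read implies an expensive deserialization
            return new String(rows.get(instanceId));
        }
    }

    // Write-through cache: writes still hit the store (needed for
    // persistent processes), but subsequent reads are served from
    // memory, skipping deserialization entirely.
    static class CachingStore {
        final Store store = new Store();
        private final Map<Long, String> cache = new HashMap<>();

        void write(long instanceId, String state) {
            store.write(instanceId, state);
            cache.put(instanceId, state);
        }

        String read(long instanceId) {
            String cached = cache.get(instanceId);
            return (cached != null) ? cached : store.read(instanceId);
        }
    }

    public static void main(String[] args) {
        CachingStore cs = new CachingStore();
        cs.write(1L, "continuation-bytes");
        cs.read(1L);
        cs.read(1L);
        // Both reads were served from the cache, never from the store.
        System.out.println("store reads: " + cs.store.reads);
    }
}
```

For in-memory processes, as Matthieu notes, the write could be skipped entirely as well; the write-through variant shown here is the one that applies to persistent processes.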
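The "straight-through" idea Alex raises (collapsing several scheduler-driven transactions into one) can be shown in miniature. This toy is not how ODE's scheduler actually works: it just contrasts committing once per step against committing once for the whole request-response, using one no-op step per activity in Alex's test process (so 6 commits here, versus the 4 he measured).

```java
import java.util.List;

public class StraightThroughSketch {
    // Stand-in transaction manager that only counts commits.
    static class TxManager {
        int commits = 0;
        void inTx(Runnable work) {
            work.run();
            commits++;
        }
    }

    // The steps of the test process, reduced to no-ops.
    static final List<Runnable> STEPS = List.of(
            () -> { /* receive purchase order */ },
            () -> { /* assignments (schema mappings) */ },
            () -> { /* invoke CRM system */ },
            () -> { /* assignments (schema mappings) */ },
            () -> { /* invoke ERP system */ },
            () -> { /* send acknowledgment */ });

    public static void main(String[] args) {
        // Scheduler-driven execution: a transaction per resumption.
        TxManager perStep = new TxManager();
        STEPS.forEach(perStep::inTx);

        // Straight-through: the whole request-response runs on one
        // thread inside a single transaction.
        TxManager straight = new TxManager();
        straight.inTx(() -> STEPS.forEach(Runnable::run));

        System.out.println(perStep.commits + " vs " + straight.commits);
    }
}
```

The thread-switch savings work the same way: in the straight-through case every step runs on the caller's thread, so the four hand-offs Alex counted disappear along with the extra commits.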
