Re: Further analysis of the GC issue

Markus Kohler Thu, 26 Nov 2009 01:22:37 -0800

Thanks,
I will try that.

Markus


"The best way to predict the future is to invent it" -- Alan Kay


On Thu, Nov 26, 2009 at 10:09 AM, Richard Hirsch <[email protected]> wrote:

> @Markus It would be interesting to remove the Textile parser and do
> the tests again.
>
> This would confirm whether it is the culprit or not. If I remember
> correctly, it was just a change in one line of code.
>
> Just found the change:
> http://svn.apache.org/viewvc/incubator/esme/trunk/server/src/main/scala/org/apache/esme/model/Message.scala?r1=804817&r2=819509&diff_format=h
> You could change the code back to the older version and try it again.
>
> D.
>
> On Thu, Nov 26, 2009 at 10:03 AM, Markus Kohler <[email protected]>
> wrote:
> > Hi Michael,
> > Good to see you here!
> >
> > "Memory Analyzer"? that's me ;-)
> >
> > The 23 Gbyte are not "retained" at one point in time; they are the sum
> > of all temporarily allocated objects. Most of that memory (or all of it,
> > since there doesn't seem to be an obvious memory leak) is gone within a
> > millisecond. I'm confident that this value can be decreased to 90Mbyte
> > and can be further improved down to a few MByte (or even less). We
> > already know that the 90Mbyte are mostly caused by an inefficient
> > Textile parser.
> >
> > I also used the Memory Analyzer to look at how much memory is retained,
> > i.e. still in use/referenced after the user interaction has finished. The
> > report is here:
> > http://cwiki.apache.org/confluence/display/ESME/Performance+test+-+2009-11-22
> > Although there's room for improvement, potentially caused by the same bug
> > that turned 90Mbyte into 23Gbyte, I don't see any major issues yet with
> > regard to memory usage.
> >
> > This is also related to the stateless versus stateful discussion. ATM the
> > amount of state needed for one user is already very low (a few hundred
> > kByte), at least compared to what I'm used to with Enterprise
> > Applications. It is at least an order of magnitude lower, which can only
> > partially be explained by ESME being less complex than the typical
> > Enterprise app.
> > So far I don't see any major roadblock from the design perspective that
> > would stop us from scaling very well.
> >
> > In my experience, it's quite normal that as soon as someone with a little
> > bit of experience in performance takes a closer look at a piece of
> > software, a few dramatic improvements can be made. That makes working as
> > a performance analysis expert so gratifying. You suggest a few
> > improvements, which have a dramatic impact, and then you walk away before
> > it gets too complicated ;-)
> > No, that's not my intention here :-)
> >
> >
> > Markus
> >
> > "The best way to predict the future is to invent it" -- Alan Kay
> >
> >
> > On Thu, Nov 26, 2009 at 6:04 AM, Bechauf, Michael
> > <[email protected]> wrote:
> >
> >> David,
> >>
> >> well, "dead wrong" is a strong expression; hopefully I'm still
> >> breathing. I don't want to judge without having looked at the code
> >> myself, but I have no idea how a massive multi-user system could possibly
> >> be designed with state where per-user information is kept in memory for a
> >> certain time. I mean, 23 GB allocated - that's tough for an SAP
> >> transaction server that is not multithreaded and where the memory
> >> management is highly optimized based on shared memory that the work
> >> processes can attach to, or rolled out to a file if unused for a while.
> >> It is, however, deadly for a VM that was never designed for such memory
> >> consumption and where a GC run can halt the server.
> >>
> >> Anyway, I'll study this a bit more, particularly the Scala architecture.
> >> I heard many good things about Scala, but in the end it's all translated
> >> to things a VM can understand, and I hope Scala does a good enough job
> >> managing this load in a transparent way.
> >>
> >> -Michael
> >>
> >>
> >> ----- Original Message -----
> >> From: David Pollak <[email protected]>
> >> To: [email protected] <[email protected]>
> >> Sent: Wed Nov 25 23:00:20 2009
> >> Subject: Re: Further analysis of the GC issue
> >>
> >> On Wed, Nov 25, 2009 at 7:16 PM, Bechauf, Michael
> >> <[email protected]> wrote:
> >>
> >> > Wasn't this exactly the kind of stuff that the Eclipse Memory Analyzer
> >> > - donated by SAP - was supposed to fix? A heap of that size for a still
> >> > moderate number of 300 users is crazy, so either there is stuff like
> >> > circular references that hog memory, or the design model is
> >> > fundamentally flawed. I don't understand why ESME needs "sessions". How
> >> > can a scalable server be created if each user will allocate memory
> >> > until some timeout? In a world of stateless browser-based UIs that's
> >> > not going to work.
> >> >
> >>
> >> You're actually dead wrong about this.  "Stateless" is not... it's just
> >> pushing state and cache someplace else (the RDBMS, memcached, etc.).
> >> "Stateless" will lead to radical performance problems.  "Stateless"
> >> merely moves the caching decisions into code you don't control.  I dealt
> >> with this issue first-hand while helping a popular micro-blogging site
> >> migrate from a "stateless" to a Scala-based backend.  I'm dealing with
> >> this issue first-hand helping another popular site that's experiencing
> >> exponential growth migrate away from "push everything back to the RDBMS
> >> and hope for the best."
> >>
> >> My original design for ESME is stateful.  My original design for ESME is
> >> based on lessons learned in this very space and was oriented to have
> >> things intelligently cached so that the caching is not based on RDBMS
> >> indexes.  I'm not sure what happened to cause the particular issues, but
> >> it seems like folks are loading messages from the RDBMS rather than
> >> asking the message cache for them.
> >>
> >>
> >> >
> >> > Time for me to look at that code ...
> >> >
> >> > -Michael
> >> >
> >> >
> >> > ----- Original Message -----
> >> > From: Markus Kohler <[email protected]>
> >> > To: [email protected] <[email protected]>
> >> > Sent: Wed Nov 25 12:14:58 2009
> >> > Subject: Further analysis of the GC issue
> >> >
> >> > Hi all,
> >> > the Garbage Collector issue I was talking about is reproducible.
> >> > I've uploaded an annotated GC graph to
> >> >
> >> >
> >>
> http://picasaweb.google.com/lh/photo/wB-RRtb0wIVfpxJkTJPNuw?authkey=Gv1sRgCOve7LThpfvXsQE&feat=directlink
> >> >
> >> > I think the "LOGON" phase where I log on all 300 users looks OK (given
> >> > that Textile formatting is probably involved), but the phase where just
> >> > one user sends one message is certainly not looking good.
> >> >
> >> > I took the profiler and the result is a bit shocking. For that one
> >> > message, 881,000,000 objects weighing 23.2 Gbyte were allocated (and
> >> > reclaimed afterwards). My former record was 2Gbyte ;-)
> >> >
> >> > Fortunately I have a theory about what happens, though I haven't looked
> >> > into the code yet, so take it with a grain of salt. It seems that the
> >> > public timeline for all users is re-rendered, because 99% of the
> >> > allocations come from org.apache.esme.comet.PublicTimeline.render(). I
> >> > guess all the actors for all the users are sitting there, not knowing
> >> > that the user has closed the browser, because the user session has not
> >> > yet expired.
> >> >
> >> > I wonder how we get around this with a real "push" model. If the
> >> > browser asked for updates, this rendering could be done lazily. Or can
> >> > we "ping" the browser and check whether it responds?
> >> > On the other hand, it should also not be necessary to re-render the
> >> > message again and again, because the result will be the same.
> >> >
> >> > I will send Richard some attachments. Not sure whether you will need
> >> > them; they look very similar to the ones we already have.
> >> >
> >> > BTW, we should definitely check the use of
> >> > scala.xml.XML$.loadString(java.lang.String).
> >> > It creates a new parser each time, which is a bit costly because it
> >> > allocates a new buffer each time and also hits the disk when searching
> >> > for the name of the Java class.
> >> >
> >> > Greetings,
> >> > Markus
> >> >
> >> >
> >> >
> >> > "The best way to predict the future is to invent it" -- Alan Kay
> >> >
> >>
> >>
> >>
> >> --
> >> Lift, the simply functional web framework http://liftweb.net
> >> Beginning Scala http://www.apress.com/book/view/1430219890
> >> Follow me: http://twitter.com/dpp
> >> Surf the harmonics
> >>
> >
>
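
The original message in this thread notes that re-rendering the same
message for every open timeline should not be necessary "because the result
will be the same". A minimal sketch of that idea follows, in Java for the
JVM rather than ESME's own Scala. It is not ESME's code: the names
RenderedMessageCache, RenderCacheDemo, expensiveRender and rendersFor are
hypothetical, the renderer is a stand-in for a costly step such as Textile
parsing, and it assumes a message body does not change once posted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache: renders each message once and reuses the result. */
final class RenderedMessageCache {
    private final Map<Long, String> rendered = new ConcurrentHashMap<>();
    private final Function<String, String> renderer;
    private final AtomicInteger renderCount = new AtomicInteger();

    RenderedMessageCache(Function<String, String> renderer) {
        this.renderer = renderer;
    }

    /** Returns the cached output for messageId, rendering the body only on first use. */
    String render(long messageId, String body) {
        // computeIfAbsent runs the mapping function at most once per key.
        return rendered.computeIfAbsent(messageId, id -> {
            renderCount.incrementAndGet();
            return renderer.apply(body);
        });
    }

    /** Number of times the underlying renderer was actually invoked. */
    int renderCount() {
        return renderCount.get();
    }
}

final class RenderCacheDemo {
    /** Stand-in for an expensive markup step (assumption, not the real parser). */
    static String expensiveRender(String body) {
        return "[rendered] " + body.trim();
    }

    /** Simulates `users` timelines showing the same message; returns the render count. */
    static int rendersFor(int users) {
        RenderedMessageCache cache =
            new RenderedMessageCache(RenderCacheDemo::expensiveRender);
        for (int i = 0; i < users; i++) {
            cache.render(42L, "Hello ESME");
        }
        return cache.renderCount();
    }

    public static void main(String[] args) {
        System.out.println(rendersFor(300)); // prints 1
    }
}
```

With 300 simulated timelines the renderer runs once instead of 300 times.
The sketch has no eviction, so a real cache would need a size or age bound.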
