Wiki page is back again
On Sat, Nov 28, 2009 at 10:19 AM, Richard Hirsch <[email protected]> wrote:
> For some reason the wiki page about the performance test on 11-25 was
> lost, I'll have to create it once again.
>
> On Fri, Nov 27, 2009 at 5:47 AM, Richard Hirsch <[email protected]> wrote:
>> Moved this whole thread to the wiki:
>> http://cwiki.apache.org/confluence/display/ESME/Performance+test+2009-11-25
>>
>> D.
>>
>> On Thu, Nov 26, 2009 at 2:22 PM, Markus Kohler <[email protected]> wrote:
>>> Hi Michael,
>>> No problem :-)
>>>
>>> Regards,
>>> Markus
>>>
>>> "The best way to predict the future is to invent it" -- Alan Kay
>>>
>>> On Thu, Nov 26, 2009 at 2:12 PM, Bechauf, Michael <[email protected]> wrote:
>>>
>>>> Thanks Markus. That certainly sounds much better. I was already
>>>> confused yesterday, because 23 GByte of memory would be a little
>>>> difficult to allocate when not even the operating system can handle
>>>> that size. I should have asked right away. Blame it on jetlag.
>>>>
>>>> -Michael
>>>>
>>>> -----Original Message-----
>>>> From: Markus Kohler [mailto:[email protected]]
>>>> Sent: Thursday, Nov 26, 2009 1:04 AM
>>>> To: [email protected]
>>>> Subject: Re: Further analysis of the GC issue
>>>>
>>>> Hi Michael,
>>>> Good to see you here!
>>>>
>>>> "Memory Analyzer"? That's me ;-)
>>>>
>>>> The 23 GByte are not "retained" at one point in time; they are the
>>>> sum of all temporarily allocated objects. Most of that memory (or all
>>>> of it, there doesn't seem to be an obvious memory leak) is gone
>>>> within a millisecond. I'm confident that this value can be decreased
>>>> to 90 MByte and can be further improved down to a few MByte (or even
>>>> less). We already know that the 90 MByte are mostly caused by an
>>>> inefficient Textile parser.
>>>>
>>>> I also used the Memory Analyzer to look at how much memory is
>>>> retained, i.e. still in use/referenced after the user interaction
>>>> has finished.
>>>> The report is here:
>>>> http://cwiki.apache.org/confluence/display/ESME/Performance+test+-+2009-11-22
>>>> Although there is room for improvement (potentially caused by the
>>>> same bug that turned 90 MByte into 23 GByte), I don't see any major
>>>> issues yet with regard to memory usage.
>>>>
>>>> This is also related to the stateless versus stateful discussion. ATM
>>>> the amount of state needed for one user is already very low (a few
>>>> hundred kByte), at least compared to what I'm used to with Enterprise
>>>> Applications. It is at least an order of magnitude lower, which can
>>>> only partially be explained by ESME being less complex than the
>>>> typical Enterprise app. So far I don't see any major roadblock from
>>>> the design perspective that would stop us from scaling very well.
>>>>
>>>> In my experience, it's quite normal that as soon as someone with a
>>>> little bit of experience in performance takes a closer look at a
>>>> piece of software, a few dramatic improvements can be made. That
>>>> makes working as a performance analysis expert so gratifying. You
>>>> suggest a few improvements, which have a dramatic impact, and then
>>>> you walk away before it gets too complicated ;-)
>>>> No, that's not my intention here :-)
>>>>
>>>> Markus
>>>>
>>>> "The best way to predict the future is to invent it" -- Alan Kay
>>>>
>>>> On Thu, Nov 26, 2009 at 6:04 AM, Bechauf, Michael <[email protected]> wrote:
>>>>
>>>> > David,
>>>> >
>>>> > well, "dead wrong" is a strong expression; hopefully I'm still
>>>> > breathing. I don't want to judge without having looked at the code
>>>> > myself, but I have no idea how a massive multi-user system could
>>>> > possibly be designed with state where per-user information is kept
>>>> > in memory for a certain time.
>>>> > I mean, 23 GB allocated - that's tough for an SAP transaction
>>>> > server that is not multithreaded and where the memory management is
>>>> > highly optimized based on shared memory that the work processes can
>>>> > attach to, or rolled out to a file if unused for a while. It is,
>>>> > however, deadly for a VM that was never designed for such memory
>>>> > consumption and where a GC run can halt the server.
>>>> >
>>>> > Anyway, I'll study this a bit more, particularly the Scala
>>>> > architecture. I heard many good things about Scala, but in the end
>>>> > it's all translated to things a VM can understand, and I hope Scala
>>>> > does a good enough job managing this load in a transparent way.
>>>> >
>>>> > -Michael
>>>> >
>>>> >
>>>> > ----- Original Message -----
>>>> > From: David Pollak <[email protected]>
>>>> > To: [email protected] <[email protected]>
>>>> > Sent: Wed Nov 25 23:00:20 2009
>>>> > Subject: Re: Further analysis of the GC issue
>>>> >
>>>> > On Wed, Nov 25, 2009 at 7:16 PM, Bechauf, Michael <[email protected]> wrote:
>>>> >
>>>> > > Wasn't this exactly the kind of stuff that the Eclipse Memory
>>>> > > Analyzer - donated by SAP - was supposed to fix? A heap of that
>>>> > > size for a still moderate number of 300 users is crazy, so either
>>>> > > there is stuff like circular references that hog memory, or the
>>>> > > design model is fundamentally flawed. I don't understand why ESME
>>>> > > needs "sessions". How can a scalable server be created if each
>>>> > > user will allocate memory until some timeout? In a world of
>>>> > > stateless browser-based UIs that's not going to work.
>>>> > >
>>>> >
>>>> > You're actually dead wrong about this. "Stateless" is not... it's
>>>> > just pushing state and cache someplace else (the RDBMS, memcached,
>>>> > etc.). "Stateless" will lead to radical performance problems.
>>>> > "Stateless" merely moves the caching decisions into code you don't
>>>> > control. I dealt with this issue first-hand while helping a popular
>>>> > micro-blogging site migrate from a "stateless" to a Scala-based
>>>> > backend. I'm dealing with this issue first-hand helping another
>>>> > popular site that's experiencing exponential growth migrate away
>>>> > from "push everything back to the RDBMS and hope for the best."
>>>> >
>>>> > My original design for ESME is stateful. My original design for
>>>> > ESME is based on lessons learned in this very space and was
>>>> > oriented to have things intelligently cached so that the caching is
>>>> > not based on RDBMS indexes. I'm not sure what happened to cause the
>>>> > particular issues, but it seems like folks are loading messages
>>>> > from the RDBMS rather than asking the message cache for them.
>>>> >
>>>> >
>>>> > >
>>>> > > Time for me to look at that code ...
>>>> > >
>>>> > > -Michael
>>>> > >
>>>> > >
>>>> > > ----- Original Message -----
>>>> > > From: Markus Kohler <[email protected]>
>>>> > > To: [email protected] <[email protected]>
>>>> > > Sent: Wed Nov 25 12:14:58 2009
>>>> > > Subject: Further analysis of the GC issue
>>>> > >
>>>> > > Hi all,
>>>> > > the Garbage Collector issue I was talking about is reproducible.
>>>> > > I've uploaded an annotated GC graph to
>>>> > >
>>>> > > http://picasaweb.google.com/lh/photo/wB-RRtb0wIVfpxJkTJPNuw?authkey=Gv1sRgCOve7LThpfvXsQE&feat=directlink
>>>> > >
>>>> > > I think the "LOGON" phase where I log on all the 300 users looks
>>>> > > ok (given that Textile formatting is probably involved), but the
>>>> > > phase where just one user sends one message is certainly not
>>>> > > looking good.
>>>> > >
>>>> > > I took the profiler and the result is a bit shocking.
>>>> > > For that one message, 881,000,000 objects weighing 23.2 GByte
>>>> > > were allocated (and reclaimed afterwards). My former record was
>>>> > > 2 GByte ;-)
>>>> > >
>>>> > > Fortunately I have a theory about what happens, without having
>>>> > > looked into the code yet, so take it with a grain of salt. It
>>>> > > seems that the public timeline for all users is re-rendered,
>>>> > > because 99% of the allocations come from
>>>> > > org.apache.esme.comet.PublicTimeline.render(). I guess all the
>>>> > > actors for all the users are sitting there, not knowing that the
>>>> > > user has closed the browser, because the user session has not yet
>>>> > > expired.
>>>> > >
>>>> > > I wonder how we get around this with a real "push" model. If the
>>>> > > browser would ask for updates, this rendering could be done
>>>> > > lazily. Or can we "ping" the browser and check whether it
>>>> > > responds?
>>>> > > On the other hand, it should also not be necessary to re-render
>>>> > > the message again and again, because the result will be the same.
>>>> > >
>>>> > > I will send Richard some attachments. Not sure whether you will
>>>> > > need them, they look very similar to the ones we already have.
>>>> > >
>>>> > > BTW, we should definitely check the use of
>>>> > > scala.xml.XML$.loadString(java.lang.String).
>>>> > > It's creating a new Parser each time, which is a bit costly
>>>> > > because it allocates a new Buffer each time and also hits the
>>>> > > disk when searching for the name of the Java class.
>>>> > >
>>>> > > Greetings,
>>>> > > Markus
>>>> > >
>>>> > >
>>>> > >
>>>> > > "The best way to predict the future is to invent it" -- Alan Kay
>>>> > >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Lift, the simply functional web framework http://liftweb.net
>>>> > Beginning Scala http://www.apress.com/book/view/1430219890
>>>> > Follow me: http://twitter.com/dpp
>>>> > Surf the harmonics
>>>> >
>>>>
>>>
>>
>

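A note on the re-rendering point raised in the original message of this thread (the result of rendering a message is the same every time, so it should not need to be rendered once per user): the idea can be sketched as a small per-message cache. This is only an illustration under stated assumptions, not ESME's or Lift's actual code. The names CountingRenderer, RenderCache and RenderCacheDemo are hypothetical, the "rendering" is a trivial stand-in for the Textile step, and the cache is not thread-safe, so a real actor-based server would need a concurrent map or a dedicated cache actor instead.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an expensive markup-to-HTML step
// (for example Textile rendering). Not ESME's actual renderer.
final class CountingRenderer {
  var calls: Int = 0 // how many times the expensive path actually ran

  def render(text: String): String = {
    calls += 1
    "<p>" + text + "</p>" // no escaping; illustration only
  }
}

// Caches rendered output per message id, so a message is rendered once
// no matter how many timelines display it.
// Assumption: single-threaded use. Not safe for concurrent actors as is.
final class RenderCache(renderer: CountingRenderer) {
  private val cache = mutable.Map.empty[Long, String]

  // getOrElseUpdate evaluates its second argument only on a cache miss.
  def rendered(id: Long, text: String): String =
    cache.getOrElseUpdate(id, renderer.render(text))
}

object RenderCacheDemo {
  def main(args: Array[String]): Unit = {
    val renderer = new CountingRenderer
    val cache = new RenderCache(renderer)

    // 300 user timelines all display the same message.
    val outputs = (1 to 300).map(_ => cache.rendered(42L, "Hello ESME"))

    println(outputs.head)   // prints "<p>Hello ESME</p>"
    println(renderer.calls) // prints 1
  }
}
```

With a cache like this, the cost of one new message is one render plus 300 map lookups, rather than 300 renders. It does not address the other question in the thread, namely whether timelines for users who have already closed their browser need to be updated at all.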