Hi Kai,

I've been making changes on dev-150445, viewable at
http://fedora-commons.svn.sourceforge.net/viewvc/fedora-commons/fedora/branches/dev-150445/

If you'd like to check it out and see how things are improving, you can get it via:

svn co https://fedora-commons.svn.sourceforge.net/svnroot/fedora-commons/fedora/branches/dev-150445

So far, the change with the most impact was modifying the serializer to avoid StringBuffers altogether and stream output directly using a PrintWriter.

In my test, removing whitespace around foxml and audittrail elements had some effect as well. It's not really the repository's job to pretty-print, so this is fine to do. I did keep the linefeeds in place for basic readability, though.

I also noticed that in the RELS-EXT versions of the demo object you sent, your XML namespace declarations currently occur on each element under rdf:Description. In your case, a significant chunk of memory could be saved by declaring them once on the root RDF element instead.

So far I've been taking measurements on a running server, which has given me a basic idea of how my changes affect heap use. I plan to look at the serializers/deserializers in isolation to understand what they use on their own.

- Chris

On Thu, Aug 21, 2008 at 3:11 AM, Strnad, Kai <[EMAIL PROTECTED]> wrote:
> Chris,
>
> Thanks for the suggestion. We rely on versioning, so disabling it is not an option for us.
>
> Please let me know if I can help you implement refactorings or bounce some ideas off for the new branch...
>
> - Kai
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Chris Wilper
> Sent: Tuesday, 19 August 2008 17:28
> To: Strnad, Kai
> Cc: Razum, Matthias; Daniel Davis; [email protected]
> Subject: Re: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>
> Kai,
>
> Thanks for the sample object and detailed analysis. I agree that not keeping the entire DigitalObject in memory would help a whole lot here, but would involve some pretty significant changes to code.
> Without doing that, I still think we can shave off significant amounts of required heap memory when doing modifications of objects. I've started a branch to begin working on this.
>
> Just curious, is VERSIONABLE="false" an option for you with RELS-EXT? If so, that would be a big improvement in your case because otherwise, all the old versions of RELS-EXT are read into memory during any request involving the object.
>
> Thanks,
> Chris
>
> On Tue, Aug 19, 2008 at 10:37 AM, Strnad, Kai <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> Thanks a lot for your ideas and suggestions. Separating the audit trail from the digital object is certainly helpful in reducing the overall size of the DO. There are cases, though, where the audit trail is only a minor part of the object, and therefore removing it may not have the desired impact.
>>
>> I've attached an example object from our test suite with which we are able to consistently reproduce the OutOfMemoryError. The object is not unusual in terms of size or number of datastreams, so it should be a realistic sample (there are of course much bigger objects...). Attached you will also find a screenshot of the heap dump at the time the error occurred. The dump was analyzed with the Eclipse Memory Analyzer. In order to illustrate the problem and quickly provoke an error, I set the heap size to 64m.
>>
>> Also, I've attached the stack trace from the fedora.log. The trace shows that the OutOfMemoryError occurs at DOTranslationUtility.writeToStream:808. The thread doesn't stop there, however, because the error gets caught by a catch(Throwable t) clause, which catches exceptions as well as errors, and execution then proceeds normally.
>>
>> As already stated in my previous mail, looking at the heap dump screenshot I see the following problems:
>> * The StringBuffer the digital object is kept in is 4 times the size of the digital object in the worst case.
>> * Indentation takes up lots of space; it would be helpful to make the serializer (and consequently the deserializer) customizable.
>> * Keeping several copies of the entire digital object in memory when it is not needed puts additional strain on the heap.
>>
>> I think there are two possible quick fixes.
>> * Increase heap space accordingly so the peak never gets critical. This is, however, problematic due to the huge objects we are sometimes dealing with. Unless we use a very large heap, this only delays the error.
>> * Trim the StringBuffer before writing the digital object to the stream and/or preallocate capacities on initialization. This would significantly reduce the size of the digital object in memory, but would not solve the underlying issue; it would only delay it.
>>
>> To solve the issue permanently, we should avoid having the whole digital object around when it is not needed. Being able to control the XML indentation would also help.
>>
>> - Kai
>>
>> -----Original Message-----
>> From: Razum, Matthias
>> Sent: Wednesday, 13 August 2008 17:50
>> To: 'Daniel Davis'
>> Cc: Chris Wilper; [email protected]; Strnad, Kai
>> Subject: RE: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>
>> Dan,
>>
>> Don't get me wrong. I'm happy to see so many people looking into the issue, and any idea is worth discussing :-)
>>
>> We'll provide you with an example FOXML asap. Meanwhile, we will do some further profiling on our side as well. Kai has an idea for a very simple fix, but he needs to prove that the fix works in general and not just for his test case.
>>
>> Matthias.
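
For reference, the StringBuffer quick fix described in Kai's message quoted above (trimming the buffer, or preallocating its capacity) can be tried in isolation. The following is only a minimal sketch: the class name, method names, and placeholder content are made up for illustration and are not taken from the Fedora code base. It simply compares the capacity of a buffer that grew on demand with one that was trimmed or sized up front.

```java
// Illustrative sketch only: the class and method names are hypothetical
// and are not taken from the Fedora code base.
class BufferCapacityDemo {

    /** Fills a default-sized StringBuffer; its capacity grows on demand. */
    static StringBuffer fillUnsized(String chunk, int chunks) {
        StringBuffer buf = new StringBuffer(); // starts small, regrows as needed
        for (int i = 0; i < chunks; i++) {
            buf.append(chunk);
        }
        return buf;
    }

    /** Fills a StringBuffer whose capacity was allocated up front. */
    static StringBuffer fillPresized(String chunk, int chunks) {
        StringBuffer buf = new StringBuffer(chunk.length() * chunks);
        for (int i = 0; i < chunks; i++) {
            buf.append(chunk);
        }
        return buf;
    }

    public static void main(String[] args) {
        String chunk = "<datastream/>\n"; // placeholder content, ASCII only
        int chunks = 100_000;

        StringBuffer unsized = fillUnsized(chunk, chunks);
        System.out.println("length             = " + unsized.length());
        System.out.println("capacity, unsized  = " + unsized.capacity());

        unsized.trimToSize(); // asks the buffer to release unused capacity
        System.out.println("capacity, trimmed  = " + unsized.capacity());

        StringBuffer presized = fillPresized(chunk, chunks);
        System.out.println("capacity, presized = " + presized.capacity());
    }
}
```

Either variant only shrinks the slack in the buffer. The full serialized object is still held in memory, which matches Kai's point that this delays the problem rather than solving it.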
>>
>>> -----Original Message-----
>>> From: Daniel Davis [mailto:[EMAIL PROTECTED]]
>>> Sent: Wednesday, August 13, 2008 5:34 PM
>>> To: Razum, Matthias
>>> Cc: Chris Wilper; [email protected]; Strnad, Kai
>>> Subject: Re: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>>
>>> We want a record of events close to the digital object too! There may be reasons to ALSO write events to the log, but for right now Chris is just trying to find out what is happening before suggesting how to fix it. It would be helpful for you to send us an example FOXML object that provokes the problem.
>>>
>>> -- Dan
>>>
>>> Razum, Matthias wrote:
>>>
>>> Dan and Chris,
>>>
>>> My two cents: My first reaction to Chris' proposal to separate the audit trail from the DO was disbelief. I always thought that one of the striking features of Fedora and FOXML is keeping stuff that belongs together in one XML structure that can be validated any time. When asked why someone should use Fedora, this is one of my top arguments. NARA-RLG has far more expertise and experience than I have, so I should probably dump my arguments and think of some new ones.
>>>
>>> Still, I would be concerned about long-term preservation of my DOs. If I start splitting up my DO (well, Fedora does that already with managed content, so it's not introducing anything new), preservation becomes even more challenging. With my very little knowledge about PREMIS and the idea to track all changes to an object as events, isn't that exactly what the audit trail is good for? So would I want to keep it as an integral part of my object?
>>>
>>> Actually, for eSciDoc we can perfectly live without the audit trail, as we write our own PREMIS-based event datastream for graphs of objects, so both changes combined would probably boost the number of versions before we run into out-of-memory errors.
>>>
>>> Matthias.
>>>
>>> -----Original Message-----
>>> From: Daniel Davis [mailto:[EMAIL PROTECTED]]
>>> Sent: Tuesday, August 12, 2008 5:21 PM
>>> To: Chris Wilper
>>> Cc: Razum, Matthias; [email protected]; Strnad, Kai
>>> Subject: Re: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>>
>>> The NARA-RLG report thinks that the "audit" should be kept separate from the "object" anyway because of the potential of tampering. With correlation information kept in the log, this information could be kept in server/logs/audit.log, which would be periodically snipped off and stored as a non-inlined Datastream in a sequence of repository-generated objects that record change history.
>>>
>>> This would make it harder in the future to make digital object change operations idempotent, because it is convenient to have that information localized to the digital object in question. Moving large audit trails to non-inlined Datastreams which are still encapsulated by the digital object would permit separate, though less convenient, processing.
>>>
>>> I am curious because the XML for fifty items should not be too large for a reasonable memory model unless the traffic is very heavy. I have not looked at that code and I wonder if we can move to a delayed object creation scheme to reduce the size of the business objects representing the digital object in working memory. I know we are looking for a quick fix, not a refactoring, but I am still curious.
>>>
>>> -- Dan
>>>
>>> Chris Wilper wrote:
>>>
>>> Kai and Matthias,
>>>
>>> Just wanted to let you know I've been doing some profiling on this over here. I suspect saving the audit records external to the FOXML would help a LOT with this. One idea is to avoid the special "AUDIT" datastream altogether and save them in server/logs/audit.log instead. Later refactorings could address the issue of having to read the entire DigitalObject to make a change to one piece, but I think dealing with the ever-growing "AUDIT" datastream would be a simple way to stop the bleeding. Thoughts on this approach?
>>>
>>> - Chris
>>>
>>> 2008/7/29 Razum, Matthias <[EMAIL PROTECTED]>:
>>>
>>> Hi all,
>>>
>>> This is a pretty severe bug for us. We run into the issue when we try to create a new version of an object with ~50 previous versions. This is a not-so-rare condition if we want to add members to a collection, thus creating versions of the collection object.
>>>
>>> I haven't seen any feedback for this bug report on the list from the Fedora dev team, and I can't find it in Fedora's bugtracker on sourceforge.net. Any reaction from the Fedora team would be highly appreciated, even though I am aware of the pressure from the upcoming Fedora 3.0 release.
>>>
>>> Cheers,
>>> Matthias.
>>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Strnad, Kai
>>> Sent: Monday, July 14, 2008 11:47 AM
>>> To: [email protected]
>>> Subject: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>>
>>> Hi all,
>>>
>>> We frequently encounter OutOfMemoryErrors when calling modifyDatastreamByValue and other API-M methods on relatively large digital objects using Fedora Commons 3.0b1 and 3.0b2. In order to better understand the issue we triggered heap dumps and analyzed them. The dumps revealed that up to 140M of heap space gets used by Fedora when calling modifyDatastreamByValue on a digital object of 15M.
>>>
>>> In order to provoke heap dumps at each API call, the heap size was reduced. Additionally, we triggered heap dumps at specific locations programmatically using the Java 6 HotSpotDiagnosticMXBean.
>>>
>>> The OutOfMemoryError always occurs at DOTranslationUtility.writeToStream() after the serialization. This appears to be the peak of heap usage for modifyDatastreamByValue. The heap dump shows the following composition of objects at the time of writeToStream() (see attached screenshot):
>>> * StringBuffer (60M): 15M * 2 (internal UTF-16 representation) + 30M memory allocated by StringBuffer (StringBuffer doubles its capacity automatically when insufficient capacity is left for appending a new String. Hence the capacity is likely to exceed the actual memory needed unless explicitly allocated).
>>> * char[] array at writeToStream (StringBuffer.toString()) (31M) (15M * 2 + overhead)
>>> * BasicDigitalObject 24M (15M DatastreamXMLMetadata, 9M AuditRecord)
>>> * DOReaderCache 25M (1 BasicDigitalObject in cache at the time)
>>> * Some other small objects
>>>
>>> If the heap space is already consumed to a large extent, allocating another chunk of memory may fail and subsequently trigger an OutOfMemoryError. Explicitly calling the garbage collector is not a viable option, because most of the objects involved are still bound locally to the thread, so they are still reachable.
>>>
>>> Increasing the heap will solve the issue temporarily. Depending on the size of the digital object, however, the problem may resurface: suppose the digital object is 30M, then according to our findings a heap space of 60M*2 StringBuffer + 60M char array + ~50M DO + ~50M cache = 280M would be needed for a single digital object (we haven't tried this, however).
>>>
>>> We modified the Fedora code and tried the following options:
>>> * We removed the indentation in the FOXMLDOSerializer and DOTranslationUtility. Removing most of the nonessential whitespace (or replacing indentation whitespace with tabs) results in a much smaller DO size (about 20% in our test case) and therefore reduces memory footprint.
>>>
>>> * As for the StringBuffer problem, we basically tried two approaches. We trimmed the StringBuffer in FOXMLDOSerializer before the call to writeToStream() using the trimToSize() method. This adjusts the capacity of the StringBuffer to the actual number of characters contained within. Another option is to explicitly size the buffer.
>>>
>>> * The 64 bit version of Java consumes considerably more heap space compared to the 32 bit version. Using a 32 bit version reduces memory usage.
>>>
>>> All options mentioned above work well and reduce memory consumption significantly, but solve the underlying problem only partially.
>>>
>>> Perhaps a better solution would be to load and process only those parts of the digital object needed for the current operation (not viable for ingest, but e.g. modifyDatastreamByX), but that would probably involve lots of refactoring...
>>>
>>> Has anyone had to deal with this issue previously? Any insights or suggestions would be great.
>>>
>>> Thank you very much,
>>> Kai
>>>
>>> -------------------------------------------------------
>>> Fachinformationszentrum Karlsruhe, Gesellschaft für wissenschaftlich-technische Information mbH.
>>> Registered office: Eggenstein-Leopoldshafen, Amtsgericht Mannheim HRB 101892.
>>> Managing Director: Sabine Brünger-Weilandt.
>>> Chairman of the Supervisory Board: MinR Hermann Riehl.
>>>
>>> --
>>> Daniel W. Davis
>>> Chief Software Architect, Fedora Commons
>>> Researcher, Cornell Information Science
>>> http://www.fedora-commons.org
>>> [EMAIL PROTECTED]
>>> [EMAIL PROTECTED]
>>> (607) 255-6090 (Office)

_______________________________________________
Fedora-commons-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
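
As a footnote to the serializer change described at the top of this thread (dropping the StringBuffer and streaming through a PrintWriter), here is a rough, self-contained sketch of the difference between the two approaches. It is not the code on the dev-150445 branch: the class name, method names, and the toy `<object>`/`<datastream>` elements are assumptions made up for illustration, and XML escaping is left out for brevity.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: names and element layout are hypothetical,
// not the actual Fedora serializer. No XML escaping is performed.
class StreamingSerializerSketch {

    /** Buffered style: the whole document is assembled in memory first. */
    static void serializeBuffered(List<String> ids, OutputStream out) {
        StringBuffer buf = new StringBuffer();
        buf.append("<object>\n");
        for (String id : ids) {
            buf.append("<datastream ID=\"").append(id).append("\"/>\n");
        }
        buf.append("</object>\n");
        try {
            // toString() and getBytes() each create another full-size copy.
            out.write(buf.toString().getBytes(StandardCharsets.UTF_8));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Streaming style: each piece is written out as soon as it is produced. */
    static void serializeStreaming(List<String> ids, OutputStream out) {
        PrintWriter w = new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8));
        w.print("<object>\n");
        for (String id : ids) {
            w.print("<datastream ID=\"");
            w.print(id);
            w.print("\"/>\n");
        }
        w.print("</object>\n");
        // PrintWriter swallows IOExceptions; checkError() flushes and reports them.
        // The writer is not closed here because the caller owns the stream.
        if (w.checkError()) {
            throw new UncheckedIOException(new IOException("write failed"));
        }
    }

    public static void main(String[] args) {
        serializeStreaming(Arrays.asList("DC", "RELS-EXT"), System.out);
    }
}
```

Both methods produce the same bytes. The streaming variant holds only the writer's small internal buffers rather than a StringBuffer, a String, and a byte array that each scale with the size of the object.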
