Re: [Fedora-commons-developers] Fedora OutOfMemoryErrors

Chris Wilper Tue, 26 Aug 2008 12:14:44 -0700

Hi Kai,

I've been making changes on dev-150445, viewable at
http://fedora-commons.svn.sourceforge.net/viewvc/fedora-commons/fedora/branches/dev-150445/

If you'd like to check it out and see how things are improving, you
can get it via:

svn co https://fedora-commons.svn.sourceforge.net/svnroot/fedora-commons/fedora/branches/dev-150445

So far, the change with the most impact was modifying the
serializer to avoid StringBuffers altogether and stream the output
directly through a PrintWriter.
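Here is a minimal sketch of that difference. The class and method names are hypothetical, the elements are toy FOXML-like ones, and this is not the code on the branch. The buffered variant holds the whole serialized object on the heap, plus a second full copy from toString(), before it writes anything. The streaming variant only ever holds the fragment currently being written.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration; not Fedora's actual FOXML serializer.
// Datastream IDs are assumed to need no XML escaping.
final class StreamingSerializerSketch {

    // Buffered: the whole document is assembled on the heap first, and
    // toString() then makes a second full copy before anything is written.
    static void serializeBuffered(List<String> ids, OutputStream out) {
        StringBuffer buf = new StringBuffer();
        buf.append("<digitalObject>\n");
        for (String id : ids) {
            buf.append("<datastream ID=\"").append(id).append("\"/>\n");
        }
        buf.append("</digitalObject>\n");
        PrintWriter pw = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        pw.print(buf.toString());
        finish(pw);
    }

    // Streaming: each fragment goes straight to the writer, so the full
    // serialized object never exists in memory as a single string.
    static void serializeStreaming(List<String> ids, OutputStream out) {
        PrintWriter pw = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        pw.print("<digitalObject>\n");
        for (String id : ids) {
            pw.print("<datastream ID=\"");
            pw.print(id);
            pw.print("\"/>\n");
        }
        pw.print("</digitalObject>\n");
        finish(pw);
    }

    // PrintWriter swallows IOExceptions, so check its error flag explicitly.
    // The caller still owns (and closes) the underlying stream.
    private static void finish(PrintWriter pw) {
        pw.flush();
        if (pw.checkError()) {
            throw new IllegalStateException("write to output stream failed");
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        serializeStreaming(List.of("DC", "RELS-EXT", "AUDIT"), out);
        System.out.print(out.toString(StandardCharsets.UTF_8));
    }
}
```

Both variants produce identical bytes. Only the peak heap usage differs.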

In my test, removing whitespace around foxml and audittrail elements
had some effect as well.  It's not really the repository's job to be
pretty-printing, so this is fine to do.  But I did keep the linefeeds
in place for basic readability purposes.
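The following sketch shows roughly why indentation matters. It is illustrative only, and `nested` is a hypothetical helper. It writes the same nested elements twice: once with two-space indentation per level, and once with linefeeds only. Every line pays for its nesting depth, so the overhead grows with how deeply the FOXML is nested.

```java
// Illustrative only: compares indented vs. linefeed-only output sizes.
final class IndentationCostSketch {

    // Writes `depth` nested <e> elements, one tag per line, indenting each
    // line by `indentPerLevel` spaces per nesting level.
    static String nested(int depth, int indentPerLevel) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append(" ".repeat(i * indentPerLevel)).append("<e>\n");
        }
        for (int i = depth - 1; i >= 0; i--) {
            sb.append(" ".repeat(i * indentPerLevel)).append("</e>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pretty = nested(6, 2);
        String compact = nested(6, 0);
        System.out.println(pretty.length() + " vs " + compact.length()); // prints 114 vs 54
    }
}
```

With six levels, indentation more than doubles the size of this toy document.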

I also noticed that in the RELS-EXT versions of the demo object you
sent, your XML namespace declarations currently occur in each
element under rdf:Description. In your case, a significant chunk of
memory could be saved by declaring them once on the root rdf:RDF
element instead.
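A sketch like the following can estimate the saving. It builds two RELS-EXT-style documents with the same N relationships. The first repeats the `rel:` namespace declaration on every relationship element, and the second declares it once on rdf:RDF. The object PIDs (`demo:collection`, `demo:member-N`) are hypothetical. The namespace URI is Fedora's relations-external namespace. Because every retained version of RELS-EXT carries the same overhead, the saving multiplies with the number of versions.

```java
// Illustrative RELS-EXT-like RDF/XML; the object PIDs are hypothetical.
final class NamespaceHoistingSketch {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String REL_NS = "info:fedora/fedora-system:def/relations-external#";

    // The rel: namespace is declared again on every relationship element.
    static String repeated(int n) {
        StringBuilder sb = new StringBuilder();
        sb.append("<rdf:RDF xmlns:rdf=\"").append(RDF_NS).append("\">\n");
        sb.append("<rdf:Description rdf:about=\"info:fedora/demo:collection\">\n");
        for (int i = 0; i < n; i++) {
            sb.append("<rel:hasMember xmlns:rel=\"").append(REL_NS).append("\"")
              .append(" rdf:resource=\"info:fedora/demo:member-").append(i).append("\"/>\n");
        }
        sb.append("</rdf:Description>\n</rdf:RDF>\n");
        return sb.toString();
    }

    // The same statements, with rel: declared once on the root rdf:RDF element.
    static String hoisted(int n) {
        StringBuilder sb = new StringBuilder();
        sb.append("<rdf:RDF xmlns:rdf=\"").append(RDF_NS)
          .append("\" xmlns:rel=\"").append(REL_NS).append("\">\n");
        sb.append("<rdf:Description rdf:about=\"info:fedora/demo:collection\">\n");
        for (int i = 0; i < n; i++) {
            sb.append("<rel:hasMember")
              .append(" rdf:resource=\"info:fedora/demo:member-").append(i).append("\"/>\n");
        }
        sb.append("</rdf:Description>\n</rdf:RDF>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 50;
        int saved = repeated(n).length() - hoisted(n).length();
        System.out.println("chars saved per RELS-EXT version: " + saved); // 3038
    }
}
```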

So far I've been taking measurements on a running server, which has
given me a basic idea of how my changes are affecting heap use.  I
plan to look at the serializers/deserializers in isolation to
understand what they're using on their own.

- Chris


On Thu, Aug 21, 2008 at 3:11 AM, Strnad, Kai
<[EMAIL PROTECTED]> wrote:
> Chris,
>
> Thanks for the suggestion. We rely on versioning, so disabling it is not
> an option for us.
>
> Please let me know if I can help you implement refactorings or bounce some
> ideas around for the new branch...
>
> - Kai
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Wilper
> Sent: Tuesday, August 19, 2008 5:28 PM
> To: Strnad, Kai
> Cc: Razum, Matthias; Daniel Davis; [email protected]
> Subject: Re: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>
> Kai,
>
> Thanks for the sample object and detailed analysis.  I agree that not
> keeping the entire DigitalObject in memory would help a whole lot
> here, but would involve some pretty significant changes to code.
> Without doing that, I still think we can shave off significant amounts
> of required heap memory when doing modifications of objects.  I've
> started a branch to begin working on this.
>
> Just curious, is VERSIONABLE="false" an option for you with RELS-EXT?
> If so, that would be a big improvement in your case because otherwise,
> all the old versions of RELS-EXT are read into memory during any
> request involving the object.
>
> Thanks,
> Chris
>
> On Tue, Aug 19, 2008 at 10:37 AM, Strnad, Kai
> <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> thanks a lot for your ideas and suggestions. Separating the audit trail from 
>> the digital object is certainly helpful in reducing the overall size of the 
>> DO. There are cases though where the audit trail is only a minor part of the 
>> object and therefore removing it may not have the desired impact.
>>
>> I've attached an example object from our test suite with which we can
>> consistently reproduce the OutOfMemoryError. The object is not unusual
>> in terms of size or number of datastreams, so it should be a realistic
>> sample (there are of course much bigger objects...). Attached you will also
>> find a screenshot of the heap dump at the time the error occurred. The dump
>> was analyzed with the Eclipse Memory Analyzer. To illustrate the
>> problem and quickly provoke an error, I set the heap size to 64m.
>>
>> Also, I've attached the stack trace from the fedora.log. The trace shows
>> that the OutOfMemoryError occurs at DOTranslationUtility.writeToStream:808.
>> The thread doesn't stop there, however, because the error is caught by a
>> catch(Throwable t) clause, which catches errors as well as exceptions, and
>> then proceeds with normal execution.
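The sketch below reproduces the pattern described above. The method names are hypothetical, and this is not the code at DOTranslationUtility.writeToStream. A `catch (Throwable t)` block also swallows OutOfMemoryError, so the thread carries on after a failed allocation. Catching only Exception lets Errors propagate to the caller.

```java
// Hypothetical illustration of the error-handling pattern; not Fedora code.
final class CatchThrowableSketch {

    @FunctionalInterface
    interface Work {
        void run() throws Exception;
    }

    // Anti-pattern: Throwable also matches Errors such as OutOfMemoryError,
    // so the failure is logged and execution simply continues.
    static boolean swallowEverything(Work work) {
        try {
            work.run();
            return true;
        } catch (Throwable t) {
            System.err.println("ignored: " + t);
            return false;
        }
    }

    // Safer: handle recoverable exceptions only; Errors propagate to the caller.
    static boolean handleExceptionsOnly(Work work) {
        try {
            work.run();
            return true;
        } catch (Exception e) {
            System.err.println("handled: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        Work simulatedOom = () -> {
            throw new OutOfMemoryError("simulated");
        };
        System.out.println(swallowEverything(simulatedOom)); // false
        try {
            handleExceptionsOnly(simulatedOom);
        } catch (OutOfMemoryError e) {
            System.out.println("propagated: " + e.getMessage());
        }
    }
}
```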
>>
>> As already stated in my previous mail, looking at the heap dump screenshot I
>> see the following problems:
>> * The StringBuffer holding the digital object is, in the worst case, 4 times
>> the size of the digital object (2 bytes per UTF-16 char, times up to 2 for
>> unused capacity after the buffer has grown).
>> * Indentation takes up lots of space; it would be helpful to make the
>> serializer (and consequently the deserializer) customizable.
>> * Keeping several copies of the entire digital object in memory when they
>> are not needed puts additional strain on the heap.
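The "4 times" figure combines two factors. First, on the Java 6 JVMs discussed here, each char takes 2 bytes. Second, a StringBuffer that runs out of room roughly doubles its capacity, so up to about half of its backing array may sit unused. The sketch below is illustrative only. It fills a default-sized StringBuffer and prints how much of the capacity is actually used. The exact ratio depends on where the last growth step landed and on the JDK's growth policy, so the sketch prints it rather than predicting it.

```java
// Illustrates length vs. capacity of a growing StringBuffer. The byte
// estimate assumes 2 bytes per char, as on the Java 6 JVMs discussed here;
// JDK 9+ "compact strings" may store Latin-1 content in 1 byte per char.
final class StringBufferGrowthSketch {

    static StringBuffer fill(int chunks) {
        StringBuffer buf = new StringBuffer(); // default capacity: 16 chars
        for (int i = 0; i < chunks; i++) {
            buf.append("<foxml:datastreamVersion ID=\"v").append(i).append("\"/>\n");
        }
        return buf;
    }

    // Approximate size of the backing array, ignoring object headers.
    static long approxBackingBytes(StringBuffer buf) {
        return 2L * buf.capacity();
    }

    public static void main(String[] args) {
        StringBuffer buf = fill(500_000);
        long usedBytes = 2L * buf.length();
        System.out.printf("content: ~%d KB, reserved: ~%d KB (%.2fx)%n",
                usedBytes / 1024, approxBackingBytes(buf) / 1024,
                (double) buf.capacity() / buf.length());
    }
}
```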
>>
>> I think there are two possible quick fixes:
>> * Increase the heap space so that the peak never becomes critical. This is,
>> however, problematic given the huge objects we sometimes deal with.
>> Unless we use a very large heap, this only delays the error.
>> * Trim the StringBuffer before writing the digital object to the stream
>> and/or preallocate its capacity on initialization. This would significantly
>> reduce the size of the digital object in memory, but would not solve the
>> underlying issue, only delay it again.
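Both StringBuffer-level quick fixes use standard JDK methods. `trimToSize()` asks the buffer to shrink its backing array to the current length. The `StringBuffer(int capacity)` constructor preallocates. The sketch assumes that an estimate of the serialized size is available up front, for example from the size of the stored FOXML. That `estimatedChars` input is hypothetical.

```java
// Sketch of the two StringBuffer quick fixes; names are hypothetical.
final class BufferSizingSketch {

    // Quick fix 1: shrink the backing array once the content is complete.
    // trimToSize() copies into a right-sized array, so it briefly needs
    // both the old and the new array; the old one then becomes garbage.
    static StringBuffer buildThenTrim(int chunks) {
        StringBuffer buf = new StringBuffer();
        appendContent(buf, chunks);
        buf.trimToSize();
        return buf;
    }

    // Quick fix 2: preallocate from an estimate so the buffer never has to
    // grow (and copy) while appending. `estimatedChars` is a hypothetical
    // input, e.g. derived from the size of the stored FOXML.
    static StringBuffer buildPresized(int chunks, int estimatedChars) {
        StringBuffer buf = new StringBuffer(estimatedChars);
        appendContent(buf, chunks);
        return buf;
    }

    private static void appendContent(StringBuffer buf, int chunks) {
        for (int i = 0; i < chunks; i++) {
            buf.append("<datastreamVersion/>\n"); // 21 chars per chunk
        }
    }

    public static void main(String[] args) {
        StringBuffer trimmed = buildThenTrim(10_000);
        StringBuffer presized = buildPresized(10_000, 10_000 * 21);
        System.out.println("trimmed:  " + trimmed.length() + " / " + trimmed.capacity());
        System.out.println("presized: " + presized.length() + " / " + presized.capacity());
    }
}
```

Presizing avoids the intermediate growth copies entirely. Trimming only helps after the peak caused by growth has already happened.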
>>
>> To solve the issue permanently, we should avoid keeping the whole digital
>> object in memory when it is not needed. Being able to control the
>> XML indentation would also help.
>>
>> - Kai
>>
>>
>>
>> -----Original Message-----
>> From: Razum, Matthias
>> Sent: Wednesday, August 13, 2008 5:50 PM
>> To: 'Daniel Davis'
>> Cc: Chris Wilper; [email protected]; Strnad, Kai
>> Subject: RE: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>
>> Dan,
>>
>> Don't get me wrong. I'm happy to see so many people looking into the issue, 
>> and any idea is worthwhile discussing :-)
>>
>> We'll provide you with an example FOXML asap. Meanwhile, we will do some
>> further profiling on our side as well. Kai has an idea for a very simple
>> fix, but he needs to prove that the fix works in general and not just for
>> his test case.
>>
>> Matthias.
>>
>>> -----Original Message-----
>>> From: Daniel Davis [mailto:[EMAIL PROTECTED]
>>> Sent: Wednesday, August 13, 2008 5:34 PM
>>> To: Razum, Matthias
>>> Cc: Chris Wilper; [email protected]; Strnad, Kai
>>> Subject: Re: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>>
>>> We want a record of events close to the digital object too!
>>> There may be reasons to ALSO write events to the log but for
>>> right now Chris is just trying to find out what is
>>> happening---before suggesting how to fix it.  It would be
>>> helpful for you to send us an example FOXML object that
>>> provokes the problem.
>>>
>>> -- Dan
>>>
>>> Razum, Matthias wrote:
>>>
>>>       Dan and Chris,
>>>
>>>       My two cents: My first reaction to Chris' proposal to separate the
>>>       audit trail from the DO was disbelief. I always thought that one of
>>>       the striking features of Fedora and FOXML is keeping stuff that
>>>       belongs together in one XML structure that can be validated any
>>>       time. When asked why someone should use Fedora, this is one of my
>>>       top arguments. NARA-RLG has far more expertise and experience than
>>>       I have, so I should probably dump my arguments and think of some
>>>       new ones.
>>>
>>>       Still, I would be concerned about long-term preservation of my DOs.
>>>       If I start splitting up my DO (well, Fedora does that already with
>>>       managed content, so it's not introducing anything new),
>>>       preservation becomes even more challenging. With my very limited
>>>       knowledge of PREMIS and the idea of tracking all changes to an
>>>       object as events, isn't that exactly what the audit trail is good
>>>       for? So wouldn't I want to keep it as an integral part of my object?
>>>
>>>       Actually, for eSciDoc we can live perfectly well without the audit
>>>       trail, as we write our own PREMIS-based event datastream for graphs
>>>       of objects, so both changes combined would probably boost the
>>>       number of versions before we run into out-of-memory errors.
>>>
>>>       Matthias.
>>>
>>>
>>>
>>>
>>>               -----Original Message-----
>>>               From: Daniel Davis [mailto:[EMAIL PROTECTED]
>>>               Sent: Tuesday, August 12, 2008 5:21 PM
>>>               To: Chris Wilper
>>>               Cc: Razum, Matthias; [email protected]; Strnad, Kai
>>>               Subject: Re: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>>
>>>               The NARA-RLG report thinks that the "audit" should be kept
>>>               separate from the "object" anyway because of the potential
>>>               for tampering. With correlation information kept in the log,
>>>               this information could be kept in server/logs/audit.log,
>>>               which would be periodically snipped off and stored as a
>>>               non-inlined Datastream in a sequence of repository-generated
>>>               objects that record change history.
>>>
>>>               This would make it harder in the future to make digital
>>>               object change operations idempotent, because it is
>>>               convenient to have that information localized to the digital
>>>               object in question. Moving large audit trails to non-inlined
>>>               Datastreams which are still encapsulated by the digital
>>>               object would permit separate, though less convenient,
>>>               processing.
>>>
>>>               I am curious, because the XML for fifty items should not be
>>>               large enough to exhaust a reasonably sized heap unless the
>>>               traffic is very heavy. I have not looked at that code, and I
>>>               wonder if we can move to a delayed object creation scheme to
>>>               reduce the size of the business objects representing the
>>>               digital object in working memory. I know we are looking for
>>>               a quick fix, not a refactoring, but I am still curious.
>>>
>>>               -- Dan
>>>
>>>               Chris Wilper wrote:
>>>
>>>                       Kai and Matthias,
>>>
>>>                       Just wanted to let you know I've been doing some
>>>                       profiling on this over here. I suspect saving the
>>>                       audit records external to the FOXML would help a
>>>                       LOT with this. One idea is to avoid the special
>>>                       "AUDIT" datastream altogether and save them in
>>>                       server/logs/audit.log instead. Later refactorings
>>>                       could address the issue of having to read the
>>>                       entire DigitalObject to make a change to one piece,
>>>                       but I think dealing with the ever-growing "AUDIT"
>>>                       datastream would be a simple way to stop the
>>>                       bleeding. Thoughts on this approach?
>>>
>>>                       - Chris
>>>
>>>                       2008/7/29 Razum, Matthias <[EMAIL PROTECTED]>:
>>>
>>>
>>>                               Hi all,
>>>
>>>                               This is a pretty severe bug for us. We run
>>>                               into the issue when we try to create a new
>>>                               version of an object with ~50 previous
>>>                               versions. This is a not-so-rare condition
>>>                               if we want to add members to a collection,
>>>                               thus creating versions of the collection
>>>                               object.
>>>
>>>                               I haven't seen any feedback on this bug
>>>                               report on the list from the Fedora dev team,
>>>                               and I can't find it in Fedora's bugtracker
>>>                               on sourceforge.net. Any reaction from the
>>>                               Fedora team would be highly appreciated,
>>>                               even though I am aware of the pressure from
>>>                               the upcoming Fedora 3.0 release.
>>>
>>>                               Cheers,
>>>                               Matthias.
>>>
>>>
>>>
>>>
>>>
>>>                                       -----Original Message-----
>>>                                       From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Strnad, Kai
>>>                                       Sent: Monday, July 14, 2008 11:47 AM
>>>                                       To: [email protected]
>>>                                       Subject: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>>
>>>                                       Hi all,
>>>
>>>                                       We frequently encounter OutOfMemoryErrors when
>>>                                       calling modifyDatastreamByValue and other API-M
>>>                                       methods on relatively large digital objects
>>>                                       using Fedora Commons 3.0b1 and 3.0b2. To better
>>>                                       understand the issue, we triggered heap dumps
>>>                                       and analyzed them. The dumps revealed that up to
>>>                                       140M of heap space is used by Fedora when
>>>                                       calling modifyDatastreamByValue on a digital
>>>                                       object of 15M.
>>>
>>>                                       To provoke heap dumps at each API call, we
>>>                                       reduced the heap size. Additionally, we
>>>                                       triggered heap dumps programmatically at
>>>                                       specific locations using the Java 6
>>>                                       HotSpotDiagnosticMXBean.
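The programmatic heap dumps mentioned above can be triggered as in this sketch. The class name and file naming are hypothetical. It uses `newPlatformMXBeanProxy`, which already existed on Java 6. The bean lives in `com.sun.management`, so this assumes a HotSpot-based JVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Requires a HotSpot-based JVM (Oracle/OpenJDK); the bean lives in the
// com.sun.management package, not in the java.* standard API.
final class HeapDumpSketch {

    private static final String BEAN_NAME = "com.sun.management:type=HotSpotDiagnostic";

    // Writes an .hprof file that the Eclipse Memory Analyzer can open.
    // live = true dumps only reachable objects (triggering a GC first).
    // The target file must not already exist; some JDKs also insist on
    // the ".hprof" suffix.
    static void dumpHeap(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(), BEAN_NAME,
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live);
    }

    public static void main(String[] args) throws IOException {
        // e.g. call this right before a suspected peak such as writeToStream()
        String path = "heap-" + System.currentTimeMillis() + ".hprof";
        dumpHeap(path, true);
        System.out.println("wrote " + path);
    }
}
```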
>>>
>>>                                       The OutOfMemoryError always occurs at
>>>                                       DOTranslationUtility.writeToStream() after the
>>>                                       serialization. This appears to be the peak of
>>>                                       heap usage for modifyDatastreamByValue. The heap
>>>                                       dump shows the following composition of objects
>>>                                       at the time of writeToStream() (see attached
>>>                                       screenshot):
>>>                                        * StringBuffer (60M): 30M of content (15M * 2,
>>>                                          because of the internal UTF-16 representation)
>>>                                          plus 30M of unused capacity. StringBuffer
>>>                                          automatically doubles its capacity when
>>>                                          insufficient capacity is left for appending a
>>>                                          new String, so unless the capacity is allocated
>>>                                          explicitly, it is likely to exceed the memory
>>>                                          actually needed.
>>>                                        * char[] array in writeToStream()
>>>                                          (StringBuffer.toString()): 31M (15M * 2 +
>>>                                          overhead)
>>>                                        * BasicDigitalObject: 24M (15M
>>>                                          DatastreamXMLMetadata, 9M AuditRecord)
>>>                                        * DOReaderCache: 25M (1 BasicDigitalObject in
>>>                                          cache at the time)
>>>                                        * Some other small objects
>>>
>>>                                       If the heap space is already largely consumed,
>>>                                       allocating another chunk of memory may fail and
>>>                                       subsequently trigger an OutOfMemoryError.
>>>                                       Explicitly calling the garbage collector is not
>>>                                       a viable option, because most of the objects
>>>                                       involved are still bound locally to the thread,
>>>                                       so they are still reachable.
>>>
>>>                                       Increasing the heap will solve the issue
>>>                                       temporarily. Depending on the size of the
>>>                                       digital object, however, the problem may
>>>                                       resurface. Suppose the digital object is 30M:
>>>                                       according to our findings, a heap of
>>>                                       60M*2 (StringBuffer) + 60M (char array) + ~50M
>>>                                       (DO) + ~50M (cache) = 280M would be needed for
>>>                                       a single digital object (we haven't tried this,
>>>                                       however).
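This arithmetic fits a simple linear model, encoded in the sketch below. The per-component factors are read off the 15M measurements above. The 5/3 factor for the digital object and the cache entry is an assumption fitted to the ~50M figures in the 30M estimate. The model reproduces both the ~140M observed for a 15M object and the 280M estimate for 30M.

```java
// Rough peak-heap model for modifyDatastreamByValue, derived from the
// figures in this report (all values in MB, for an object of size s):
//   StringBuffer: s * 2 (UTF-16) * 2 (doubled capacity) = 4s
//   char[] from toString():                               2s
//   BasicDigitalObject:                                  ~5s/3  (assumed factor)
//   DOReaderCache entry:                                 ~5s/3  (assumed factor)
final class PeakHeapEstimateSketch {

    static long estimatePeakMb(long objectMb) {
        long stringBuffer = objectMb * 2 * 2;
        long charArray = objectMb * 2;
        long digitalObject = objectMb * 5 / 3;
        long cache = objectMb * 5 / 3;
        return stringBuffer + charArray + digitalObject + cache;
    }

    public static void main(String[] args) {
        System.out.println(estimatePeakMb(15)); // 140
        System.out.println(estimatePeakMb(30)); // 280
    }
}
```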
>>>
>>>                                       We modified the Fedora code and tried the
>>>                                       following options:
>>>                                       * We removed the indentation in the
>>>                                         FOXMLDOSerializer and DOTranslationUtility.
>>>                                         Removing most of the nonessential whitespace
>>>                                         (or replacing indentation whitespace with
>>>                                         tabs) makes the DO much smaller (about 20%
>>>                                         smaller in our test case) and therefore
>>>                                         reduces the memory footprint.
>>>
>>>                                       * For the StringBuffer problem, we basically
>>>                                         tried two approaches. We trimmed the
>>>                                         StringBuffer in FOXMLDOSerializer before the
>>>                                         call to writeToStream() using the trimToSize()
>>>                                         method, which adjusts the capacity of the
>>>                                         StringBuffer to the number of characters it
>>>                                         actually contains. Another option is to
>>>                                         explicitly size the buffer.
>>>
>>>                                       * The 64-bit version of Java consumes
>>>                                         considerably more heap space than the 32-bit
>>>                                         version. Using a 32-bit version reduces memory
>>>                                         usage.
>>>
>>>                                       All options mentioned above work well and
>>>                                       reduce memory consumption significantly, but
>>>                                       solve the underlying problem only partially.
>>>
>>>                                       Perhaps a better solution would be to load and
>>>                                       process only those parts of the digital object
>>>                                       needed for the current operation (not viable
>>>                                       for ingest, but e.g. for modifyDatastreamByX),
>>>                                       but that would probably involve lots of
>>>                                       refactoring...
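One way to read "load only the parts needed" is lazy loading of datastream content. Dan's delayed object creation idea can be read the same way. The sketch below is purely hypothetical: `LazyObjectSketch`, `LazyDatastream` and the Supplier-based loaders are illustration-only names, not Fedora classes. Content is read only on first access, so an operation that touches one datastream does not materialize the others.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical lazy-loading digital object; not Fedora's BasicDigitalObject.
// Not thread-safe: a shared instance would need synchronization.
final class LazyObjectSketch {

    // Holds a loader instead of the content; the content is fetched on
    // first access and cached afterwards.
    static final class LazyDatastream {
        private final Supplier<String> loader;
        private String content; // null until first access

        LazyDatastream(Supplier<String> loader) {
            this.loader = loader;
        }

        boolean isLoaded() {
            return content != null;
        }

        String content() {
            if (content == null) {
                content = loader.get();
            }
            return content;
        }
    }

    private final Map<String, LazyDatastream> datastreams = new LinkedHashMap<>();

    void register(String id, Supplier<String> loader) {
        datastreams.put(id, new LazyDatastream(loader));
    }

    LazyDatastream get(String id) {
        return datastreams.get(id);
    }

    public static void main(String[] args) {
        LazyObjectSketch obj = new LazyObjectSketch();
        obj.register("DC", () -> "<oai_dc:dc/>");
        obj.register("AUDIT", () -> "<audit:auditTrail/>"); // imagine this is huge
        System.out.println(obj.get("DC").content());
        System.out.println("AUDIT loaded? " + obj.get("AUDIT").isLoaded()); // false
    }
}
```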
>>>
>>>                                       Has anyone had to deal with this issue before?
>>>                                       Any insights or suggestions would be great.
>>>
>>>
>>>                                       Thank you very much,
>>>                                       Kai
>>>
>>>
>>> ________________________________
>>>
>>>
>>>
>>>
>>>               -------------------------------------------------------
>>>
>>>               Fachinformationszentrum Karlsruhe, Gesellschaft für
>>>               wissenschaftlich-technische Information mbH.
>>>               Registered office: Eggenstein-Leopoldshafen; commercial register:
>>>               Amtsgericht Mannheim, HRB 101892.
>>>               Managing Director: Sabine Brünger-Weilandt.
>>>               Chairman of the Supervisory Board: MinR Hermann Riehl.
>>>
>>>
>>>
>>>
>>> --
>>> Daniel W. Davis
>>> Chief Software Architect, Fedora Commons
>>> Researcher, Cornell Information Science
>>> http://www.fedora-commons.org
>>> [EMAIL PROTECTED]
>>> [EMAIL PROTECTED]
>>> (607) 255-6090 (Office)
>>>
>>>
>>
>>
>>
>>
>
>
>
>
>
_______________________________________________
Fedora-commons-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
