Re: [Fedora-commons-developers] Fedora OutOfMemoryErrors

Strnad, Kai Thu, 28 Aug 2008 09:12:53 -0700

Chris,

I've done some testing here using your latest changes and the results look very
good. My tests confirm that the changes you've made reduce memory usage
significantly. With a properly dimensioned heap we shouldn't run into the OOM
so easily now.
We also consider the problem resolved for the time being.
Are you planning on merging this branch into the trunk?


Thanks a lot & great job.

- Kai  



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Wilper
Sent: Wednesday, August 27, 2008 8:24 PM
To: Strnad, Kai
Cc: Razum, Matthias; Daniel Davis; [email protected]
Subject: Re: [Fedora-commons-developers] Fedora OutOfMemoryErrors

Hi again,

Just wanted to report that I've done some testing of the FOXML
serializer/deserializer in isolation and found the following when
using the escidoc_4971 xml file (~4.5MB) as input, on a 32-bit
machine:

deserializer:
  during deserialization: uses 21MB heap (before GC, less than 7MB reachable)
  after deserialization: uses about 6MB (after GC, when only DigitalObject is reachable)

serializer-before:
  during serialization: uses 58MB (max 57MB reachable)
  after serialization: uses 0 (DigitalObject no longer reachable)

serializer-after:
  during serialization: uses 18MB (less than 7MB reachable)
  after serialization: uses 0 (DigitalObject no longer reachable)

In order to test "max reachable" between point A and point B
in the code, without examining all points in between, I ended
up running the test with less and less total heap available,
until I saw an OutOfMemory error.  In my final test, I was
able to run deserialization and serialization back to back
with a total heap size of 7MB without seeing OOM.
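
For readers who want to reproduce "before GC" and "after GC" style figures, here is a minimal sketch of sampling heap use from inside the JVM. It is not the harness used for the numbers above (those were bounded by shrinking the total heap until an OutOfMemoryError appeared); the class and method names are hypothetical, and System.gc() is only a hint to the JVM, so the readings are approximate.

```java
final class HeapProbe {
    /** Bytes currently occupied on the heap (live objects plus garbage not yet collected). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Requests a collection first, so the figure is closer to reachable data only. */
    static long usedHeapBytesAfterGc() {
        System.gc(); // a hint only; the JVM is free to ignore it
        return usedHeapBytes();
    }

    public static void main(String[] args) {
        long before = usedHeapBytesAfterGc();
        byte[] payload = new byte[4 * 1024 * 1024]; // stand-in for a deserialized object
        long during = usedHeapBytes();
        System.out.printf("before: %d KB, with payload: %d KB (payload %d bytes)%n",
                before / 1024, during / 1024, payload.length);
    }
}
```

To bound "max reachable" the way described above, the same program would be rerun with a smaller and smaller -Xmx value until it fails.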

I suspect these changes will have resolved the problem for the
time being.  Now the memory bottleneck is definitely in
deserialization...the bulk of that being the building up
of the DigitalObject itself.  Notice that when this DigitalObject
was instantiated, it took up well below 2x the size of the
XML itself.  Although Java stores strings in UTF-16 internally,
all the inline xml datastreams are held as (UTF-8) byte
arrays until/unless their content is requested.
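
A small illustration of why holding inline XML as UTF-8 bytes is cheaper than holding it as a String with two bytes per character. This is a sketch only: the class and method names are hypothetical, object headers are ignored, and newer JVMs may store some strings more compactly than plain UTF-16.

```java
import java.nio.charset.StandardCharsets;

final class EncodingSize {
    /** Size of the text when held as a UTF-8 byte array. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Size of the character data when held as UTF-16 code units (2 bytes per char). */
    static int utf16Bytes(String s) {
        return s.length() * 2;
    }

    public static void main(String[] args) {
        String xml = "<dc:title>Example</dc:title>";
        System.out.println(utf8Bytes(xml) + " bytes as UTF-8, "
                + utf16Bytes(xml) + " bytes as UTF-16");
    }
}
```

For mostly-ASCII XML the UTF-8 form is about half the size; text with many non-ASCII characters narrows the gap.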

- Chris

On Tue, Aug 26, 2008 at 3:14 PM, Chris Wilper
<[EMAIL PROTECTED]> wrote:
> Hi Kai,
>
> I've been making changes on dev-150445, viewable at
> http://fedora-commons.svn.sourceforge.net/viewvc/fedora-commons/fedora/branches/dev-150445/
>
> If you'd like to check it out and see how things are improving, you
> can get it via:
>
> svn co https://fedora-commons.svn.sourceforge.net/svnroot/fedora-commons/fedora/branches/dev-150445
>
> So far the thing I did that had the most impact was to modify the
> serializer to avoid the use of StringBuffers altogether and stream
> things directly using a PrintWriter.
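
To make the difference concrete, here is a minimal sketch contrasting the two styles. It is not the Fedora serializer: the element names and the StreamingSketch class are made up for illustration, and XML escaping is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class StreamingSketch {
    /** Buffered style: the whole document exists in memory before any byte is written. */
    static void writeBuffered(List<String> elements, OutputStream out) {
        StringBuilder buf = new StringBuilder();
        buf.append("<doc>\n");
        for (String e : elements) {
            buf.append("<item>").append(e).append("</item>\n");
        }
        buf.append("</doc>\n");
        PrintWriter w = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        w.print(buf.toString()); // toString() makes a second full copy of the text
        w.flush();
    }

    /** Streaming style: each piece goes to the writer as soon as it is produced. */
    static void writeStreaming(List<String> elements, OutputStream out) {
        PrintWriter w = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        w.print("<doc>\n");
        for (String e : elements) {
            w.print("<item>");
            w.print(e);
            w.print("</item>\n");
        }
        w.print("</doc>\n");
        w.flush();
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("one", "two");
        ByteArrayOutputStream a = new ByteArrayOutputStream();
        ByteArrayOutputStream b = new ByteArrayOutputStream();
        writeBuffered(items, a);
        writeStreaming(items, b);
        System.out.println("same output: " + Arrays.equals(a.toByteArray(), b.toByteArray()));
    }
}
```

Both methods emit the same bytes; the streaming one simply never holds the full document as a single string.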
>
> In my test, removing whitespace around foxml and audittrail elements
> had some effect as well.  It's not really the repository's job to be
> pretty-printing, so this is fine to do.  But I did keep the linefeeds
> in place for basic readability purposes.
>
> I also noticed in the RELS-EXT versions of the demo object that you
> sent, your XML namespace declarations are currently occurring in each
> element under rdf:Description. In your case, a significant chunk of
> memory could be saved by putting these declarations in once at the
> root RDF element instead.
>
> So far I've been taking measurements on a running server, which has
> given me a basic idea of how my changes are affecting heap use.  I
> plan to look at the serializers/deserializers in isolation to
> understand what they're using on their own.
>
> - Chris
>
>
> On Thu, Aug 21, 2008 at 3:11 AM, Strnad, Kai
> <[EMAIL PROTECTED]> wrote:
>> Chris,
>>
>> Thanks for the suggestion. We rely on versioning, so disabling it is not
>> an option for us.
>>
>> Please let me know if I can help you implement refactorings or bounce some
>> ideas around for the new branch...
>>
>> - Kai
>>
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Wilper
>> Sent: Tuesday, August 19, 2008 5:28 PM
>> To: Strnad, Kai
>> Cc: Razum, Matthias; Daniel Davis; [email protected]
>> Subject: Re: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>
>> Kai,
>>
>> Thanks for the sample object and detailed analysis.  I agree that not
>> keeping the entire DigitalObject in memory would help a whole lot
>> here, but would involve some pretty significant changes to code.
>> Without doing that, I still think we can shave off significant amounts
>> of required heap memory when doing modifications of objects.  I've
>> started a branch to begin working on this.
>>
>> Just curious, is VERSIONABLE="false" an option for you with RELS-EXT?
>> If so, that would be a big improvement in your case because otherwise,
>> all the old versions of RELS-EXT are read into memory during any
>> request involving the object.
>>
>> Thanks,
>> Chris
>>
>> On Tue, Aug 19, 2008 at 10:37 AM, Strnad, Kai
>> <[EMAIL PROTECTED]> wrote:
>>> Hi all,
>>>
>>> Thanks a lot for your ideas and suggestions. Separating the audit trail
>>> from the digital object is certainly helpful in reducing the overall size 
>>> of the DO. There are cases though where the audit trail is only a minor 
>>> part of the object and therefore removing it may not have the desired 
>>> impact.
>>>
>>> I've attached an example object from our test suite with which we are able
>>> to consistently reproduce the OutOfMemoryError. The object is not unusual
>>> in terms of size or number of datastreams, so it should be a realistic
>>> sample (there are of course much bigger objects...). Attached you will also
>>> find a screenshot of the heap dump at the time the error occurred. The dump
>>> was analyzed with the Eclipse Memory Analyzer. In order to illustrate the
>>> problem and quickly provoke an error I set the heap size to 64m.
>>>
>>> Also, I've attached the stack trace from the fedora.log. The trace shows
>>> that the OutOfMemoryError occurs at
>>> DOTranslationUtility.writeToStream:808. The thread doesn't stop there,
>>> however, because the error gets caught by a catch(Throwable t) clause which
>>> catches exceptions as well as errors and then proceeds with the normal
>>> execution.
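
To illustrate the behaviour described here, a minimal sketch (not the Fedora code; the class and method names are hypothetical, and the OutOfMemoryError is thrown artificially rather than provoked by a real allocation). A catch (Throwable t) clause also catches Errors, while catch (Exception e) lets them propagate.

```java
final class CatchThrowableSketch {
    /** Swallows everything, including Errors such as OutOfMemoryError. */
    static String runSwallowing(Runnable task) {
        try {
            task.run();
            return "completed";
        } catch (Throwable t) {
            return "swallowed " + t.getClass().getSimpleName();
        }
    }

    /** Handles ordinary exceptions but lets Errors propagate to the caller. */
    static String runPropagatingErrors(Runnable task) {
        try {
            task.run();
            return "completed";
        } catch (Exception e) {
            return "handled " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Runnable failing = () -> { throw new OutOfMemoryError("simulated"); };
        System.out.println(runSwallowing(failing));
        try {
            runPropagatingErrors(failing);
        } catch (OutOfMemoryError e) {
            System.out.println("propagated: " + e.getMessage());
        }
    }
}
```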
>>>
>>> As already stated in my previous mail, looking at the heap dump screenshot
>>> I see the following problems:
>>> * The StringBuffer the digital object is kept in is 4 times the size of the 
>>> digital object in the worst case.
>>> * Indentation takes up lots of space, it would be helpful to make the 
>>> serializer (and consequently the deserializer) customizable.
>>> * Keeping several copies of the entire digital object in memory when it is 
>>> not needed puts additional strain on the heap.
>>>
>>> I think there are two possible quick fixes.
>>> * Increase heap space accordingly so the peak never gets critical. This is 
>>> however problematic due to the huge objects we are sometimes dealing with. 
>>> Unless we use a very large heap this only delays the error.
>>> * Trim the StringBuffer before writing the digital object to the stream
>>> and/or preallocate capacities on initialization. This would significantly
>>> reduce the size of the digital object in memory, but it would not solve the
>>> underlying issue, only delay it.
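
The second quick fix can be sketched with the standard StringBuffer API (capacity(), trimToSize(), and the capacity constructor). The class and method names below are hypothetical and this is not the FOXMLDOSerializer code; note that the documentation for trimToSize() only promises an attempt to reduce storage.

```java
final class BufferCapacitySketch {
    /** Appends n characters one at a time, letting the buffer grow on its own. */
    static StringBuffer fillDefault(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb;
    }

    /** Same content, but with the capacity requested up front. */
    static StringBuffer fillPresized(int n) {
        StringBuffer sb = new StringBuffer(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb;
    }

    public static void main(String[] args) {
        int n = 100_000;
        StringBuffer grown = fillDefault(n);
        System.out.println("grown:    length=" + grown.length() + " capacity=" + grown.capacity());
        grown.trimToSize(); // asks the buffer to drop unused capacity
        System.out.println("trimmed:  length=" + grown.length() + " capacity=" + grown.capacity());
        StringBuffer sized = fillPresized(n);
        System.out.println("presized: length=" + sized.length() + " capacity=" + sized.capacity());
    }
}
```

Presizing requires knowing (or estimating) the final length in advance, which is why it only delays the problem for very large objects.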
>>>
>>> To solve the issue permanently, we should avoid keeping the whole
>>> digital object in memory when it is not needed. Being able to control
>>> the XML indentation would also help.
>>>
>>> - Kai
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Razum, Matthias
>>> Sent: Wednesday, August 13, 2008 5:50 PM
>>> To: 'Daniel Davis'
>>> Cc: Chris Wilper; [email protected]; Strnad, Kai
>>> Subject: RE: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>>
>>> Dan,
>>>
>>> Don't get me wrong. I'm happy to see so many people looking into the issue, 
>>> and any idea is worthwhile discussing :-)
>>>
>>> We'll provide you with an example FOXML asap. Meanwhile, we will do some 
>>> further profiling on our side as well. Kai has an idea for a very simple 
>>> fix, but he needs to prove that the fix works in general and not just for
>>> his test case.
>>>
>>> Matthias.
>>>
>>>> -----Original Message-----
>>>> From: Daniel Davis [mailto:[EMAIL PROTECTED]
>>>> Sent: Wednesday, August 13, 2008 5:34 PM
>>>> To: Razum, Matthias
>>>> Cc: Chris Wilper;
>>>> [email protected]; Strnad, Kai
>>>> Subject: Re: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>>>
>>>> We want a record of events close to the digital object too!
>>>> There may be reasons to ALSO write events to the log but for
>>>> right now Chris is just trying to find out what is
>>>> happening---before suggesting how to fix it.  It would be
>>>> helpful for you to send us an example FOXML object that
>>>> provokes the problem.
>>>>
>>>> -- Dan
>>>>
>>>> Razum, Matthias wrote:
>>>>
>>>>       Dan and Chris,
>>>>
>>>>       My two cents: My first reaction to Chris' proposal to separate
>>>>       the audit trail from the DO was disbelief. I always thought that
>>>>       one of the striking features of Fedora and FOXML is keeping stuff
>>>>       that belongs together in one XML structure that can be validated
>>>>       any time. When asked why someone should use Fedora, this is one
>>>>       of my top arguments. NARA-RLG has far more expertise and
>>>>       experience than I have, so I should probably dump my arguments
>>>>       and think of some new ones.
>>>>
>>>>       Still, I would be concerned about long-term preservation of my
>>>>       DOs. If I start splitting up my DO (well, Fedora does that
>>>>       already with managed content, so it's not introducing anything
>>>>       new), preservation becomes even more challenging. With my very
>>>>       little knowledge about PREMIS and the idea to track all changes
>>>>       to an object as events, isn't that exactly what the audit trail
>>>>       is good for? So would I want to keep it as an integral part of
>>>>       my object?
>>>>
>>>>       Actually, for eSciDoc we can perfectly live without the audit
>>>>       trail, as we write our own PREMIS-based event datastream for
>>>>       graphs of objects, so both changes combined would probably boost
>>>>       the number of versions before we run into out-of-memory errors.
>>>>
>>>>       Matthias.
>>>>
>>>>
>>>>
>>>>
>>>>               -----Original Message-----
>>>>               From: Daniel Davis [mailto:[EMAIL PROTECTED]
>>>>               Sent: Tuesday, August 12, 2008 5:21 PM
>>>>               To: Chris Wilper
>>>>               Cc: Razum, Matthias; [email protected]; Strnad, Kai
>>>>               Subject: Re: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>>>
>>>>               The NARA-RLG report thinks that the "audit" should be kept
>>>>               separate from the "object" anyway because of the potential
>>>>               of tampering.  With correlation information kept in the
>>>>               log, this information could be kept in
>>>>               server/logs/audit.log which would be periodically snipped
>>>>               off and stored as a non-inlined Datastream in a sequence
>>>>               of repository generated objects that record change history.
>>>>
>>>>               This would make it harder in the future to make digital
>>>>               object change operations idempotent because it is
>>>>               convenient to have that information localized to the
>>>>               digital object in question.  Moving large audit trails to
>>>>               non-inlined Datastreams which are still encapsulated by
>>>>               the digital object would permit separate, though less
>>>>               convenient, processing.
>>>>
>>>>               I am curious because the XML for fifty items should not be
>>>>               large enough for a reasonable memory model unless the
>>>>               traffic is very heavy.  I have not looked at that code and
>>>>               I wonder if we can move to a delayed object creation
>>>>               scheme to reduce the size of the business objects
>>>>               representing the digital object in working memory.  I know
>>>>               we are looking for a quick fix not a refactoring but I am
>>>>               still curious.
>>>>
>>>>               -- Dan
>>>>
>>>>               Chris Wilper wrote:
>>>>
>>>>                       Kai and Matthias,
>>>>
>>>>                       Just wanted to let you know I've been doing some
>>>>                       profiling on this over here.  I suspect saving the
>>>>                       audit records external to the FOXML would help a
>>>>                       LOT with this.  One idea is to avoid the special
>>>>                       "AUDIT" datastream altogether and save them in
>>>>                       server/logs/audit.log instead.  Later refactorings
>>>>                       could address the issue of having to read the
>>>>                       entire DigitalObject to make a change to one
>>>>                       piece, but I think dealing with the ever-growing
>>>>                       "AUDIT" datastream would be a simple way to stop
>>>>                       the bleeding.  Thoughts on this approach?
>>>>
>>>>                       - Chris
>>>>
>>>>                       2008/7/29 Razum, Matthias <[EMAIL PROTECTED]>:
>>>>
>>>>
>>>>                               Hi all,
>>>>
>>>>                               This is a pretty severe bug for us. We run
>>>>                               into the issue when we try to create a new
>>>>                               version of an object with ~50 previous
>>>>                               versions. This is a not-so-rare condition
>>>>                               if we want to add members to a collection,
>>>>                               thus creating versions of the collection
>>>>                               object.
>>>>
>>>>                               I haven't seen any feedback for this bug
>>>>                               report on the list from the Fedora dev
>>>>                               team, and I can't find it in Fedora's
>>>>                               bugtracker on sourceforge.net. Any reaction
>>>>                               from the Fedora team would be highly
>>>>                               appreciated, even though I am aware of the
>>>>                               pressure from the upcoming Fedora 3.0
>>>>                               release.
>>>>
>>>>                               Cheers,
>>>>                               Matthias.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>                                       -----Original Message-----
>>>>                                       From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Strnad, Kai
>>>>                                       Sent: Monday, July 14, 2008 11:47 AM
>>>>                                       To: [email protected]
>>>>                                       Subject: [Fedora-commons-developers] Fedora OutOfMemoryErrors
>>>>
>>>>                                       Hi all,
>>>>
>>>>                                       We frequently encounter OutOfMemoryErrors
>>>>                                       when calling modifyDatastreamByValue and
>>>>                                       other API-M methods on relatively large
>>>>                                       digital objects using Fedora Commons 3.0b1
>>>>                                       and 3.0b2. In order to better understand
>>>>                                       the issue we triggered heap dumps and
>>>>                                       analyzed them. The dumps revealed that up
>>>>                                       to 140M of heap space get used by Fedora
>>>>                                       when calling modifyDatastreamByValue on a
>>>>                                       digital object of 15M.
>>>>
>>>>                                       In order to provoke heap dumps at each API
>>>>                                       call the heap size was reduced.
>>>>                                       Additionally we triggered heap dumps at
>>>>                                       specific locations programmatically using
>>>>                                       the Java6 HotSpotDiagnosticMXBean.
>>>>
>>>>                                       The OutOfMemoryError always occurs at
>>>>                                       DOTranslationUtility.writeToStream() after
>>>>                                       the serialization. This appears to be the
>>>>                                       peak of heap usage for
>>>>                                       modifyDatastreamByValue. The heap dump
>>>>                                       shows the following composition of objects
>>>>                                       at the time of writeToStream() (see
>>>>                                       attached screenshot):
>>>>                                        * StringBuffer (60M): 15M * 2 (internal
>>>>                                          UTF-16 representation) + 30M memory
>>>>                                          allocated by StringBuffer.
>>>>                                          (StringBuffer doubles its capacity
>>>>                                          automatically when insufficient
>>>>                                          capacity is left for appending a new
>>>>                                          String. Hence the capacity is likely
>>>>                                          to exceed the actual memory needed
>>>>                                          unless explicitly allocated.)
>>>>                                        * char[] array at writeToStream
>>>>                                          (StringBuffer.toString()) (31M):
>>>>                                          15M * 2 + overhead
>>>>                                        * BasicDigitalObject 24M (15M
>>>>                                          DatastreamXMLMetadata, 9M AuditRecord)
>>>>                                        * DOReaderCache 25M (1
>>>>                                          BasicDigitalObject in cache at the
>>>>                                          time)
>>>>                                        * Some other small objects
>>>>
>>>>                                       If the heap space is already consumed to a
>>>>                                       large extent, allocating another chunk of
>>>>                                       memory may fail and subsequently trigger an
>>>>                                       OutOfMemoryError. Explicitly calling the
>>>>                                       garbage collector is not a viable option,
>>>>                                       because most of the objects involved are
>>>>                                       still bound locally to the thread, so they
>>>>                                       are still reachable.
>>>>
>>>>                                       Increasing the heap will solve the issue
>>>>                                       temporarily. Depending on the size of the
>>>>                                       digital object the problem may however
>>>>                                       resurface: Suppose the digital object is
>>>>                                       30M, then according to our findings a heap
>>>>                                       space of 60M*2 StringBuffer + 60M char
>>>>                                       array + ~50M DO + ~50M cache = 280M would
>>>>                                       be needed for a single digital object (we
>>>>                                       haven't tried this however).
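
The extrapolation above can be written out as a small calculation. This is a sketch under a loud assumption: every component scales linearly with the object size, using the proportions reported for the single 15M test object. The class name and the scaling factors are illustrative only, and the integer results differ slightly from the rounded 140M and 280M quoted in the mail.

```java
final class PeakHeapEstimate {
    /**
     * Rough peak heap in MB while serializing one object of the given size,
     * scaled linearly from the 15 MB measurement (an assumption, not a measured law).
     */
    static long estimateMb(long objectMb) {
        long stringBuffer = objectMb * 2 * 2;   // UTF-16 (x2) and doubled capacity (x2): 60 MB at 15 MB
        long charArray    = objectMb * 2;       // toString() copy, overhead ignored: 30 MB at 15 MB
        long digitalObj   = objectMb * 24 / 15; // BasicDigitalObject: 24 MB at 15 MB
        long readerCache  = objectMb * 25 / 15; // DOReaderCache: 25 MB at 15 MB
        return stringBuffer + charArray + digitalObj + readerCache;
    }

    public static void main(String[] args) {
        System.out.println("15 MB object: ~" + estimateMb(15) + " MB peak");
        System.out.println("30 MB object: ~" + estimateMb(30) + " MB peak");
    }
}
```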
>>>>
>>>>                                       We modified the Fedora code and tried the
>>>>                                       following options:
>>>>                                       * We removed the indentation in the
>>>>                                         FOXMLDOSerializer and
>>>>                                         DOTranslationUtility. Removing most of
>>>>                                         the nonessential whitespaces (or
>>>>                                         replacing indentation whitespaces with
>>>>                                         tabs) results in a much smaller DO size
>>>>                                         (about 20% in our test case) and
>>>>                                         therefore reduces memory footprint.
>>>>
>>>>                                       * As for the StringBuffer problem we
>>>>                                         basically tried two approaches. We
>>>>                                         trimmed the StringBuffer in
>>>>                                         FOXMLDOSerializer before the call to
>>>>                                         writeToStream() using the trimToSize()
>>>>                                         method. This adjusts the capacity of the
>>>>                                         StringBuffer to the actual size of
>>>>                                         characters contained within. Another
>>>>                                         option is to explicitly size the buffer.
>>>>
>>>>                                       * The 64 bit version of Java consumes
>>>>                                         considerably more heap space compared to
>>>>                                         the 32 bit version. Using a 32 bit
>>>>                                         version reduces memory usage.
>>>>
>>>>                                       All options mentioned above work well and
>>>>                                       reduce memory consumption significantly,
>>>>                                       but solve the underlying problem only
>>>>                                       partially.
>>>>
>>>>                                       Perhaps a better solution would be to load
>>>>                                       and process only those parts of the digital
>>>>                                       object needed for the current operation
>>>>                                       (not viable for ingest, but e.g.
>>>>                                       modifyDatastreamByX), but that would
>>>>                                       probably involve lots of refactoring...
>>>>
>>>>                                       Has anyone had to deal with this issue
>>>>                                       previously? Any insights or suggestions
>>>>                                       would be great.
>>>>
>>>>
>>>>                                       Thank you very much,
>>>>                                       Kai
>>>>
>>>>
>>>> ________________________________
>>>>
>>>>
>>>>
>>>>
>>>>               -------------------------------------------------------
>>>>
>>>>               Fachinformationszentrum Karlsruhe, Gesellschaft für
>>>>               wissenschaftlich-technische Information mbH.
>>>>               Registered office: Eggenstein-Leopoldshafen, Amtsgericht
>>>>               Mannheim (district court) HRB 101892.
>>>>               Managing Director: Sabine Brünger-Weilandt.
>>>>               Chairman of the Supervisory Board: MinR Hermann Riehl.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Daniel W. Davis
>>>> Chief Software Architect, Fedora Commons
>>>> Researcher, Cornell Information Science
>>>> http://www.fedora-commons.org
>>>> [EMAIL PROTECTED]
>>>> [EMAIL PROTECTED]
>>>> (607) 255-6090 (Office)
>>>>
>>>>




