[ https://issues.apache.org/jira/browse/UIMA-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970476#comment-13970476 ]
Richard Eckart de Castilho commented on UIMA-3747:
--------------------------------------------------
We store type system information along with the compressed binary CAS in a
single file. When we read these files, we do indeed get a new type system
instance for each file/CAS - even though the type systems are largely (if not
completely) equivalent.
We could additionally calculate a hash over the type system and store that.
When we read a new file, we could compare the type system hash to the hash of
the previous file and - if they match - reuse the previous instance. However,
this feels like a fix that shouldn't be offloaded to the client code, much in
the same way that you recently added thread-safety to various places in the
framework code.
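A minimal sketch of that client-side workaround, assuming the serialized type
system bytes can be read from the file before the CAS itself is deserialized.
LastTypeSystemCache and getOrParse are hypothetical names, not UIMA API, and
the parser function stands in for whatever code builds the type system today.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical helper, not part of UIMA: remembers the last type system that
// was built and hands it back when the next file carries identical bytes.
final class LastTypeSystemCache<T> {
    private byte[] lastHash; // digest of the previously seen type system bytes
    private T lastInstance;  // instance built from those bytes

    // serialized: raw type system bytes as stored in the file (assumed available)
    // parser: builds a fresh instance; only invoked when the hash differs
    T getOrParse(byte[] serialized, Function<byte[], T> parser) {
        byte[] hash = sha256(serialized);
        if (lastInstance == null || !Arrays.equals(hash, lastHash)) {
            lastInstance = parser.apply(serialized);
            lastHash = hash;
        }
        return lastInstance;
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

Comparing only against the previous file keeps the cache at a single entry,
which matches the case where consecutive files share one type system.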
Thanks, we'll test the fix.
> Memory management problem with compressed binary deserialization
> ----------------------------------------------------------------
>
> Key: UIMA-3747
> URL: https://issues.apache.org/jira/browse/UIMA-3747
> Project: UIMA
> Issue Type: Bug
> Components: Core Java Framework
> Affects Versions: 2.4.2SDK
> Reporter: Richard Eckart de Castilho
> Assignee: Marshall Schor
> Fix For: 2.6.0SDK
>
>
> We think we stumbled across a memory management problem with the new
> compressed binary serialization when a CAS is reset/reused in a loop, e.g. in
> the uimaFIT SimplePipeline. When we use form 6, we consistently run into
> out-of-memory situations. Finally, we took the time to do a heap dump
> analysis.
> We found a huge TypeSystemImpl instance in the heap (~450MB). What makes it
> huge is the field "typeSystemMappers" that in our case contains 1000+
> entries, each of them apparently using a TypeSystemImpl as key.
> It looks like typeSystemMappers is never reset when a CAS is reused. My
> current theory is that it should be reset when CAS.reset() is called;
> otherwise type systems accumulate there when the binary deserialization is
> used to repeatedly load data into a CAS in a loop that is resetting and
> reusing the CAS.
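
The accumulation suspected in the report can be modelled with a toy class.
This is not UIMA code: ReusedCasModel and its methods are hypothetical, and the
map only plays the role of typeSystemMappers under the assumption that each
loaded file brings its own type system object as key.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the pattern described in the report; not UIMA code.
final class ReusedCasModel {
    // Plays the role of typeSystemMappers: one entry per source type system seen.
    private final Map<Object, Object> mappers = new HashMap<>();

    // Simulates deserializing one file whose type system is a distinct object.
    void deserialize(Object sourceTypeSystem) {
        mappers.computeIfAbsent(sourceTypeSystem, ts -> new Object());
    }

    // Reset as the report suspects it behaves: the map is left untouched.
    void reset() { }

    // Reset as the report proposes: drop the cached mappers as well.
    void resetAndClearMappers() {
        mappers.clear();
    }

    int mapperCount() {
        return mappers.size();
    }
}
```

Calling deserialize(new Object()) followed by reset() in a loop adds one entry
per iteration, while the resetAndClearMappers() variant keeps the map empty
between files.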