[ 
https://issues.apache.org/jira/browse/UIMA-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970476#comment-13970476
 ] 

Richard Eckart de Castilho commented on UIMA-3747:
--------------------------------------------------

We store type system information along with the compressed binary CAS in a 
single file. When we read these files, we do indeed get a new type system 
instance for each file/CAS - even though the type systems are largely (if not 
completely) equivalent. 

We could additionally calculate a hash over the type system and store that. 
When we read a new file, we could compare the type system hash to the hash of 
the previous file and - if they match - reuse the previous instance. However, 
this feels like a fix that shouldn't be offloaded to the client code, much in 
the same way that you recently added thread-safety to various places in the 
framework code.
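
For illustration, the client-side workaround described above might look roughly 
like the sketch below. It is not code from our project or from UIMA: the class 
name TypeSystemCache and the loader function are hypothetical, and the sketch 
assumes the type system section of each file is available as a byte array 
before it is deserialized. It uses only JDK classes and deliberately makes no 
UIMA calls.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.function.Function;

/**
 * Hypothetical helper: remembers the type system built for the previous file
 * and hands it out again when the next file carries identical type system bytes.
 */
final class TypeSystemCache<T> {
    // Assumption: turns the serialized type system bytes into a type system object.
    private final Function<byte[], T> loader;
    private byte[] lastHash;
    private T lastInstance;

    TypeSystemCache(Function<byte[], T> loader) {
        this.loader = loader;
    }

    /** Returns the previous instance if the hash matches, otherwise loads a new one. */
    synchronized T get(byte[] serializedTypeSystem) {
        byte[] hash = sha256(serializedTypeSystem);
        if (lastInstance != null && Arrays.equals(hash, lastHash)) {
            return lastInstance; // same type system as the previous file: reuse it
        }
        lastInstance = loader.apply(serializedTypeSystem);
        lastHash = hash;
        return lastInstance;
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

Only the most recent instance is kept, which matches the "compare to the 
previous file" idea. The sketch trusts the hash alone, so a stricter variant 
would also compare the raw bytes before reusing an instance.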

Thanks, we'll test the fix.

> Memory management problem with compressed binary deserialization
> ----------------------------------------------------------------
>
>                 Key: UIMA-3747
>                 URL: https://issues.apache.org/jira/browse/UIMA-3747
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>    Affects Versions: 2.4.2SDK
>            Reporter: Richard Eckart de Castilho
>            Assignee: Marshall Schor
>             Fix For: 2.6.0SDK
>
>
> We think we stumbled across a memory management problem with the new 
> compressed binary serialization when a CAS is reset/reused in a loop, e.g. in 
> the uimaFIT SimplePipeline. When we use form 6, we consistently run into 
> out-of-memory situations. Finally, we took the time to do a heap dump 
> analysis.
> We found a huge TypeSystemImpl instance in the heap (~450MB). What makes it 
> huge is the field "typeSystemMappers"
> that in our case contains 1000+ entries, each of them apparently using 
> a TypeSystemImpl as key.
> It looks like typeSystemMappers is never reset when a CAS is reused. My 
> current theory is that it should be reset when CAS.reset() is called; 
> otherwise type systems accumulate there when the binary deserialization is 
> used to repeatedly load data into a CAS in a loop that is resetting and 
> reusing the CAS.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
