[ 
https://issues.apache.org/jira/browse/UIMA-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887782#comment-15887782
 ] 

Peter Klügl commented on UIMA-5306:
-----------------------------------

The idea of the new functionality is to store only what is really required by 
the rules that are executed instead of all information or all reindexed 
information, respectively. However, it's not as easy as it sounds and there are 
more "side effects" in special use cases than I thought. Thus, I have to 
evaluate it more and identify best practices.

I switched to the low-level implementation in order to reduce memory usage. 
You are welcome to check this, but I do not know where I would look. 
What exactly do you mean by "native implementations and the Java 
implementation"?

I think we need reproducible performance testbeds for Ruta that cover 
different use cases and different Ruta setups.

> Memory Improvement - Unnecessary leaks
> --------------------------------------
>
>                 Key: UIMA-5306
>                 URL: https://issues.apache.org/jira/browse/UIMA-5306
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Ruta
>    Affects Versions: 2.3.0ruta
>         Environment: Windows 10, JVM with -Xmx 1024, Java JDK 1.8., 16gb 
> memory
>            Reporter: Dennis Bauer
>            Assignee: Peter Klügl
>
> In a production setup we found that Ruta itself uses a large amount of 
> memory. JVisualVM shows a relatively small number of arrays of ArrayLists 
> with a high memory consumption (250k instances reserve 243,000,000 bytes 
> of memory).
> The problem is that in a clustered SaaS environment with less memory, these 
> arrays block relevant space in memory. A closer look at these arrays of 
> ArrayLists points to the class org.apache.uima.ruta.type.RutaBasic.
> This class has three arrays that are instantiated with the maximum 
> possible value that the type system of the CAS can return. 
> {code:Java}
>   private int[] partOf =
>       new int[((TypeSystemImpl) getCAS().getTypeSystem()).getLargestTypeCode()];
>   private Collection<?>[] beginMap =
>       new ArrayList<?>[((TypeSystemImpl) getCAS().getTypeSystem()).getLargestTypeCode()];
>   private Collection<?>[] endMap =
>       new ArrayList<?>[((TypeSystemImpl) getCAS().getTypeSystem()).getLargestTypeCode()];
> {code}
> This improvement should allocate the memory for these arrays dynamically, 
> so that the total memory consumption is reduced.
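
To make the dynamic allocation suggested in the description concrete, here is 
a minimal sketch of one possible approach. It is not the actual RutaBasic code 
and not necessarily what the fix will look like. The class name 
SparseTypeIndex and all of its methods are hypothetical, and it assumes that 
type codes can be used as plain int keys. For scale: 243,000,000 bytes over 
250k instances is about 972 bytes per array.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the per-token index; not the real RutaBasic. */
class SparseTypeIndex {
  // All three maps stay null until the first write, so a token that never
  // receives an entry costs three null references instead of three arrays
  // sized by the largest type code.
  private Map<Integer, Integer> partOf;
  private Map<Integer, Collection<Object>> beginMap;
  private Map<Integer, Collection<Object>> endMap;

  /** Increments the "part of" counter for the given type code. */
  void addPartOf(int typeCode) {
    if (partOf == null) {
      partOf = new HashMap<>(4);
    }
    partOf.merge(typeCode, 1, Integer::sum);
  }

  /** Decrements the counter and drops the entry once it reaches zero. */
  void removePartOf(int typeCode) {
    if (partOf != null) {
      partOf.computeIfPresent(typeCode, (k, v) -> v > 1 ? v - 1 : null);
    }
  }

  boolean isPartOf(int typeCode) {
    return partOf != null && partOf.containsKey(typeCode);
  }

  /** Registers an annotation that begins at this token. */
  void addBegin(int typeCode, Object annotation) {
    if (beginMap == null) {
      beginMap = new HashMap<>(4);
    }
    beginMap.computeIfAbsent(typeCode, k -> new ArrayList<>(2)).add(annotation);
  }

  /** Registers an annotation that ends at this token. */
  void addEnd(int typeCode, Object annotation) {
    if (endMap == null) {
      endMap = new HashMap<>(4);
    }
    endMap.computeIfAbsent(typeCode, k -> new ArrayList<>(2)).add(annotation);
  }

  Collection<Object> getBeginAnchors(int typeCode) {
    if (beginMap == null) {
      return Collections.emptyList();
    }
    return beginMap.getOrDefault(typeCode, Collections.emptyList());
  }

  Collection<Object> getEndAnchors(int typeCode) {
    if (endMap == null) {
      return Collections.emptyList();
    }
    return endMap.getOrDefault(typeCode, Collections.emptyList());
  }
}
```

A HashMap has its own per-entry overhead (boxed keys, entry objects), so this 
only pays off if most tokens carry entries for few types. Whether that holds 
would have to be measured on a real pipeline.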



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)