[ 
https://issues.apache.org/jira/browse/UIMA-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890008#comment-15890008
 ] 

Peter Klügl commented on UIMA-5306:
-----------------------------------

Marshall's mail (28.02.2017 21:42):



Re: space taken by ArrayLists for the 3 fields "partOf", "beginMap", "endMap".
Let's assume the number of distinct UIMA types in your application is 100.  (Is
this approximately right, or way off?).

Each RutaBasic instance would include 3 ArrayLists, each of which would be an
array of 100 Java references.  If you're running with < 32 GB heap, modern JVMs
store references as 4-byte values.  So the approximate space taken by these
ArrayLists would be Object-Overhead + some-fields-in-ArrayList +
Object-Overhead-for-ref-array + the-array-size.

The last element is what you could potentially reduce with a smaller array
allocation.  Its size is 4 * 100 (with the above assumption) = 400 bytes.

The overhead per Java object runs maybe 24-32 bytes, so 2 of these would be,
say, 50 bytes.  And then there's some more for the ArrayList fields, perhaps.
These are "fixed".

So, you might end up with 500 bytes per field, or 1500 for the 3 fields at the
beginning.

(This would be the size assuming these have no contents in them, of course).

A lower bound for 250K instances of RutaBasic (just considering these 3 fields)
would then be 250,000 * 500 = 125,000,000 bytes.  This is in the ballpark of
your estimate.
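
The arithmetic above can be written down as a small sketch.  The class and
constant names are hypothetical, and the 100-byte fixed overhead is only an
assumption that rounds the "two object headers plus some ArrayList fields" up
so that one field comes to roughly 500 bytes; none of this is measured.

```java
// Back-of-the-envelope estimate only; every constant is an assumption
// taken from the mail above, not a measured value.
final class RutaBasicFootprintEstimate {
    // Size of one reference with compressed references (heap < 32 GB).
    static final int REF_BYTES = 4;
    // Assumed fixed cost: two object headers plus ArrayList fields, rounded up.
    static final int FIXED_OVERHEAD_BYTES = 100;

    // Estimated bytes for one field holding typeCount references.
    static long bytesPerField(int typeCount) {
        return (long) REF_BYTES * typeCount + FIXED_OVERHEAD_BYTES;
    }

    // Lower bound used in the mail: instances multiplied by the per-field size.
    static long lowerBoundBytes(long instances, int typeCount) {
        return instances * bytesPerField(typeCount);
    }

    public static void main(String[] args) {
        System.out.println(bytesPerField(100));            // 4 * 100 + 100
        System.out.println(lowerBoundBytes(250_000, 100)); // 250,000 * 500
    }
}
```

Changing the assumed number of types shows how strongly the total depends on
the array length, which is the only part a smaller allocation would reduce.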

One possible design revision to ease this may be to consider if all 250K
instances of RutaBasic need to be in memory at a time.

-Marshall





> Memory Improvement - Unnecessary leaks
> --------------------------------------
>
>                 Key: UIMA-5306
>                 URL: https://issues.apache.org/jira/browse/UIMA-5306
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Ruta
>    Affects Versions: 2.3.0ruta
>         Environment: Windows 10, JVM with -Xmx 1024, Java JDK 1.8, 16 GB
> memory
>            Reporter: Dennis Bauer
>            Assignee: Peter Klügl
>
> In a production setup we found that Ruta itself uses a large amount of
> memory. JVisualVM shows a relatively small number of arrays of ArrayLists
> with high memory consumption (250k instances account for 243,000,000 bytes
> of reserved memory).
> The problem is that in a clustered SaaS environment with less memory, these
> arrays take up space that is needed elsewhere. A closer look at these arrays
> of ArrayLists points to the class org.apache.uima.ruta.type.RutaBasic.
> This class declares three arrays that are instantiated with the maximum
> possible value that the type system of the CAS can return.
> {code:Java}
>   private int[] partOf =
>       new int[((TypeSystemImpl) getCAS().getTypeSystem()).getLargestTypeCode()];
>   private Collection<?>[] beginMap =
>       new ArrayList<?>[((TypeSystemImpl) getCAS().getTypeSystem()).getLargestTypeCode()];
>   private Collection<?>[] endMap =
>       new ArrayList<?>[((TypeSystemImpl) getCAS().getTypeSystem()).getLargestTypeCode()];
> {code}
> This improvement should allocate these arrays dynamically, so that the total
> memory consumption is reduced.
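
One way the dynamic allocation requested above could look is sketched below.
This is not the Ruta implementation: LazyTypeMap and its methods are
hypothetical names, and the sketch assumes that most instances only ever
touch a handful of type codes, so that a sparse map allocated on first write
is smaller than an array sized by getLargestTypeCode().

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Ruta: keeps per-type collections sparsely
// instead of pre-allocating one slot per type code.
final class LazyTypeMap<T> {
    // Stays null until the first value is added, so empty instances cost one reference.
    private Map<Integer, Collection<T>> byTypeCode;

    // Creates the map and the per-type list only when they are first needed.
    void add(int typeCode, T value) {
        if (byTypeCode == null) {
            byTypeCode = new HashMap<>(4);
        }
        byTypeCode.computeIfAbsent(typeCode, k -> new ArrayList<>(2)).add(value);
    }

    // Returns an immutable empty list for type codes that were never written.
    Collection<T> get(int typeCode) {
        if (byTypeCode == null) {
            return Collections.emptyList();
        }
        return byTypeCode.getOrDefault(typeCode, Collections.emptyList());
    }

    // True once at least one value has been added.
    boolean isAllocated() {
        return byTypeCode != null;
    }
}
```

A HashMap with boxed Integer keys has its own per-entry overhead, so this only
pays off when the number of populated type codes per instance is small
compared to the size of the type system.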



