[ https://issues.apache.org/jira/browse/UIMA-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890009#comment-15890009 ]
Peter Klügl commented on UIMA-5306:
-----------------------------------
My reply:
Thanks Marshall for the detailed analysis.
My comment about you commenting was actually pointing to the memory
consumption of the C++-like UIMA implementation. I guess Dennis was
referring to that, but I am not sure. My opinion is that the memory is
consumed exactly by what you described, and especially by the content
of the three fields.
I have always questioned (at least a bit) the approach of storing
additional information about the location and coverage of annotations
in an annotation. There are several other options to implement it,
e.g., a custom index. I decided to use annotations for various
reasons, and I do not know if it is the best solution.
The memory consumption can be reduced by optimizing the data structures,
but I think reducing the amount of information that is stored or needs
to be stored is a more fruitful way to go. I implemented an experimental
prototype, but the effect was not yet as big as I had hoped. One reason
is that too much is still stored in RutaBasic because some functionality
requires it, e.g., partof is required not only by the PARTOF condition
but also by sequential matching because of the coverage-based visibility
concept. The visibility concept is not optional, even if the actual
script does not need it.
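To illustrate the data-structure option only: a minimal sketch, assuming
that most RutaBasic instances are touched by only a few types. The class
SparseTypeInfo and all of its method names are hypothetical and are not
part of Ruta; treating partOf as a per-type counter is also an assumption
derived from the int[] field. The sketch allocates storage lazily per type
code instead of sizing three arrays by the largest type code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the per-type bookkeeping of a RutaBasic.
 * Not the actual Ruta class: names and layout are assumptions.
 */
class SparseTypeInfo<A> {
  // All maps are created on first write and stay null while unused.
  private Map<Integer, Integer> partOf;
  private Map<Integer, Collection<A>> beginMap;
  private Map<Integer, Collection<A>> endMap;

  /** Increments the coverage counter for the given type code. */
  void addPartOf(int typeCode) {
    if (partOf == null) {
      partOf = new HashMap<>();
    }
    partOf.merge(typeCode, 1, Integer::sum);
  }

  /** Decrements the counter and drops the entry once it reaches zero. */
  void removePartOf(int typeCode) {
    if (partOf == null) {
      return;
    }
    partOf.computeIfPresent(typeCode, (k, v) -> v > 1 ? Integer.valueOf(v - 1) : null);
  }

  boolean isPartOf(int typeCode) {
    return partOf != null && partOf.containsKey(typeCode);
  }

  void addBegin(int typeCode, A annotation) {
    if (beginMap == null) {
      beginMap = new HashMap<>();
    }
    beginMap.computeIfAbsent(typeCode, k -> new ArrayList<>(1)).add(annotation);
  }

  void addEnd(int typeCode, A annotation) {
    if (endMap == null) {
      endMap = new HashMap<>();
    }
    endMap.computeIfAbsent(typeCode, k -> new ArrayList<>(1)).add(annotation);
  }

  Collection<A> getBeginAnchors(int typeCode) {
    if (beginMap == null) {
      return Collections.<A>emptyList();
    }
    return beginMap.getOrDefault(typeCode, Collections.<A>emptyList());
  }

  Collection<A> getEndAnchors(int typeCode) {
    if (endMap == null) {
      return Collections.<A>emptyList();
    }
    return endMap.getOrDefault(typeCode, Collections.<A>emptyList());
  }
}
```

A HashMap with boxed keys has its own per-entry overhead, so this only
pays off when each instance is touched by few types; for densely used
instances the plain arrays may well be smaller.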
For optimizing ruta, we need a fixed evaluation setting covering different
but common use cases, so that we do not only improve the implementation
for large documents or specific kinds of rules. For adapting to different
use cases of ruta, the encapsulation in ruta needs to be increased, e.g.,
RutaStream needs to be refactored and needs to be replaceable with other
implementations. The good thing is that ruta-core now has enough unit
tests, compared to a few years ago, so that such drastic changes can be
implemented now.
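What such encapsulation could look like, as a rough sketch only: the
matching code depends on a small interface, and implementations with
different memory/speed trade-offs can be swapped behind it. Every name
here (Span, CoverageLookup, PrecomputedCoverage, OnDemandCoverage) is
hypothetical and much simpler than the real RutaStream; offsets are
plain character positions rather than RutaBasic annotations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical span with a type name; stands in for an annotation. */
final class Span {
  final int begin;
  final int end; // exclusive
  final String typeName;

  Span(int begin, int end, String typeName) {
    this.begin = begin;
    this.end = end;
    this.typeName = typeName;
  }
}

/** Hypothetical contract the matching code would depend on. */
interface CoverageLookup {
  /** True if some span of the given type covers the offset. */
  boolean isPartOf(int offset, String typeName);
}

/** Trades memory for speed: stores the covering types for every offset up front. */
class PrecomputedCoverage implements CoverageLookup {
  private final Map<Integer, Set<String>> typesByOffset = new HashMap<>();

  PrecomputedCoverage(List<Span> spans) {
    for (Span s : spans) {
      for (int i = s.begin; i < s.end; i++) {
        typesByOffset.computeIfAbsent(i, k -> new HashSet<>()).add(s.typeName);
      }
    }
  }

  @Override
  public boolean isPartOf(int offset, String typeName) {
    Set<String> types = typesByOffset.get(offset);
    return types != null && types.contains(typeName);
  }
}

/** Trades speed for memory: keeps only the spans and scans them per query. */
class OnDemandCoverage implements CoverageLookup {
  private final List<Span> spans;

  OnDemandCoverage(List<Span> spans) {
    this.spans = new ArrayList<>(spans);
  }

  @Override
  public boolean isPartOf(int offset, String typeName) {
    for (Span s : spans) {
      if (s.typeName.equals(typeName) && s.begin <= offset && offset < s.end) {
        return true;
      }
    }
    return false;
  }
}
```

With a fixed evaluation setting, both variants could be run against the
same scripts and documents to see which trade-off fits which use case.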
There can be no quick solution right now. I'll try to find some
compromise for the next rc and try to get ruta released as soon as
possible since there are so many bugfixes waiting.
Peter
> Memory Improvement - Unnecessary leaks
> --------------------------------------
>
> Key: UIMA-5306
> URL: https://issues.apache.org/jira/browse/UIMA-5306
> Project: UIMA
> Issue Type: Improvement
> Components: Ruta
> Affects Versions: 2.3.0ruta
> Environment: Windows 10, JVM with -Xmx 1024, Java JDK 1.8., 16gb
> memory
> Reporter: Dennis Bauer
> Assignee: Peter Klügl
>
> In a production setup we found that Ruta itself uses a huge amount of
> memory. With JVisualVM it is easy to see that a relatively small number
> of arrays of ArrayLists accounts for high memory consumption (250k
> instances result in 243,000,000 bytes of reserved memory).
> The problem is that in a clustered SaaS environment with less memory, these
> arrays block relevant space in memory. A deeper look into these arrays of
> ArrayLists points to the class org.apache.uima.ruta.type.RutaBasic.
> A look at this class shows three arrays that are instantiated with the
> maximum possible value that can be returned by the type system of the CAS.
> {code:Java}
> private int[] partOf =
>     new int[((TypeSystemImpl) getCAS().getTypeSystem()).getLargestTypeCode()];
> private Collection<?>[] beginMap =
>     new ArrayList<?>[((TypeSystemImpl) getCAS().getTypeSystem()).getLargestTypeCode()];
> private Collection<?>[] endMap =
>     new ArrayList<?>[((TypeSystemImpl) getCAS().getTypeSystem()).getLargestTypeCode()];
> {code}
> This improvement should allocate the memory for these arrays dynamically,
> so that the total memory consumption is reduced.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)