[ https://issues.apache.org/jira/browse/UIMA-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887936#comment-15887936 ]
Peter Klügl commented on UIMA-5306:
-----------------------------------
Maybe Marshall can comment on the memory consumption of the addresses? I assume
that the bottleneck is rather Ruta itself. Have you measured the memory
footprint of your use case with about the same number of annotations, but
without Ruta?
I know Drools and Rete a bit, but not PHREAK. I will have to take a look. I
have thought about automatic optimizations before, but only for speed. The
increased expressiveness of Ruta makes this really complicated. As a
consequence, generic, all-encompassing automatic optimizations are not really
possible without restricting the expressiveness. Thus, only small optimizations
have been added in patches. There are also differences between production rule
systems and regular-expression rule systems/FSTs.
I have not thought about automatic optimization concerning memory
consumption. Maybe the situation will get better with UIMA 3, where the GC can
clean up stuff. There are still many possibilities to improve Ruta concerning
speed and memory consumption, e.g., the prototype of the type usage I
implemented over the last few days. Or think about running Ruta without
RutaBasic at all.
After all, it is always about the time one can invest to improve the system.
The further development, fixes and new features have always been driven by
current requirements, e.g., when it was not fast enough I did something to
improve the speed.
Your idea of removing information that is not needed anymore is very nice and
would be implementable if the type usage information is stored for every other
step in the rules (and with some sort of dynamic RutaBasic). However, I would
bet that there is lower-hanging fruit.
You are very welcome to participate. Ruta was born about 10 years ago, and not
every part was developed at the same speed or with the same care. There is
still a lot of inherited waste and legacy issues, but I try to constantly
improve at least the code quality of ruta-core, which is also the part that
matters the most. If I can help you to get into the project somehow, just let
me know. Even if you just point out code that should be refactored.
Yeah, the RutaParser is terribly long, but most of it is just syntax grammar
for conditions and actions. I did not replace that with a generic rule, so that
I do not lose the argument checks.
> Memory Improvement - Unnecessary leaks
> --------------------------------------
>
> Key: UIMA-5306
> URL: https://issues.apache.org/jira/browse/UIMA-5306
> Project: UIMA
> Issue Type: Improvement
> Components: Ruta
> Affects Versions: 2.3.0ruta
> Environment: Windows 10, JVM with -Xmx 1024, Java JDK 1.8, 16 GB memory
> Reporter: Dennis Bauer
> Assignee: Peter Klügl
>
> In a production setup we found that Ruta itself uses a large amount of
> memory. With JVisualVM it is easy to see that a relatively small number of
> arrays of ArrayLists accounts for high memory consumption (250k instances
> result in 243,000,000 bytes of reserved memory).
> The problem is that in a clustered SaaS environment with less memory, these
> arrays block relevant space in memory. A deeper look into these arrays of
> ArrayLists points to the class org.apache.uima.ruta.type.RutaBasic.
> A look at this class shows three arrays that are instantiated with the
> largest type code that the type system of the CAS can return.
> {code:Java}
> private int[] partOf =
>     new int[((TypeSystemImpl) getCAS().getTypeSystem()).getLargestTypeCode()];
> private Collection<?>[] beginMap =
>     new ArrayList<?>[((TypeSystemImpl) getCAS().getTypeSystem()).getLargestTypeCode()];
> private Collection<?>[] endMap =
>     new ArrayList<?>[((TypeSystemImpl) getCAS().getTypeSystem()).getLargestTypeCode()];
> {code}
> This improvement should allocate these arrays dynamically, so that the total
> memory consumption is reduced.
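
To make the requested dynamic allocation concrete, here is a minimal sketch
of one possible approach: keep the per-type information in maps that are
created only on first use, instead of arrays sized to the largest type code.
The class SparseTypeIndex and all of its method names are hypothetical and
are not part of Ruta; the sketch also assumes that partOf acts as a per-type
counter, which the issue text does not state.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-type bookkeeping of RutaBasic.
// This is an illustration only, not the actual Ruta implementation.
final class SparseTypeIndex<A> {
  // All three maps stay null until the first write, so an instance that
  // never receives an annotation holds no per-type storage at all.
  private Map<Integer, Collection<A>> beginMap;
  private Map<Integer, Collection<A>> endMap;
  private Map<Integer, Integer> partOf; // assumed to be a counter per type code

  void addBegin(int typeCode, A annotation) {
    if (beginMap == null) {
      beginMap = new HashMap<>();
    }
    beginMap.computeIfAbsent(typeCode, k -> new ArrayList<>()).add(annotation);
  }

  void addEnd(int typeCode, A annotation) {
    if (endMap == null) {
      endMap = new HashMap<>();
    }
    endMap.computeIfAbsent(typeCode, k -> new ArrayList<>()).add(annotation);
  }

  Collection<A> getBegin(int typeCode) {
    return lookup(beginMap, typeCode);
  }

  Collection<A> getEnd(int typeCode) {
    return lookup(endMap, typeCode);
  }

  void addPartOf(int typeCode) {
    if (partOf == null) {
      partOf = new HashMap<>();
    }
    partOf.merge(typeCode, 1, Integer::sum);
  }

  void removePartOf(int typeCode) {
    if (partOf != null) {
      // Returning null from the remapping function drops the entry.
      partOf.computeIfPresent(typeCode, (k, v) -> v > 1 ? v - 1 : null);
    }
  }

  boolean isPartOf(int typeCode) {
    return partOf != null && partOf.containsKey(typeCode);
  }

  // Shared read path: a missing map or a missing key yields an empty collection.
  private Collection<A> lookup(Map<Integer, Collection<A>> map, int typeCode) {
    if (map == null) {
      return Collections.<A>emptyList();
    }
    Collection<A> result = map.get(typeCode);
    return result == null ? Collections.<A>emptyList() : result;
  }
}
```

With this layout, memory grows with the number of types actually seen at a
position rather than with the size of the type system. The trade-off is that
a HashMap entry with a boxed Integer key costs more per stored type than an
array slot, so whether this helps in practice would have to be measured for
the use case at hand.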