[
https://issues.apache.org/jira/browse/MNG-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731628#comment-17731628
]
Guillaume Nodet commented on MNG-7592:
--------------------------------------
That sounds like a good improvement.
Since interned strings are GC'ed, a trivial thing would be to wrap the
{{XmlPullParser}} created in the {{ModelReader#read}} method so that it interns
the strings returned by {{XmlPullParser#getName()}} and
{{XmlPullParser#getText()}}. I would think all element names should be
interned, and most elements' text (groupId, artifactId, version, scope, etc.).
This could give a good estimate of whether that's worth investigating.
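To make the wrapping idea concrete, here is a rough sketch rather than a patch. {{NameTextParser}} is a hypothetical stand-in that only carries the two accessors mentioned above. The real {{XmlPullParser}} interface has many more methods, which a real wrapper would have to delegate as well, and nothing here is taken from the actual {{ModelReader}} code.

```java
// Hypothetical stand-in for the two XmlPullParser accessors discussed above.
// The real interface has many more methods that would simply be delegated.
interface NameTextParser {
    String getName();
    String getText();
}

// Decorator that interns every string returned by the wrapped parser,
// so equal element names and texts end up sharing a single instance.
final class InterningParser implements NameTextParser {
    private final NameTextParser delegate;

    InterningParser(NameTextParser delegate) {
        this.delegate = delegate;
    }

    @Override
    public String getName() {
        return intern(delegate.getName());
    }

    @Override
    public String getText() {
        return intern(delegate.getText());
    }

    // The accessors may return null (assumed here for events without a
    // name or text), so guard before calling String#intern().
    private static String intern(String s) {
        return s == null ? null : s.intern();
    }
}

final class InterningDemo {
    // Simulates a parser that allocates a fresh String on every call.
    static NameTextParser freshStrings(String name, String text) {
        return new NameTextParser() {
            @Override
            public String getName() {
                return new String(name);
            }

            @Override
            public String getText() {
                return new String(text);
            }
        };
    }

    public static void main(String[] args) {
        NameTextParser raw = freshStrings("groupId", "org.apache.maven.plugins");
        NameTextParser interned = new InterningParser(raw);

        // Without interning, each call yields a distinct instance.
        System.out.println(raw.getText() == raw.getText());           // false
        // With interning, equal strings collapse to one shared instance.
        System.out.println(interned.getText() == interned.getText()); // true
    }
}
```

Comparing heap dumps of a large build with and without such a wrapper would show how many duplicates actually come from model parsing, as opposed to strings created elsewhere.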
> String deduplication in model building
> --------------------------------------
>
> Key: MNG-7592
> URL: https://issues.apache.org/jira/browse/MNG-7592
> Project: Maven
> Issue Type: Improvement
> Reporter: Christoph Läubrich
> Priority: Major
>
> I am currently investigating how to improve memory consumption in m2eclipse
> (the Maven IDE extension) and noticed that one problem is that the Maven model
> does not seem to deduplicate strings, so for large projects (I used Apache
> Camel as an example) there are a lot of duplicate strings hanging around,
> e.g. I see 12,000 instances of "org.apache.maven.plugins" or around 10,000 of
> "org.apache.camel" (please note that probably not all of them are related to
> Maven!).
> If I look at the graph of incoming references, I see for example that these
> are from Model/Artifact groupId.
> I know that string deduplication in general is hard and even controversial,
> but maybe one could think about it at least for the "hotspots", e.g.
> groupId, artifactId and version; even managementKeys seem good candidates
> to be considered, as these are used all over the place.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)