kinow commented on PR #568: URL: https://github.com/apache/opennlp/pull/568#issuecomment-1858767471
> I think, we c/should consider using a `ConcurrentHashMap`-based deduplicator approach to replace "intern()" calls, as investigated and explained by Shipilev in his blog post here:
>
> https://shipilev.net/jvm/anatomy-quarks/10-string-intern/
>
> The removal of the String deduplication, as found in the code segments of this PR, could otherwise lead to more required RAM at runtime - as @kinow already expressed.
>
> Wdyt? @jzonthemtn @rzo1
>
> (A) Removal (PR as is now), or (B) Replacement with concurrent-capable custom deduplicator?

(B) looks interesting, +1 for trying that out, and for @rzo1's suggestion to add that on top of this one.

> `CHMInterner`

First time I've seen interning implemented directly in the code (intentionally like that). Sounds like a good idea! You do the interning in a plain map on the heap, without going through the JVM's native string table. Not sure if it will perform as fast as `String.intern()`, but I think it might work!

Thanks @mawiesne
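
For reference, option (B) could look roughly like the sketch below. This is only a minimal illustration of a `ConcurrentHashMap`-based deduplicator in the spirit of the linked blog post, not necessarily the class in the PR; the name `CHMInterner` is borrowed from the quote above, and the `size()` helper is a hypothetical addition for inspection.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Sketch of a map-based interner (assumed shape, not the PR's actual code).
 * Equal values are collapsed to one canonical instance held in an ordinary
 * on-heap map instead of the JVM's native string table.
 */
final class CHMInterner<T> {

    // Each canonical instance is stored as both key and value.
    private final ConcurrentMap<T, T> map = new ConcurrentHashMap<>();

    /**
     * Returns the canonical instance equal to {@code value}.
     * Throws NullPointerException for null, as ConcurrentHashMap rejects null keys.
     */
    T intern(T value) {
        // putIfAbsent is atomic: only one thread's instance becomes canonical.
        T existing = map.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }

    /** Number of distinct values currently held (hypothetical helper). */
    int size() {
        return map.size();
    }
}
```

A caller would replace `s.intern()` with `interner.intern(s)` on a shared instance. One trade-off to keep in mind: entries in this map are strongly referenced and stay alive as long as the interner does, so its lifetime should be scoped to the data it deduplicates.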