kinow commented on PR #568:
URL: https://github.com/apache/opennlp/pull/568#issuecomment-1856892659

   Thanks for the GC numbers. The new code does seem a bit heavier on GC and 
memory allocations, but I think that was expected.
   
   A bit late here, but I decided to give it a try. Excuse me if I say anything 
silly — I can double-check these results tomorrow after some :coffee: . I 
opened your branch in IntelliJ, made the `jmh` profile active by default 
(how do you tell IntelliJ to include the dependencies of a profile? anyway), 
invalidated caches, etc.
   
   I executed the tests just to make sure they were running, since you had 
already provided the results. Then I ran them with the profiler to generate the 
`.jfr` file. I saved the `jmh` source folder, checked out `main`, patched the 
`pom.xml` files, and ran the same test again to get a second `.jfr` for the 
`main` branch (so the earlier file is for this branch, the later one is for 
`main`).
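   For reference, the workflow above should also be reproducible from the 
command line. This is only a sketch: the `jmh` profile name comes from the PR, 
but the jar path is my guess, and a recent JMH version is assumed for the 
built-in `jfr` profiler:
   
   ```shell
   # Build with the benchmark profile active (jar location assumed)
   mvn -Pjmh -DskipTests package
   # Run the benchmarks with JMH's JFR profiler; a .jfr file is
   # written per fork alongside the benchmark results
   java -jar target/benchmarks.jar -prof jfr
   ```
   
   Repeating the same two commands on `main` would give the second `.jfr` to 
compare against.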
   
   I opened both in IntelliJ and selected the `.jfr` file from this branch to 
compare against the one from `main`. I'm not sure whether this is a fair 
comparison, since the difference in throughput might skew some numbers, but 
I'm sharing it in case you find it useful, @rzo1 . 
   
   ## CPU samples
   
   
![flamegraph](https://github.com/apache/opennlp/assets/304786/5d715064-5ea0-4dbf-8d89-f8c277ad0a1f)
   
   I think it just confirms the change of calls to intern functions.
   
   
![image](https://github.com/apache/opennlp/assets/304786/26235942-e711-4ed3-b242-18098b9ddca1)
   
   Also the increase in GC calls.
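   If I read the change right, it drops the `String.intern()` calls, which 
would explain both graphs. A minimal sketch of the trade-off (class name and 
strings are mine, not from the PR):
   
   ```java
   // Interned strings are shared canonical instances; without interning,
   // every runtime-built token is a fresh short-lived allocation, which
   // shows up as more young-gen GC activity.
   public class InternSketch {
       public static void main(String[] args) {
           // Two equal strings built at runtime are distinct objects...
           String a = new StringBuilder("to").append("ken").toString();
           String b = new StringBuilder("to").append("ken").toString();
           System.out.println(a == b);                   // false
           // ...unless both are interned into the JVM's string pool.
           System.out.println(a.intern() == b.intern()); // true
       }
   }
   ```
   
   So fewer shared instances means more transient `String` garbage, which 
matches the GC numbers above.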
   
   
![image](https://github.com/apache/opennlp/assets/304786/759928d2-1674-426a-a80a-24a39bf90ff8)
   
   And I think the `Random` calls come from JMH itself, with more or fewer 
samples due to the higher throughput.
   
   
![image](https://github.com/apache/opennlp/assets/304786/a0eb0c07-8feb-421f-bd1d-01f56135479e)
   
   ## Memory allocations
   
   
![flamegraph](https://github.com/apache/opennlp/assets/304786/7e20ff7b-3b5d-45be-9984-67229c9441c6)
   
   A lot more `String`s, as expected, as well as `byte[]`s from the array-copy 
calls.
   
   
![image](https://github.com/apache/opennlp/assets/304786/d3f203ff-1bbc-4d33-8c70-c2ed93861443)
   
   
![image](https://github.com/apache/opennlp/assets/304786/43ba112d-2402-4c68-966f-f2e335334f74)
   
   Unfortunately I don't have a better or more practical test for comparing 
memory (maybe a single run on one large example would be better for that than 
multiple JMH runs…). But I don't think this should really be a problem. I 
wonder whether the interning was a premature optimization, or whether someone 
hit memory issues at some point. Either way, it's an easy change to revert if 
needed, and this way users won't have to tweak heap settings. So I am inclined 
to approve it! :tada: 
   
   I'll wait until tomorrow so you, Martin, Jeff, and others have had some time 
to review :sleeping_bed: 
   
   Thanks!
   
   Bruno

