cshannon commented on issue #4783: URL: https://github.com/apache/accumulo/issues/4783#issuecomment-2293799324
I started to investigate this more today because I wanted to see what size limit might be appropriate. I applied the sample patch here to generate the OOM heap dumps and noticed that when I limited the cache size to something small, the test still generated OOM errors, which was pretty weird.

I loaded the heap dumps into the Eclipse Memory Analyzer and discovered that the memory leak in this case had nothing to do with the cache inside of `VolumeManagerImpl`. There were a bunch of `Configuration` objects with a weak reference hash map, which was surprising. Looking into it more, I found that the source of the leak was actually that the Hadoop `Configuration` object [registers](https://github.com/apache/hadoop/blob/f00094203bf40a8c3f2216cf22eaa5599e3b9b4d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L834) every new instance in a weak reference [hash map](https://github.com/apache/hadoop/blob/f00094203bf40a8c3f2216cf22eaa5599e3b9b4d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L325) inside its constructor.

So the leak generated by the test modifications was not caused by the `VolumeManagerImpl` cache. Thinking about the changes to the testing, this behavior makes sense because `TestAmple` doesn't use that cache. It's just [creating](https://github.com/apache/accumulo/blob/d8185cdea742b00c17b2877f6198fb2a8f73a7ef/test/src/main/java/org/apache/accumulo/test/ample/metadata/TestAmple.java#L243) a new `ServerContext` each time a new `TestAmple` is loaded, which in turn ends up creating a new config.

@keith-turner - After finding this I was curious whether this was actually the memory leak all along. However, reading over the issue again, you said that you analyzed the heap dump and saw the objects were attached to the `VolumeManagerImpl` cache.
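For anyone following along, here is a minimal sketch of the registration pattern described above. It is not Hadoop's code: `RegistrySketch`, `TrackedConfig`, `createAndHold`, and the 1 KB payload are hypothetical names and values chosen only for illustration. It assumes nothing beyond what the linked lines show, namely a constructor that adds `this` to a shared weak-keyed map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

public class RegistrySketch {

  // Hypothetical stand-in for a config class that tracks every instance it creates.
  static class TrackedConfig {
    // Shared registry with weakly referenced keys (an assumption modeled on the linked code).
    private static final Map<TrackedConfig, Object> REGISTRY = new WeakHashMap<>();

    // Placeholder payload so each instance occupies some heap (illustrative size).
    private final byte[] payload = new byte[1024];

    TrackedConfig() {
      // Every new instance registers itself in the constructor.
      synchronized (TrackedConfig.class) {
        REGISTRY.put(this, null);
      }
    }

    // Number of entries currently present in the registry.
    static int registered() {
      synchronized (TrackedConfig.class) {
        return REGISTRY.size();
      }
    }
  }

  // Creates n instances and keeps strong references to them, similar in spirit
  // to each new context object holding on to its own config.
  static List<TrackedConfig> createAndHold(int n) {
    List<TrackedConfig> held = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      held.add(new TrackedConfig());
    }
    return held;
  }

  public static void main(String[] args) {
    List<TrackedConfig> held = createAndHold(100);
    int registered = TrackedConfig.registered();
    System.out.println("held=" + held.size() + " registered=" + registered);
  }
}
```

The sketch only shows that every constructed instance lands in the registry and stays there while something else still holds a strong reference to it. It makes no claim about when the garbage collector clears entries once those references are dropped.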
If the objects in your heap dump were attached to the `VolumeManagerImpl` cache, then I'm assuming the way we are trying to reproduce this bug here is not actually correct. The OOM error being generated is similar (too many `Configuration` objects in memory) but not exactly the same: the large number of `Configuration` objects that `TestAmple` generates to cause the leak are not stored in `VolumeManagerImpl` but are instead referenced by the registry inside `Configuration` itself.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
