Recently I've been trying to reduce the memory usage of our website, and 
found that NHibernate was the largest source of memory usage in our 
system. As I dug in, I found some things that I consider to be issues, 
though I understand the rationale behind them. I thought we could have a 
discussion about them and talk about some solutions. I will say that my 
knowledge of NHibernate's inner workings, and of why things are the way 
they are, is very cursory and limited to what I have discovered/guessed 
during this process, which is why I would like this to be a discussion.

*Background that can be skipped but explains why this is an issue for us.*
The way our system was originally built, each customer has their own 
website in IIS (so we are not multi-tenant), and we have a few builds that 
customers are on (alpha, beta, stable) that we routinely move them between. 
Originally everyone had their own AppPool (process), but that caused a lot 
of memory issues because each site had its own copy of the code DLLs 
loaded into memory. So we grouped customers into shared AppPools, with each 
site in its own AppDomain rather than its own process, because IIS will 
share identical assemblies between those AppDomains. This works fairly 
well, but we still hit the AppPool's memory limit about every hour, so the 
AppPool gets recycled and a new process is spun up. This kind of sucks 
because it takes about 10 seconds to initialize the application (yes, we 
serialize our NHibernate configuration).

*The real issue*

Now, while poking around in WinDbg, I decided to look at the size of the 
SessionFactory, which for our system was ~60MB. I also noticed that there 
is ~30MB of strings in the process. I'm not sure how many of those belong 
to NHibernate, but we'll just say 20MB (before the session factory is 
built, strings account for ~10MB). Looking at the session factory, it 
seems that for every type it builds the persisters / select builders / 
whatever, which build up SqlString instances and cache them. All of those 
strings are built by the SqlString class and stored in its parts. The 
problem is that by keeping and holding them, that memory can never be 
freed. Likely this is so that they never have to be generated again, and I 
do see that SqlString is immutable, so it all makes sense; there's just the 
problem that the memory can never be freed by the system. This sucks when 
most of it never gets used (in our case, we have lots of different parts 
of our system, but most companies only use a couple).

Now the question I have is: do all of those strings _need_ to be cached? I 
understand the reasons why: they never change, so we should generate them 
only once; generating them at initialization removes the need for locks; 
etc. At the same time I wonder: how expensive is it to generate an insert 
string? When it comes to select strings, how often is that string reused? 
And with dynamic queries (i.e. LINQ, or building up a QueryOver or HQL 
string), are the generated strings also stored someplace where they can't 
be garbage collected?
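To put a rough shape on "how expensive is it to generate an insert string?", here is a minimal sketch in Java (not C#). The buildInsert helper is hypothetical and is not NHibernate's actual insert builder. It only assumes a table name and a list of column names. In this simplified form, the work is a single pass over the columns. NHibernate's real builders likely do more (dialect-specific quoting, for example), so treat this as an illustration of the order of cost, not a measurement.

```java
import java.util.List;
import java.util.StringJoiner;

public class InsertSqlSketch {
    // Hypothetical helper (not NHibernate's API): builds a parameterized
    // INSERT statement from a table name and its column names.
    static String buildInsert(String table, List<String> columns) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        StringJoiner params = new StringJoiner(", ", "(", ")");
        for (String column : columns) {
            cols.add(column);
            params.add("?"); // one positional parameter per column
        }
        return "INSERT INTO " + table + " " + cols + " VALUES " + params;
    }

    public static void main(String[] args) {
        String sql = buildInsert("Customer", List.of("Id", "Name", "Email"));
        System.out.println(sql);
        // prints: INSERT INTO Customer (Id, Name, Email) VALUES (?, ?, ?)
    }
}
```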

Another thing I noticed is that some strings could EASILY be interned so 
that they exist only once, where currently there are millions of 
duplicates. Strings like ")" and "," have lots of instances, but rather 
than sharing one instance they are duplicated across lots of SqlString 
instances. Using an interned version would help the memory situation. On 
the other hand, these do not make up the majority of memory held by 
strings (maybe 0.5MB at most), but there are a lot of them, so interning 
could help the garbage collector by giving it fewer objects to track.
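The interning idea can be sketched in Java with a small pool that hands out one canonical instance per distinct fragment. The SqlTokenPool class and its canonical method are hypothetical, not part of NHibernate. Java's String.intern() (or .NET's String.Intern) could do the same job. An explicit map is used here because it makes it obvious exactly which fragments get pooled.

```java
import java.util.concurrent.ConcurrentHashMap;

public class SqlTokenPool {
    // Hypothetical pool: maps each fragment to the first instance seen,
    // so equal fragments end up sharing a single String object.
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    static String canonical(String fragment) {
        // putIfAbsent returns the already-stored instance, or null if this
        // fragment was just stored and therefore becomes the canonical one.
        String existing = POOL.putIfAbsent(fragment, fragment);
        return existing != null ? existing : fragment;
    }

    public static void main(String[] args) {
        // Two equal but distinct String objects built at runtime.
        String a = new StringBuilder().append(')').toString();
        String b = new StringBuilder().append(')').toString();
        System.out.println(a == b);                       // false: two instances
        System.out.println(canonical(a) == canonical(b)); // true: one shared instance
    }
}
```

For a small fixed set of tokens such as "(", ")" and ",", plain static final constants used by the SQL builders would give the same sharing without any map at all.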

*Ideas to help the situation*

   1. First idea, and the least invasive/problematic: intern certain strings 
   like "(", ")", "'", "and" and "or" and use those when building SQL 
   strings. This is only really necessary because SqlString instances are 
   kept around, which pins those strings in memory.
   2. Don't cache SqlString / SqlCommand in persisters/generators/whatever; 
   cache them in the ISession and regenerate them in each session.
   3. Cache them in a least-recently-used (LRU) cache, a copy of which is 
   injected into the ISession and updated when sessions are disposed/closed 
   (this implies that some locking would need to be added, but only when 
   sessions add new queries).
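Idea 3 can be sketched in Java with LinkedHashMap in access order, which is a standard way to build an LRU cache. This is a simplified variant: one shared cache guarded by a lock, not the per-session copy described above. The class name LruSqlCache, the key format such as "Customer:insert" and the generator function are all hypothetical and are not NHibernate APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded cache for generated SQL strings.
public class LruSqlCache {
    private final int capacity;
    private final LinkedHashMap<String, String> map;

    public LruSqlCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true keeps entries ordered from least to most recently
        // used; removeEldestEntry evicts the LRU entry once the cap is exceeded.
        this.map = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > LruSqlCache.this.capacity;
            }
        };
    }

    // Returns cached SQL, generating and caching it on a miss. synchronized
    // because LinkedHashMap is not thread-safe and an access-ordered lookup
    // reorders entries; this lock is the cost idea 3 mentions.
    public synchronized String getOrGenerate(String key, Function<String, String> generator) {
        return map.computeIfAbsent(key, generator);
    }

    public synchronized int size() {
        return map.size();
    }

    public synchronized boolean contains(String key) {
        return map.containsKey(key);
    }
}
```

Bounding the cache means the SQL for rarely used entities becomes eligible for garbage collection once it is evicted, at the price of regenerating it on the next miss.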


Thoughts? Suggestions? I understand that this would constitute large 
changes, and that likely nothing will come of this, but I do think these 
are real issues, worth thinking about and looking out for in future coding 
as well. As for how much work solving this would take, I have no idea, 
because like I said earlier, I only have a cursory understanding of 
NHibernate's inner workings.
