Bryan Pendleton wrote:
> Is it really realistic, though, to allow a setting of 2 million,
> which gives the optimizer permission to try to create an
> in-memory hash table of 2 gigabytes? I mean, I'd need a 64-bit JVM...
>
> Related, is it worth putting a line or two in this doc telling the
> user that it would be foolish to set this value to something which
> is higher than the actual physical memory that they have on their
> machine -- if I have a 256 MB machine, telling the optimizer to do
> a 512 MB hash join in "memory" is probably performance suicide, no?

I think these are both excellent points, and yes, it would be a good idea to add the appropriate notes to the documentation. When I answered the question about the "max value", I was only looking at it from a programmatic standpoint--but as you've pointed out, there are practical considerations here as well. And "performance suicide" is the right term: as described below, the query *should* theoretically still execute, but it could suffer a spill-to-disk performance penalty.

> Also, perhaps we should have a line suggesting that if the
> user decides to increase this value, they should check their JVM
> memory settings and make sure they've authorized the JVM to use
> that much memory (e.g., on Sun JVMs, -XmxNNNm should be set).

Sounds good to me--thanks for pointing that out. Note, though, that if the user does not give the JVM more memory, the query should (in theory) still work--see below--but it might take a performance hit at execution time.
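
Just to make that concrete, here's the kind of check I have in mind. Note the assumptions: I'm taking the property in question to be derby.language.maxMemoryPerTable, with its value in kilobytes, and assuming it can be set as a JVM system property (or in derby.properties) before the engine boots--this is a sketch, not doc-ready text:

    // Sketch only: size derby.language.maxMemoryPerTable and the JVM heap together.
    public class MaxMemoryCheck {
        public static void main(String[] args) {
            long maxMemoryPerTableKB = 204800L;   // ~200 MB, per Bryan's example

            // Tell the optimizer it may consider hash joins of up to ~200 MB...
            System.setProperty("derby.language.maxMemoryPerTable",
                               String.valueOf(maxMemoryPerTableKB));

            // ...but only if the JVM has actually been given that much heap
            // (e.g. started with -Xmx512m on a Sun JVM).
            long maxHeapBytes = Runtime.getRuntime().maxMemory();
            if (maxMemoryPerTableKB * 1024L > maxHeapBytes) {
                System.err.println(
                    "maxMemoryPerTable exceeds the JVM heap; expect spills or worse.");
            }
        }
    }

The specific numbers are just Bryan's 200 MB example; the point is only that the property value and -Xmx need to be sized together.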

> Lastly, what is the symptom if the user sets this too high? If
> they tell the optimizer that it's allowed to make, for example,
> a 200 MB hash table by setting this value to 200,000, then what
> happens if the runtime evaluation of the query plan finds that
> it can't allocate that much memory? Does the query fail? Does
> Derby crash?

Yet another very good question. I haven't done thorough testing, but my *guess* is that this issue is indirectly addressed by some "follow-up" changes that I posted for DERBY-1007--in particular, see the BackingStoreHashtable changes in d1007_followup_v1.patch, which is attached to that issue. Those changes do a very rough (and only slightly meaningful, unit-wise) comparison of the estimated row count to the max in-memory size, and if it looks like there isn't enough memory in the JVM, they create a smaller hash table and spill the excess to disk.

Looking at the relevant code in BackingStoreHashtable, the value of max_inmemory_size is set as follows:

        // max_inmemory_rowcnt comes from the caller; for the hash joins
        // discussed here it is -1, so the else branch is the one that runs.
        if (max_inmemory_rowcnt > 0)
            max_inmemory_size = Long.MAX_VALUE;
        else
            max_inmemory_size = Runtime.getRuntime().totalMemory()/100;

where max_inmemory_rowcnt is received from the optimizer. So far as I can tell from the quick tests I did, when we're creating a hash table for an optimizer-chosen hash join, max_inmemory_rowcnt is always -1, and thus max_inmemory_size is set based on the JVM memory available at the time the backing store hash table is created. In that case the BackingStoreHashtable also receives the estimated row count from the optimizer. If maxMemoryPerTable was set "too high", that estimated row count could be insanely high as well (if maxMemoryPerTable were lower, the optimizer would have rejected the hash join and we never would have made it here).

That said, the DERBY-1007 follow-up changes added logic to change the way the hash table is built if the estimated row count is "too high": namely, if it is equal to or larger than the max in-memory bytes, we create a hash table that is only as large as { max in-memory bytes / estimated size of first row }.
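
To make that last part concrete, here's roughly what I understand the sizing logic in d1007_followup_v1.patch to do. The method name and parameters below are mine (illustrative only), not the actual BackingStoreHashtable signatures:

    // Sketch of the DERBY-1007 follow-up idea, not the real Derby code: if the
    // optimizer's row estimate is at least as large as the in-memory byte budget
    // (a rough, admittedly unit-mismatched comparison), size the initial hash
    // table from the budget instead of from the estimate.
    static int initialCapacity(double estimatedRowCount, int estimatedFirstRowSize) {
        long maxInMemorySize = Runtime.getRuntime().totalMemory() / 100;
        if (estimatedRowCount >= maxInMemorySize) {
            return (int) (maxInMemorySize / estimatedFirstRowSize);
        }
        return (int) estimatedRowCount;
    }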

And that, I think, means that we should only create a hash table that will actually fit in the available JVM memory; if we need a larger hash table, then we'll spill to disk and take the performance hit, but the query should still execute. In theory.

Of course, all of that assumes that the "red flag" for a large hash table is a high row count. If it turns out that we have a low row count but each row is extremely large, the DERBY-1007 changes will not be triggered. I think we should still be okay, though, for a different reason: in that scenario the hash table itself will not be very large, so we should have enough memory to create it (empty). Then the BackingStoreHashtable code keeps track of how much JVM memory is available after each insertion into the hash table, so if we have a couple of very large rows, the available memory will decrease with each row and BackingStoreHashtable will start spilling to disk before the hash table is full--which should keep us from running out of memory...

I think.  If I've got any of this wrong, someone please feel free to correct me!
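
Again, purely as an illustration of that "watch the free memory and spill early" behavior--this is not Derby's actual BackingStoreHashtable code, and the threshold and the DiskOverflow stand-in are invented:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: spill rows to disk once free JVM memory gets low, rather
    // than waiting for the in-memory table to reach its size limit.
    class SpillingHashtable {
        private static final long LOW_MEMORY_THRESHOLD = 4L * 1024 * 1024; // invented

        private final Map<Object, Object[]> inMemory = new HashMap<Object, Object[]>();
        private DiskOverflow overflow;   // hypothetical disk-backed overflow

        void put(Object key, Object[] row) {
            Runtime rt = Runtime.getRuntime();
            long usedBytes = rt.totalMemory() - rt.freeMemory();
            long estimatedFree = rt.maxMemory() - usedBytes;

            if (overflow != null || estimatedFree < LOW_MEMORY_THRESHOLD) {
                if (overflow == null) {
                    overflow = new DiskOverflow();
                }
                overflow.put(key, row);   // keep going, but on disk
            } else {
                inMemory.put(key, row);   // still room in the JVM heap
            }
        }

        /** Stand-in for a disk-backed structure; real code would serialize rows. */
        static class DiskOverflow {
            void put(Object key, Object[] row) {
                // Details elided: write the row to a temporary overflow file.
            }
        }
    }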

So to get back to your final question of "what is the symptom if the user sets this too high?", the theoretical answer is that query performance might end up being lousy because the hash table will spill to disk. A more practical (and admittedly uglier) answer is that the query might perform poorly OR the user might hit memory problems (including OutOfMemory errors) at execution time, while Derby is building the hash table. Theoretically the OutOfMemory conditions shouldn't arise (as explained above), and I haven't been able to produce any since DERBY-1007 was closed, but I guess the possibility is still there. If we document it as a "potential" problem, then a user who raises maxMemoryPerTable and suddenly hits an OutOfMemory error will at least know that an overly large value is a likely cause.

Well, that took a lot longer to write than I thought it would. Hopefully the answers to your questions are somewhere in there...?

Army
