Good write-up! I am not a big fan of silently ignoring a user-specified value, even if it is specified incorrectly. I would vote for filing a bug to correct two issues: 1) checking that the specified value, after being multiplied by 1024, is still less than Integer.MAX_VALUE, and 2) raising an error for negative values.
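To make the proposal concrete, here is a hedged sketch of what the two checks could look like. The class and method names are made up for illustration; this is not Derby's actual implementation.

```java
// Hypothetical validation for derby.language.maxMemoryPerTable.
// Sketch of the two checks proposed above; names are illustrative only.
public class MaxMemoryPerTableCheck {

    /** Parses the property value (specified in kilobytes) and returns bytes. */
    static int parseMaxMemoryPerTable(String value) {
        int kilobytes = Integer.parseInt(value);

        // Check 2: raise an error for negative values instead of
        // silently falling back to the default.
        if (kilobytes < 0) {
            throw new IllegalArgumentException(
                "maxMemoryPerTable must not be negative: " + kilobytes);
        }

        // Check 1: the value is in kilobytes, so value * 1024 must still
        // fit in an int; otherwise the multiplication overflows.
        if (kilobytes > Integer.MAX_VALUE / 1024) {
            throw new IllegalArgumentException(
                "maxMemoryPerTable too large (max is "
                + (Integer.MAX_VALUE / 1024) + "): " + kilobytes);
        }
        return kilobytes * 1024;
    }

    public static void main(String[] args) {
        System.out.println(parseMaxMemoryPerTable("1024")); // default, i.e. 1M
    }
}
```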
Satheesh

A B (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/DERBY-1397?page=comments#action_12416418 ]
>
> A B commented on DERBY-1397:
> ----------------------------
>
> Well I spent a little time investigating this property and here's what I
> found out, based on your questions.
>
> - Are we sure that the name of the property is
>   derby.language.maxMemoryPerTable?
>
> Yes -- I ran some simple tests with this property name and it is indeed
> picked up by the Derby optimizer and used accordingly.
>
> - What is the maximum value allowed?
>
> The property is stored as an integer value (as found in the
> OptimizerFactoryImpl.boot() method), so the *theoretical* maximum value is
> Integer.MAX_VALUE (i.e. 2^31 - 1). Attempts to specify a larger value will
> result in the following error at connect time:
>
> ij> connect 'simpDB';
> ERROR XJ040: Failed to start database 'simpDB', see the next exception
> for details.
> ERROR XJ001: Java exception: 'For input string: "2147483650":
> java.lang.NumberFormatException'.
>
> But that said, one of the "gotchas" here is that the property is actually
> specifying *kilobytes*, not bytes--i.e. the value that the user specifies
> is multiplied by 1024. What this means is that if the user specifies
> Integer.MAX_VALUE, Derby will grab the value and think that it's fine,
> then it will multiply the value by 1024, which causes integer overflow and
> thus leads to a maxMemoryPerTable that is negative--which is wrong. I
> think this warrants a new Jira issue--Derby should check to make sure that
> the value, when multiplied by 1024, is still less than Integer.MAX_VALUE.
>
> To answer your question, then, the documentation should explicitly say
> that this number is specified *in kilobytes* (i.e. "1" means 1K which
> means 1024 bytes) and that, therefore, the maximum value allowed is
> Integer.MAX_VALUE / 1024 (which is 2097151).
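As an aside, the overflow described above is easy to demonstrate in a few lines of Java (this is just the arithmetic, not Derby's code):

```java
// Demonstrates the kilobyte overflow: Integer.MAX_VALUE parses fine as the
// property value, but multiplying it by 1024 wraps around to a negative int.
public class KilobyteOverflowDemo {
    public static void main(String[] args) {
        int kilobytes = Integer.MAX_VALUE;   // 2147483647, accepted by the parser
        int bytes = kilobytes * 1024;        // int overflow: wraps negative
        System.out.println(bytes);           // prints -1024

        // Doing the multiplication in long arithmetic shows why:
        long exact = (long) kilobytes * 1024;           // 2199023254528
        System.out.println(exact > Integer.MAX_VALUE);  // prints true
    }
}
```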
> Then we need to file a Jira issue (as mentioned above) to check for this
> maximum and throw the appropriate error if it's exceeded.
>
> - Is there a value that we want to recommend for users to specify?
>
> The default value is 1024, which translates into 1024K, which means the
> default max memory per table is 1M. I think we should update the
> documentation to indicate what the default is; if a user wants to change
> it, then I don't think we need to (or want to) recommend a value, so long
> as we indicate what the tradeoff is for lower/higher values (see below).
>
> - The text says that it can be set to smaller values, what is the minimum
>   value that we recommend?
>
> Again, I don't know about "recommending" a value. If the default isn't
> good enough, then I think all we can do is say what happens for smaller
> values and what happens for larger values, and then let the user make the
> decision him/herself.
>
> That said, the theoretical minimum is 0--I checked and Derby will accept
> that value and use it (but the performance hit is huge--see below). If the
> user specifies a negative value, Derby will silently ignore the value and
> just use the default--which is something that either warrants a fix (Derby
> should complain about the negative value like it does for other invalid
> numbers) or else should be documented (in my opinion).
>
> In terms of the documentation, then, I suggest that we add a link for the
> property to the paragraph preceding the one cited in this issue
> description, perhaps as follows:
>
> <begin>
>
> The hash table for a hash join is held in memory and if it gets big
> enough, it will spill to the disk. The optimizer makes a very rough
> estimate of the amount of memory required to make the hash table. If it
> estimates that the amount of memory required would exceed the system-wide
> limit of memory use for a table (see derby.language.maxMemoryPerTable),
> the optimizer chooses a nested loop join instead.
>
> <end>
>
> Then the words "derby.language.maxMemoryPerTable" would link to the
> "Properties" section (which already exists) and there is where we would
> put the cited paragraph, with the following changes (feel free to modify
> as you see fit):
>
> <begin>
>
> derby.language.maxMemoryPerTable
>
>   Function:
>
>   When considering whether or not to do a hash join with a FROM table,
>   the Derby optimizer will use this value to determine if the resultant
>   hash table would consume too much memory, where "too much" means that
>   the hash table would take up more than maxMemoryPerTable kilobytes of
>   memory. If the optimizer decides that the hash table will require
>   greater than maxMemoryPerTable kilobytes, it will reject the hash join
>   and instead do a nested loop join.
>
>   If memory use is not a problem for your environment, setting this
>   property to a high number gives the optimizer the maximum flexibility
>   in considering what join strategy to use, which can lead to better
>   performance. For more limited environments, this property can be set to
>   lower values to conserve memory use--but note that if the value is set
>   too low the optimizer could reject most/all hash joins, which could
>   have a significant negative impact on query performance.
>
>   This value must be specified in kilobytes. The minimum value is 0
>   (which will completely disable all hash joins and thus is not
>   recommended); negative values will be ignored. The maximum value is
>   2097151, which comes from java.lang.Integer.MAX_VALUE / 1024.
>
>   Default:
>
>   1024 (Note that since the value is in kilobytes, this translates into 1M.)
>
>   Scope:
>
>   System-wide.
>
>   Dynamic or static:
>
>   This property is static; if you change it while Derby is running, the
>   change does not take effect until you reboot.
>
> <end>
>
> The part about what happens for a negative value might need to be removed
> if we decide to throw an error instead of ignoring it--that's a question
> worth posting to derby-dev, I think...
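For what it's worth, a system-wide property like this would typically go in derby.properties; the value below is purely illustrative, not a recommendation:

```properties
# derby.properties -- value is in kilobytes (illustrative, not a
# recommendation); 4096 here means a 4M per-table memory limit
# for hash joins. Static: requires a reboot to take effect.
derby.language.maxMemoryPerTable=4096
```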
>
>> Tuning Guide: Puzzling optimizer documentation
>> ----------------------------------------------
>>
>>          Key: DERBY-1397
>>          URL: http://issues.apache.org/jira/browse/DERBY-1397
>>      Project: Derby
>>         Type: Bug
>>   Components: Documentation
>>     Versions: 10.2.0.0
>>     Reporter: Rick Hillegas
>>     Assignee: Laura Stewart
>>      Fix For: 10.2.0.0
>>
>> Selectivity and cardinality statistics
>> Working with cardinality statistics
>> When cardinality statistics are automatically updated
>> "For other operations, Derby automatically updates statistics for the
>> table and all indexes on the table if they are already exist. Those
>> operations are:
>> * (all indexes) When you execute SYSCS_UTIL.SYSCS_COMPRESS_TABLE.
>> * (index only) When you drop a column that is part of a table's index;
>>   the statistics for the affected index are dropped, and statistics for
>>   the other indexes on the table are updated."
>>
>> What does the second bullet mean? Derby doesn't let you drop a column
>> from a table right now.
>> ----------------------------------------------------------
>> Here's another puzzling piece of optimizer documentation:
>> I'm puzzled by the following paragraph in Tuning Guide -> DML statements
>> and performance -> Performance and optimization -> Joins and performance
>> -> Join strategies:
>> "If memory use is not a problem for your environment, set this property
>> to a high number; allowing the optimizer the maximum flexibility in
>> considering a join strategy queries involving large queries leads to
>> better performance. It can also be set to smaller values for more
>> limited environments."
>> I can't find the name of this property on that page of the Tuning Guide.
>> I'm also confused about what we consider to be a "high number" versus
>> what we consider to be "smaller values". Would appreciate advice here.
>> Satheesh adds this:
>> The property it may be referring to is *derby.language.maxMemoryPerTable*.
>> The default value is 1024 KB.
>> Current default value is too small, so it would be a good tip for
>> developers to know and tune this property. It would be great if Derby
>> can configure this property value based on factors like max heap size,
>> size of data cache and/or other parameters.
