Good write-up! I am not a big fan of silently ignoring a user-specified value, even if it is specified incorrectly. I would vote for filing a bug to correct two issues: 1) checking that the specified value, after being multiplied by 1024, is still less than Integer.MAX_VALUE, and 2) raising an error for negative values.
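To make the proposal concrete, here is a hedged sketch of what the two checks could look like. The class and method names are made up for illustration; this is not Derby's actual implementation.

```java
// Hypothetical validation for derby.language.maxMemoryPerTable.
// Sketch of the two checks proposed above; names are illustrative only.
public class MaxMemoryPerTableCheck {

    /** Parses the property value (specified in kilobytes) and returns bytes. */
    static int parseMaxMemoryPerTable(String value) {
        int kilobytes = Integer.parseInt(value);

        // Check 2: raise an error for negative values instead of
        // silently falling back to the default.
        if (kilobytes < 0) {
            throw new IllegalArgumentException(
                "maxMemoryPerTable must not be negative: " + kilobytes);
        }

        // Check 1: the value is in kilobytes, so value * 1024 must still
        // fit in an int; otherwise the multiplication overflows.
        if (kilobytes > Integer.MAX_VALUE / 1024) {
            throw new IllegalArgumentException(
                "maxMemoryPerTable too large (max is "
                + (Integer.MAX_VALUE / 1024) + "): " + kilobytes);
        }
        return kilobytes * 1024;
    }

    public static void main(String[] args) {
        System.out.println(parseMaxMemoryPerTable("1024")); // default, i.e. 1M
    }
}
```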
Satheesh

A B (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/DERBY-1397?page=comments#action_12416418 ]
>
> A B commented on DERBY-1397:
> ----------------------------
>
> Well I spent a little time investigating this property and here's what I
> found out, based on your questions.
>
> - Are we sure that the name of the property is
>   derby.language.maxMemoryPerTable?
>
> Yes -- I ran some simple tests with this property name and it is indeed
> picked up by the Derby optimizer and used accordingly.
>
> - What is the maximum value allowed?
>
> The property is stored as an integer value (as found in the
> OptimizerFactoryImpl.boot() method), so the *theoretical* maximum value is
> Integer.MAX_VALUE (i.e. 2^31 - 1). Attempts to specify a larger value will
> result in the following error at connect time:
>
> ij> connect 'simpDB';
> ERROR XJ040: Failed to start database 'simpDB', see the next exception
> for details.
> ERROR XJ001: Java exception: 'For input string: "2147483650":
> java.lang.NumberFormatException'.
>
> But that said, one of the "gotchas" here is that the property is actually
> specifying *kilobytes*, not bytes--i.e. the value that the user specifies
> is multiplied by 1024. What this means is that if the user specifies
> Integer.MAX_VALUE, Derby will grab the value and think that it's fine,
> then it will multiply the value by 1024, which causes integer overflow and
> thus leads to a maxMemoryPerTable that is negative--which is wrong. I
> think this warrants a new Jira issue--Derby should check to make sure that
> the value, when multiplied by 1024, is still less than Integer.MAX_VALUE.
>
> To answer your question, then, the documentation should explicitly say
> that this number is specified *in kilobytes* (i.e. "1" means 1K which
> means 1024 bytes) and that, therefore, the maximum value allowed is
> Integer.MAX_VALUE / 1024 (which is 2097151).
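As an aside, the overflow described above is easy to demonstrate in a few lines of Java (this is just the arithmetic, not Derby's code):

```java
// Demonstrates the kilobyte overflow: Integer.MAX_VALUE parses fine as the
// property value, but multiplying it by 1024 wraps around to a negative int.
public class KilobyteOverflowDemo {
    public static void main(String[] args) {
        int kilobytes = Integer.MAX_VALUE;   // 2147483647, accepted by the parser
        int bytes = kilobytes * 1024;        // int overflow: wraps negative
        System.out.println(bytes);           // prints -1024

        // Doing the multiplication in long arithmetic shows why:
        long exact = (long) kilobytes * 1024;           // 2199023254528
        System.out.println(exact > Integer.MAX_VALUE);  // prints true
    }
}
```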
> Then we need to file a Jira issue (as mentioned above) to check for this
> maximum and throw the appropriate error if it's exceeded.
>
> - Is there a value that we want to recommend for users to specify?
>
> The default value is 1024, which translates into 1024K, which means the
> default max memory per table is 1M. I think we should update the
> documentation to indicate what the default is; if a user wants to change
> it, then I don't think we need to (or want to) recommend a value, so long
> as we indicate what the tradeoff is for lower/higher values (see below).
>
> - The text says that it can be set to smaller values, what is the minimum
>   value that we recommend?
>
> Again, I don't know about "recommending" a value. If the default isn't
> good enough, then I think all we can do is say what happens for smaller
> values and what happens for larger values, and then let the user make the
> decision him/herself.
>
> That said, the theoretical minimum is 0--I checked and Derby will accept
> that value and use it (but the performance hit is huge--see below). If the
> user specifies a negative value, Derby will silently ignore the value and
> just use the default--which is something that either warrants a fix (Derby
> should complain about the negative value like it does for other invalid
> numbers) or else should be documented (in my opinion).
>
> In terms of the documentation, then, I suggest that we add a link for the
> property to the paragraph preceding the one cited in this issue
> description, perhaps as follows:
>
> <begin>
>
> The hash table for a hash join is held in memory and if it gets big
> enough, it will spill to the disk. The optimizer makes a very rough
> estimate of the amount of memory required to make the hash table. If it
> estimates that the amount of memory required would exceed the system-wide
> limit of memory use for a table (see derby.language.maxMemoryPerTable),
> the optimizer chooses a nested loop join instead.
>
> <end>
>
> Then the words "derby.language.maxMemoryPerTable" would link to the
> "Properties" section (which already exists) and there is where we would
> put the cited paragraph, with the following changes (feel free to modify
> as you see fit):
>
> <begin>
>
> derby.language.maxMemoryPerTable
>
>   Function:
>
>   When considering whether or not to do a hash join with a FROM table,
>   the Derby optimizer will use this value to determine if the resultant
>   hash table would consume too much memory, where "too much" means that
>   the hash table would take up more than maxMemoryPerTable kilobytes of
>   memory. If the optimizer decides that the hash table will require
>   greater than maxMemoryPerTable kilobytes, it will reject the hash join
>   and instead do a nested loop join.
>
>   If memory use is not a problem for your environment, setting this
>   property to a high number gives the optimizer the maximum flexibility
>   in considering what join strategy to use, which can lead to better
>   performance. For more limited environments, this property can be set to
>   lower values to conserve memory use--but note that if the value is set
>   too low the optimizer could reject most/all hash joins, which could
>   have a significant negative impact on query performance.
>
>   This value must be specified in kilobytes. The minimum value is 0
>   (which will completely disable all hash joins and thus is not
>   recommended); negative values will be ignored. The maximum value is
>   2097151, which comes from java.lang.Integer.MAX_VALUE / 1024.
>
>   Default:
>
>   1024 (Note that since the value is in kilobytes, this translates into 1M.)
>
>   Scope:
>
>   System-wide.
>
>   Dynamic or static:
>
>   This property is static; if you change it while Derby is running, the
>   change does not take effect until you reboot.
>
> <end>
>
> The part about what happens for a negative value might need to be removed
> if we decide to throw an error instead of ignoring it--that's a question
> worth posting to derby-dev, I think...
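For what it's worth, a system-wide property like this would typically go in derby.properties; the value below is purely illustrative, not a recommendation:

```properties
# derby.properties -- value is in kilobytes (illustrative, not a
# recommendation); 4096 here means a 4M per-table memory limit
# for hash joins. Static: requires a reboot to take effect.
derby.language.maxMemoryPerTable=4096
```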
>
>> Tuning Guide: Puzzling optimizer documentation
>> ----------------------------------------------
>>
>>          Key: DERBY-1397
>>          URL: http://issues.apache.org/jira/browse/DERBY-1397
>>      Project: Derby
>>         Type: Bug
>>   Components: Documentation
>>     Versions: 10.2.0.0
>>     Reporter: Rick Hillegas
>>     Assignee: Laura Stewart
>>      Fix For: 10.2.0.0
>>
>> Selectivity and cardinality statistics
>> Working with cardinality statistics
>> When cardinality statistics are automatically updated
>> "For other operations, Derby automatically updates statistics for the
>> table and all indexes on the table if they are already exist. Those
>> operations are:
>> * (all indexes) When you execute SYSCS_UTIL.SYSCS_COMPRESS_TABLE.
>> * (index only) When you drop a column that is part of a table's index;
>>   the statistics for the affected index are dropped, and statistics for
>>   the other indexes on the table are updated."
>>
>> What does the second bullet mean? Derby doesn't let you drop a column
>> from a table right now.
>> ----------------------------------------------------------
>> Here's another puzzling piece of optimizer documentation:
>> I'm puzzled by the following paragraph in Tuning Guide -> DML statements
>> and performance -> Performance and optimization -> Joins and performance
>> -> Join strategies:
>> "If memory use is not a problem for your environment, set this property
>> to a high number; allowing the optimizer the maximum flexibility in
>> considering a join strategy queries involving large queries leads to
>> better performance. It can also be set to smaller values for more
>> limited environments."
>> I can't find the name of this property on that page of the Tuning Guide.
>> I'm also confused about what we consider to be a "high number" versus
>> what we consider to be "smaller values". Would appreciate advice here.
>> Satheesh adds this:
>> The property it may be referring to is *derby.language.maxMemoryPerTable*.
>> The default value is 1024 KB.
>> Current default value is too small, so it would be a good tip for
>> developers to know and tune this property. It would be great if Derby
>> can configure this property value based on factors like max heap size,
>> size of data cache and/or other parameters.
