[ https://issues.apache.org/jira/browse/DERBY-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464757#comment-13464757 ]
Bryan Pendleton commented on DERBY-2130: ---------------------------------------- I don't consider the patch high risk. I think it is ready to pursue. I think it's reasonable to mark this as a newcomer issue. I would be willing to help work with anybody who has the time/motivation to push this forward. > Optimizer performance slowdown from 10.1 to 10.2 > ------------------------------------------------ > > Key: DERBY-2130 > URL: https://issues.apache.org/jira/browse/DERBY-2130 > Project: Derby > Issue Type: Bug > Components: SQL > Affects Versions: 10.1.3.1, 10.2.1.6, 10.3.1.4 > Reporter: Bryan Pendleton > Labels: derby_triage10_5_2 > Attachments: jumpReset.patch, plan10_1_2_1.txt, plan10_2.txt, > plans.diff, repro.sql > > > Attached is 'repro.sql', an IJ script which demonstrates what I > believe to be a serious performance issue in the Optimizer. > I have run this script in a number of configurations: > - 10.1.2.1: the script runs successfully. The 'prepare' statement > takes about 90 seconds, on a fairly powerful Windows machine > - 10.1.3.1: the script produces a NPE. I believe this is DERBY-1777 > - 10.2.1.8/trunk: the script runs successfully. The 'prepare' statement > often takes about 220 seconds, on the same Windows machine > Intermittently, on 10.2 and on the trunk, the prepare statement takes > 15+ minutes. I cannot reliably reproduce this; I run the same script > several times in a row and I cannot predict whether it will take 220 > seconds or whether it will take 15+ minutes. > I am quite motivated to work on this problem, as this is blocking me from > using Derby for a project that I'm quite keen on, but I need some > suggestions and ideas about how to attack it. From my perspective > there are 3 primary topics: > 1) Why did optimizer performance for this query degrade so significantly > from 10.1.2.1 to 10.2? The optimizer seems to be at least 2.5 times slower, > for this particular query at least, in 10.2. Sometimes it is 10x slower. > 2) What is the source of the non-determinism? Why does the optimizer > often take 4 minutes to optimize this query on the trunk, but sometimes > take 15+ minutes? I don't believe that I'm changing anything from > run to run. > 3) Can we improve the optimizer performance even beyond what it was > for 10.1.2? I realize that this is an ugly query, but I was hoping to > see an optimization time of 5-10 seconds, not 90 seconds (and certainly > not 220 seconds). > I have attempted to start answering some of these questions, with > limited success. Here is some of what I think I've discovered so far: > - the optimizer changes in 10.2 seem to have given the optimizer many > more choices of possible query plans to consider. I think this means > that, if the optimizer does not time out, it will spend substantially > more time optimizing because there are more choices to evaluate. Does > this by itself mean that the optimizer will take 2.5 times longer in > 10.2 than it did in 10.1? > - something about this query seems to make the costing mechanism go > haywire, and produce extreme costs. While stepping through the > optimization of this query in the debugger I have seen it compute > costs like 1e63 and 1e200. This might be very closely related to > DERBY-1905, although I don't think I'm doing any subqueries here. > But maybe I'm misunderstanding the term "subquery" in DERBY-1905. > At any rate, due to the enormous estimated costs, timeout does not > occur. > - the WHERE clause in this query is converted during compilation to > an equivalent IN clause, I believe, which then causes me to run into > a number of the problems described in DERBY-47 and DERBY-713. > Specifically, rather than constructing a plan which involves 4 > index probes for the 4 WHERE clause values, the optimizer decides > that an index scan must be performed and that it will have to process > the entire index (because the query uses parameter markers, not > literal values). So perhaps solving DERBY-47 would help me > - the optimizer in fact comes up with a "decent" query plan quite quickly. > I have experimented with placing a hard limit into the optimizer > timeout code, so that I can force optimization to stop after an > arbitrary fixed period of time. Then I have been able to set that > value to as low as 1 second, and the optimizer has produced plans > that then execute in a few milliseconds. Of course, I have only tried > this with a trivial amount of data in my database, so it's possible > that the plan produced by the optimizer after just a second of > optimizing is in fact poor, and I'm just not noticing it because my > data sizes are so small. > At this point, what would be really helpful to me would be some suggestions > about some general approaches or techniques to try to start breaking down > and analyzing this problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira