[
https://issues.apache.org/jira/browse/DERBY-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mamta A. Satoor updated DERBY-6317:
-----------------------------------
Attachment: testRepro_v1.txt
DERBY_6317_temp_changes_for_debugging.txt
I had to make a change to the earlier test program since that program was
inserting duplicate values in non-unique foreign key on Table2. It now inserts
all unique values in foreign key corresponding to Table3 and I have been able
to reproduce the problem. I have attached the repro test case as
testRepro_v1.txt. The program creates threee tables as shown below
CREATE TABLE Table1 (ID int PRIMARY KEY NOT NULL);
CREATE TABLE Table2 (Table1_ID int NOT NULL, Table3_ID int NOT NULL,
CONSTRAINT TABLE2_PK PRIMARY KEY (Table1_ID,Table3_ID));
CREATE TABLE Table3 (ID int PRIMARY KEY NOT NULL);
ALTER TABLE table2 ADD CONSTRAINT TABLE2_FK_1 FOREIGN KEY (Table1_ID)
REFERENCES TABLE1(ID);
ALTER TABLE table2 ADD CONSTRAINT TABLE2_FK_2 FOREIGN KEY (Table3_ID)
REFERENCES TABLE3(ID);
The program inserts 1million rows in Table1 and 8 million rows in both Table2
and Table3. After the data is loaded, it selects each of the 8 million rows
from Table2 individually with a query like following(the constant assigned to
t0.Table3_ID iterates from 0 to 8million-1).
SELECT count(*) FROM
Table1 T1,
Table2 t0
WHERE t1.ID = t0.Table1_ID and
t0.Table3_ID = 0;
Additionally, I have two printlns in BTreeCostController.getScanCost towards
the bottom of the method. I have attached svn diff for changes in
BTreeCostController.getScanCost as DERBY_6317_temp_changes_for_debugging.txt.
The method will also have the changes suggested by Mike in his second patch but
those changes are commented out.
.....
//Added this println and if statement with println. The rest of
the code is unchanged
System.out.println("In engine estimated row count is " +
estimated_row_count);
if (estimated_row_count < 0.5)
System.out.println("We are going to round estimated row
count to 0");
// RESOLVE - should we make sure this number is > 0?
cost_result.setEstimatedRowCount(Math.round(estimated_row_count));
}
finally
Run the test program after making the changes in
BTreeCostController.getScanCost as follows.
java org.apache.derbyTesting.functionTests.tests.lang.MamtaJDBC > dellater.txt
It will produce a really large dellater.txt. notepad couldn't open the file
because of it's size. I did grep on the file to look for string "We are going
to round estimated row count to 0" and found number of occurences of it. An
example of the string were as follows
./dellater2.txt:51892862:We are going to round estimated row count to 0
./dellater2.txt:51892866:We are going to round estimated row count to 0
./dellater2.txt:51892904:We are going to round estimated row count to 0
./dellater2.txt:51892908:We are going to round estimated row count to 0
./dellater2.txt:51892946:We are going to round estimated row count to 0
./dellater2.txt:51892950:We are going to round estimated row count to 0
./dellater2.txt:51892988:We are going to round estimated row count to 0
./dellater2.txt:51892992:We are going to round estimated row count to 0
./dellater2.txt:51893030:We are going to round estimated row count to 0
./dellater2.txt:51893034:We are going to round estimated row count to 0
./dellater2.txt:51893072:We are going to round estimated row count to 0
.....
I picked the first line from grep output above and used that line number to
find the value of the constant in the SELECT query that resulted in estimated
row count to be rounded to 0 and found following
$ head -n 51892862 dellater2.txt | tail -n 10
In engine estimated row count is 8000005.0
In engine estimated row count is 0.9536743
In engine estimated row count is 1000005.0
t0.Table3_ID =5189284
In engine estimated row count is 1000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 0.47683716
We are going to round estimated row count to 0
Next, I tried that t0.Table3_ID =5189284 value directly in ij with log query
plan on and could see in derby.log that we are doing table scan for the
inner(right) resultset. I tried value one higher than the problem ID value and
saw that we use index scan for that SELECT query.
$ java -Dderby.language.logQueryPlan=true -Dij.exceptionTrace=true
org.apache.derby.tools.ij
ij version 10.11
ij> connect 'jdbc:derby:clobtest3';
ij> SELECT count(*) FROM
Table1 T1,
Table2 t0
WHERE t1.ID = t0.Table> 1_ID and
t0.Tabl> e3_ID = 5189284 ;
In engine estimated row count is 1000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 0.47683716
We are going to round estimated row count to 0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 0.47683716
We are going to round estimated row count to 0
In engine estimated row count is 1000005.0
1
-----------
1
1 row selected
ij> SELECT count(*) FROM
Table1 T1,
Table2 t0
WHERE t1.ID = t0.Table1_ID and
t0.Table3_ID = 5189285;
In engine estimated row count is 1000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 0.9536743
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 0.9536743
In engine estimated row count is 1000005.0
1
-----------
1
1 row selected
I applied Mike's 2nd patch which check if estimated row count is less than <.5
and if so, it assigns value 1 to estimated row count. With that patch, I ran
the 2 queries again in ij against the database and saw that we are indeed using
index scan for both the queries. So, the patch definitely fixed the problem
with the repro db I have created.
The test program takes over 4-5hours to load the data and run 8millions selects.
> Optmizer can choose the wrong path when BTreeCostController.java returns an
> estimate cost and row count of 0.0
> --------------------------------------------------------------------------------------------------------------
>
> Key: DERBY-6317
> URL: https://issues.apache.org/jira/browse/DERBY-6317
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.8.2.2
> Environment: Derby 10.8.2.2 on Oracle Solaris 10
> Reporter: Brett Bergquist
> Assignee: Mike Matrigali
> Attachments: derby6317_2.diff, derby6317.diff,
> DERBY_6317_temp_changes_for_debugging.txt, testRepro_v1.txt
>
>
> The optimizer can chose the wrong path when BTreeCostController.java returns
> an estimate cost and row count of 0.0.
> Assume that you have two tables that are being joined like:
> SELECT * FROM T1, T0
> WHERE T1.ID = T0.F_ID and
> T0.ID = 3;
> Also assume that T0 has two columns, ID and F_ID and F_ID is a foreign key on
> T1.ID. Assume that T1.ID is the primary key of T1 and (T0.F_ID, T0.ID) is
> the primary key on T0. Assume that there is a non-unique index on T0.ID.
> The correct query plan for this should be to query T0 using the non-unique
> index on T0.ID and then use the foreign key value in those rows to do query
> T1 using the primary key on T1.
> With some values of T0.ID in the above query this query plan is chosen and
> works. With other values of T0.ID , the query plan does an query on T0 using
> the non-unique index on T0.ID and then does a table scan on T1.
> For example, in my case the query:
> SELECT * FROM T1, T0
> WHERE T1.ID = T0.F_ID and
> T0.ID = 22112129;
> has this query plan.
> The problem appears to be in BTreeCostController.java. When this returns the
> same value for the "left_of_start" and the "left_of_stop" (which is being
> used to estimate the number of rows and cost), then the estimate cost and row
> count becomes 0.0. When this is used in the join order of T0, T1, then the
> cost of the table scan for T1 becomes 0.0 as well.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira