[
https://issues.apache.org/jira/browse/DERBY-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665612#comment-13665612
]
Tony Brusseau commented on DERBY-6045:
--------------------------------------
I tried the SELECT version of the DELETE statement:
SELECT * FROM kb.mt_gaf_top_level_term_counts WHERE (term = 1407374883554049)
OR (pred = 1407374883554049) OR (mt = 1407374883554049)
It looks like it is doing a table scan as well.
Number of pages visited=67
Number of rows qualified=418
Number of rows visited=29498
Note: this was tried without the Mamta's patches for this bug so far. However,
I am using table statics which would avoid this bug, so I'm pretty sure this is
still a real issue.
Here is the query plan:
Thu May 23 15:17:22 CDT 2013 Thread[DRDAConnThread_3,5,main] (XID = 80859),
(SESSIONID = 1), SELECT COUNT(*) FROM (SELECT * FROM
kb.mt_gaf_top_level_term_counts WHERE (term = 1407374883554049) OR (pred =
1407374883554049) OR (mt = 1407374883554049)) C2668 ******* Project-Restrict
ResultSet (4):
Number of opens = 1
Rows seen = 1
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 1.00
optimizer estimated cost: 12365.71
Source result set:
Scalar Aggregate ResultSet:
Number of opens = 1
Rows input = 418
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 14751.50
optimizer estimated cost: 12365.71
Index Key Optimization = false
Source result set:
Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 418
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 14751.50
optimizer estimated cost: 12365.71
Source result set:
Table Scan ResultSet for MT_GAF_TOP_LEVEL_TERM_COUNTS
at read committed isolation level using share row locking chosen by the
optimizer
Number of opens = 1
Rows seen = 418
Rows filtered = 0
Fetch Size = 1
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0
scan information:
Bit set of columns fetched=All
Number of columns fetched=6
Number of pages visited=67
Number of rows qualified=418
Number of rows visited=29498
Scan type=heap
start position:
null
stop position:
null
qualifiers:
Column[1][0] Id: 3
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false
Column[1][1] Id: 2
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false
Column[1][2] Id: 1
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false
optimizer estimated row count: 14751.50
optimizer estimated cost: 12365.71
> in list multi-probe by primary key not chosen on tables with >256 rows
> ----------------------------------------------------------------------
>
> Key: DERBY-6045
> URL: https://issues.apache.org/jira/browse/DERBY-6045
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.9.1.0, 10.10.1.1
> Environment: Linux Debian 6.0.5
> Reporter: Tony Brusseau
> Assignee: Mamta A. Satoor
> Fix For: 10.8.3.1, 10.9.2.2, 10.10.1.2, 10.11.0.0
>
> Attachments: DERBY6045_patch1_diff.txt, DERBY6045_patch2_diff.txt,
> optimizerTraceOutputFor10_8.txt, optimizerTraceOutputForTrunk.txt,
> optimizerTraceOutputWithUpdateStatsFor10_8.txt,
> optimizerTraceOutputWithUpdateStatsForTrunk.txt, table.del
>
>
> I have a table with a long integer primary key field and 11 million rows. I
> seem to be unable to load large chunks of rows via id in a reasonably
> efficient manner.
> 1. If I do individual lookups via the primary key, then a fast indexed
> lookup occurs. However, if I do large numbers of such queries, then the time
> is overwhelmed by round-trip overhead which makes everything incredibly slow.
> 2. If I use a single query with a disjunction of the primary keys of
> interest, then a table scan is performed (even if the clause only contains
> 1-3 items), which walks over 11 million rows...incredibly inefficient.
> 3. If I use an IN clause, then a table scan is performed (even if the
> clause only contains 1-3 items), which walks over 11 million
> rows...incredibly inefficient.
> I'm guessing that this might have something to do with the fact that I'm
> using large integers and really big numbers that don't start anywhere at or
> about 1 for my keys. Could this possibly be confusing the optimizer?
> Here are the unlimited query plans for the 3 cases that I enumerated:
> *********************************************************************************************
> [EL Fine]: 2013-01-17
> 11:09:53.384--ServerSession(582235416)--Connection(1430986883)--Thread(Thread["Initial
> Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2,
> ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM
> KB.FORMULA_TERM WHERE (TERM_ID = ?)
> bind => [2251799814033500]
> Thu Jan 17 11:09:53 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread
> Group] (XID = 4711079), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2,
> ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM
> KB.FORMULA_TERM WHERE (TERM_ID = ?) ******* Project-Restrict ResultSet (3):
> Number of opens = 1
> Rows seen = 1
> Rows filtered = 0
> restriction = false
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 1.00
> optimizer estimated cost: 6.59
> Source result set:
> Index Row to Base Row ResultSet for FORMULA_TERM:
> Number of opens = 1
> Rows seen = 1
> Columns accessed from heap = {1, 2, 3, 4, 5, 6, 7, 8}
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> optimizer estimated row count: 1.00
> optimizer estimated cost: 6.59
> Index Scan ResultSet for FORMULA_TERM using constraint
> KB_FORMULA_TERM_TERM_ID_PK at read committed isolation level using share row
> locking chosen by the optimizer
> Number of opens = 1
> Rows seen = 1
> Rows filtered = 0
> Fetch Size = 1
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> next time in milliseconds/row = 0
> scan information:
> Bit set of columns fetched=All
> Number of columns fetched=2
> Number of deleted rows visited=0
> Number of pages visited=3
> Number of rows qualified=1
> Number of rows visited=1
> Scan type=btree
> Tree height=-1
> start position:
> >= on first 1 column(s).
> Ordered null semantics on the following
> columns:
> stop position:
> > on first 1 column(s).
> Ordered null semantics on the following
> columns:
> qualifiers:
> None
> optimizer estimated row count: 1.00
> optimizer estimated cost: 6.59
> [EL Fine]: 2013-01-17
> 11:01:00.732--ServerSession(1237006689)--Connection(927179828)--Thread(Thread["Initial
> Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2,
> ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM
> KB.FORMULA_TERM WHERE (((TERM_ID = ?) OR (TERM_ID = ?)) OR (TERM_ID = ?))
> bind => [2251799814033500, 2251799814033501, 2251799814033499]
> Thu Jan 17 11:01:10 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread
> Group] (XID = 4711078), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2,
> ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM
> KB.FORMULA_TERM WHERE (((TERM_ID = ?) OR (TERM_ID = ?)) OR (TERM_ID = ?))
> ******* Project-Restrict ResultSet (3):
> Number of opens = 1
> Rows seen = 3
> Rows filtered = 0
> restriction = false
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 1176730.30
> optimizer estimated cost: 5931065.54
> Source result set:
> Project-Restrict ResultSet (2):
> Number of opens = 1
> Rows seen = 11767298
> Rows filtered = 11767295
> restriction = true
> projection = false
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 1176730.30
> optimizer estimated cost: 5931065.54
> Source result set:
> Table Scan ResultSet for FORMULA_TERM at read committed
> isolation level using instantaneous share row locking chosen by the optimizer
> Number of opens = 1
> Rows seen = 11767298
> Rows filtered = 0
> Fetch Size = 16
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> next time in milliseconds/row = 0
> scan information:
> Bit set of columns fetched=All
> Number of columns fetched=9
> Number of pages visited=34358
> Number of rows qualified=11767298
> Number of rows visited=11767298
> Scan type=heap
> start position:
> null
> stop position:
> null
> qualifiers:
> None
> optimizer estimated row count: 1176730.30
> optimizer estimated cost: 5931065.54
> [EL Fine]: 2013-01-17
> 11:27:00.627--ServerSession(1237006689)--Connection(1688096771)--Thread(Thread["Initial
> Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2,
> ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM
> KB.FORMULA_TERM WHERE (TERM_ID IN (?,?,?))
> bind => [2251799814033500, 2251799814033501, 2251799814033499]
> Thu Jan 17 11:47:26 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread
> Group] (XID = 4711080), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2,
> ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM
> KB.FORMULA_TERM WHERE (TERM_ID IN (?,?,?)) ******* Project-Restrict ResultSet
> (3):
> Number of opens = 1
> Rows seen = 3
> Rows filtered = 0
> restriction = false
> projection = true
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 1176730.30
> optimizer estimated cost: 5931065.54
> Source result set:
> Project-Restrict ResultSet (2):
> Number of opens = 1
> Rows seen = 11767298
> Rows filtered = 11767295
> restriction = true
> projection = false
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> restriction time (milliseconds) = 0
> projection time (milliseconds) = 0
> optimizer estimated row count: 1176730.30
> optimizer estimated cost: 5931065.54
> Source result set:
> Table Scan ResultSet for FORMULA_TERM at read committed
> isolation level using instantaneous share row locking chosen by the optimizer
> Number of opens = 1
> Rows seen = 11767298
> Rows filtered = 0
> Fetch Size = 16
> constructor time (milliseconds) = 0
> open time (milliseconds) = 0
> next time (milliseconds) = 0
> close time (milliseconds) = 0
> next time in milliseconds/row = 0
> scan information:
> Bit set of columns fetched=All
> Number of columns fetched=9
> Number of pages visited=34358
> Number of rows qualified=11767298
> Number of rows visited=11767298
> Scan type=heap
> start position:
> null
> stop position:
> null
> qualifiers:
> None
> optimizer estimated row count: 1176730.30
> optimizer estimated cost: 5931065.54
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira