Tony Brusseau created DERBY-6045:
------------------------------------

             Summary: Can't bulk load terms by primary key efficiently
                 Key: DERBY-6045
                 URL: https://issues.apache.org/jira/browse/DERBY-6045
             Project: Derby
          Issue Type: Bug
          Components: Store
    Affects Versions: 10.9.1.0
         Environment: Linux Debian 6.0.5
            Reporter: Tony Brusseau
            Priority: Critical


I have a table with a long integer primary key field and 11 million rows. I 
seem to be unable to load large chunks of rows via id in a reasonably efficient 
manner.

  1. If I do individual lookups via the primary key, then a fast indexed lookup 
occurs. However, if I do large numbers of such queries, then the time is 
overwhelmed by round-trip overhead which makes everything incredibly slow.

  2. If I use a single query with a disjunction of the primary keys of 
interest,  then a table scan is performed (even if the clause only contest 1-3 
items), which walks over 11 million rows...incredibly inefficient.

  3. If I use an IN clause, then a table scan is performed (even if the clause 
only contest 1-3 items), which walks over 11 million rows...incredibly 
inefficient.

Here are the unlimited query plans for the 3 cases that I enumerated:
*********************************************************************************************

[EL Fine]: 2013-01-17 
11:09:53.384--ServerSession(582235416)--Connection(1430986883)--Thread(Thread["Initial
 Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, 
FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM 
WHERE (TERM_ID = ?)
        bind => [2251799814033500]

Thu Jan 17 11:09:53 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread 
Group] (XID = 4711079), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2, 
ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM 
KB.FORMULA_TERM WHERE (TERM_ID = ?) ******* Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 1
Rows filtered = 0
restriction = false
projection = true
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        restriction time (milliseconds) = 0
        projection time (milliseconds) = 0
        optimizer estimated row count: 1.00
        optimizer estimated cost: 6.59
Source result set:
        Index Row to Base Row ResultSet for FORMULA_TERM:
        Number of opens = 1
        Rows seen = 1
        Columns accessed from heap = {1, 2, 3, 4, 5, 6, 7, 8}
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 0
                close time (milliseconds) = 0
                optimizer estimated row count: 1.00
                optimizer estimated cost: 6.59
                Index Scan ResultSet for FORMULA_TERM using constraint 
KB_FORMULA_TERM_TERM_ID_PK at read committed isolation level using share row 
locking chosen by the optimizer
                Number of opens = 1
                Rows seen = 1
                Rows filtered = 0
                Fetch Size = 1
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 0
                        close time (milliseconds) = 0
                        next time in milliseconds/row = 0

                scan information:
                        Bit set of columns fetched=All
                        Number of columns fetched=2
                        Number of deleted rows visited=0
                        Number of pages visited=3
                        Number of rows qualified=1
                        Number of rows visited=1
                        Scan type=btree
                        Tree height=-1
                        start position:
                                >= on first 1 column(s).
                                Ordered null semantics on the following 
columns: 
                        stop position:
                                > on first 1 column(s).
                                Ordered null semantics on the following 
columns: 
                        qualifiers:
                                None
                        optimizer estimated row count: 1.00
                        optimizer estimated cost: 6.59

[EL Fine]: 2013-01-17 
11:01:00.732--ServerSession(1237006689)--Connection(927179828)--Thread(Thread["Initial
 Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, 
FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM 
WHERE (((TERM_ID = ?) OR (TERM_ID = ?)) OR (TERM_ID = ?))
        bind => [2251799814033500, 2251799814033501, 2251799814033499]

Thu Jan 17 11:01:10 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread 
Group] (XID = 4711078), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2, 
ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM 
KB.FORMULA_TERM WHERE (((TERM_ID = ?) OR (TERM_ID = ?)) OR (TERM_ID = ?)) 
******* Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 3
Rows filtered = 0
restriction = false
projection = true
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        restriction time (milliseconds) = 0
        projection time (milliseconds) = 0
        optimizer estimated row count: 1176730.30
        optimizer estimated cost: 5931065.54
Source result set:
        Project-Restrict ResultSet (2):
        Number of opens = 1
        Rows seen = 11767298
        Rows filtered = 11767295
        restriction = true
        projection = false
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 0
                close time (milliseconds) = 0
                restriction time (milliseconds) = 0
                projection time (milliseconds) = 0
                optimizer estimated row count: 1176730.30
                optimizer estimated cost: 5931065.54
        Source result set:
                Table Scan ResultSet for FORMULA_TERM at read committed 
isolation level using instantaneous share row locking chosen by the optimizer
                Number of opens = 1
                Rows seen = 11767298
                Rows filtered = 0
                Fetch Size = 16
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 0
                        close time (milliseconds) = 0
                        next time in milliseconds/row = 0

                scan information:
                        Bit set of columns fetched=All
                        Number of columns fetched=9
                        Number of pages visited=34358
                        Number of rows qualified=11767298
                        Number of rows visited=11767298
                        Scan type=heap
                        start position:
                                null
                        stop position:
                                null
                        qualifiers:
                                None
                        optimizer estimated row count: 1176730.30
                        optimizer estimated cost: 5931065.54

[EL Fine]: 2013-01-17 
11:27:00.627--ServerSession(1237006689)--Connection(1688096771)--Thread(Thread["Initial
 Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2, ARG3, 
FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM KB.FORMULA_TERM 
WHERE (TERM_ID IN (?,?,?))
        bind => [2251799814033500, 2251799814033501, 2251799814033499]

Thu Jan 17 11:47:26 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread 
Group] (XID = 4711080), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2, 
ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM 
KB.FORMULA_TERM WHERE (TERM_ID IN (?,?,?)) ******* Project-Restrict ResultSet 
(3):
Number of opens = 1
Rows seen = 3
Rows filtered = 0
restriction = false
projection = true
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        restriction time (milliseconds) = 0
        projection time (milliseconds) = 0
        optimizer estimated row count: 1176730.30
        optimizer estimated cost: 5931065.54
Source result set:
        Project-Restrict ResultSet (2):
        Number of opens = 1
        Rows seen = 11767298
        Rows filtered = 11767295
        restriction = true
        projection = false
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 0
                close time (milliseconds) = 0
                restriction time (milliseconds) = 0
                projection time (milliseconds) = 0
                optimizer estimated row count: 1176730.30
                optimizer estimated cost: 5931065.54
        Source result set:
                Table Scan ResultSet for FORMULA_TERM at read committed 
isolation level using instantaneous share row locking chosen by the optimizer
                Number of opens = 1
                Rows seen = 11767298
                Rows filtered = 0
                Fetch Size = 16
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 0
                        close time (milliseconds) = 0
                        next time in milliseconds/row = 0

                scan information:
                        Bit set of columns fetched=All
                        Number of columns fetched=9
                        Number of pages visited=34358
                        Number of rows qualified=11767298
                        Number of rows visited=11767298
                        Scan type=heap
                        start position:
                                null
                        stop position:
                                null
                        qualifiers:
                                None
                        optimizer estimated row count: 1176730.30
                        optimizer estimated cost: 5931065.54


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to