[jira] [Updated] (DERBY-6045) in list multi-probe by primary key not chosen on tables with >256 rows

Mamta A. Satoor (JIRA) Mon, 11 Mar 2013 15:35:14 -0700

     [ 
https://issues.apache.org/jira/browse/DERBY-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mamta A. Satoor updated DERBY-6045:
-----------------------------------

    Attachment: DERBY6045_patch1_diff.txt

FromBaseTable.java has following code to change the selectivity if statistics 
exist for a conglomerate but since we do not collect statistics for 
single-column unique index anymore, following if condition will never be true 
for single-column unique index and hence we will not consider that index even 
if it is a faster plan)
    if (seenFirstColumn && statisticsForConglomerate &&
          (startStopPredCount > 0))
    {
           statStartStopSelectivity =
                 tableDescriptor.selectivityForConglomerate(cd, 
startStopPredCount);
      }

In my attached patch, I am changing the code above to as follows

 if (seenFirstColumn && (startStopPredCount > 0))
 {
    if (statisticsForConglomerate)
       statStartStopSelectivity =
          tableDescriptor.selectivityForConglomerate(cd,
              startStopPredCount);
    else {
       if (cd.isIndex())  {
          //DERBY-3790 (Investigate if request for update
          // statistics can be skipped for certain kind of
          // indexes, one instance may be unique indexes based
          // on one column.) But as found in DERBY-6045 (in list
          // multi-probe by primary key not chosen on tables with
          // >256 rows), even though we do not keep the
          // statistics for single-column unique indexes, we
          // should improve the selectivity of such an index
          // when the index is being considered by the optimizer.
          IndexRowGenerator irg = null;
          irg = cd.getIndexDescriptor();
          if (irg.isUnique()
             && irg.numberOfOrderedColumns() == 1
             && startStopPredCount == 1)
                statStartStopSelectivity =
                   (double)(1/(double)baseRowCount());
       }
    }
 }

With the new code, if we come across a conglomerate with no available 
statistics, then we check if 
1)we are dealing with an index and
2)if it is single-column unique index and 
3)there is one start-stop predicate, then
go ahead and set the start-stop selectivity to 1/number of rows in the table.

I changed the InListMultiProbeTest.java by removing x in fron of the 2 fixtures 
xtestDerby6045 and xtestDerby6045InsertAllRowsAdditionalUniqueIndex and both 
those fixtures ran fine with my patch.

I will appreciate another pair of eyes on the code rework, especially, the 
check for "there is one start-stop predicate" in the if clause. The reason I 
think we need this check is based on the existing code that we have in 
TableDescriptor for indexes with statistics
public double selectivityForConglomerate(ConglomerateDescriptor cd, int 
numKeys) 
  throws StandardException
{
     UUID referenceUUID = cd.getUUID();
     List sdl = getStatistics();
     for (Iterator li = sdl.iterator(); li.hasNext(); )
     {
         StatisticsDescriptor statDesc = (StatisticsDescriptor) li.next();
         if (!referenceUUID.equals(statDesc.getReferenceID()))
            continue;
                        
         if (statDesc.getColumnCount() != numKeys)
            continue;
                        
         return statDesc.getStatistic().selectivity((Object[])null);
}

                
> in list multi-probe by primary key not chosen on tables with >256 rows
> ----------------------------------------------------------------------
>
>                 Key: DERBY-6045
>                 URL: https://issues.apache.org/jira/browse/DERBY-6045
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.9.1.0, 10.10.0.0
>         Environment: Linux Debian 6.0.5
>            Reporter: Tony Brusseau
>         Attachments: DERBY6045_patch1_diff.txt, 
> optimizerTraceOutputFor10_8.txt, optimizerTraceOutputForTrunk.txt, 
> optimizerTraceOutputWithUpdateStatsFor10_8.txt, 
> optimizerTraceOutputWithUpdateStatsForTrunk.txt
>
>
> I have a table with a long integer primary key field and 11 million rows. I 
> seem to be unable to load large chunks of rows via id in a reasonably 
> efficient manner.
>   1. If I do individual lookups via the primary key, then a fast indexed 
> lookup occurs. However, if I do large numbers of such queries, then the time 
> is overwhelmed by round-trip overhead which makes everything incredibly slow.
>   2. If I use a single query with a disjunction of the primary keys of 
> interest,  then a table scan is performed (even if the clause only contains 
> 1-3 items), which walks over 11 million rows...incredibly inefficient.
>   3. If I use an IN clause, then a table scan is performed (even if the 
> clause only contains 1-3 items), which walks over 11 million 
> rows...incredibly inefficient.
> I'm guessing that this might have something to do with the fact that I'm 
> using large integers and really big numbers that don't start anywhere at or 
> about 1 for my keys. Could this possibly be confusing the optimizer?
> Here are the unlimited query plans for the 3 cases that I enumerated:
> *********************************************************************************************
> [EL Fine]: 2013-01-17 
> 11:09:53.384--ServerSession(582235416)--Connection(1430986883)--Thread(Thread["Initial
>  Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2, 
> ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM 
> KB.FORMULA_TERM WHERE (TERM_ID = ?)
>       bind => [2251799814033500]
> Thu Jan 17 11:09:53 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread 
> Group] (XID = 4711079), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2, 
> ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM 
> KB.FORMULA_TERM WHERE (TERM_ID = ?) ******* Project-Restrict ResultSet (3):
> Number of opens = 1
> Rows seen = 1
> Rows filtered = 0
> restriction = false
> projection = true
>       constructor time (milliseconds) = 0
>       open time (milliseconds) = 0
>       next time (milliseconds) = 0
>       close time (milliseconds) = 0
>       restriction time (milliseconds) = 0
>       projection time (milliseconds) = 0
>       optimizer estimated row count: 1.00
>       optimizer estimated cost: 6.59
> Source result set:
>       Index Row to Base Row ResultSet for FORMULA_TERM:
>       Number of opens = 1
>       Rows seen = 1
>       Columns accessed from heap = {1, 2, 3, 4, 5, 6, 7, 8}
>               constructor time (milliseconds) = 0
>               open time (milliseconds) = 0
>               next time (milliseconds) = 0
>               close time (milliseconds) = 0
>               optimizer estimated row count: 1.00
>               optimizer estimated cost: 6.59
>               Index Scan ResultSet for FORMULA_TERM using constraint 
> KB_FORMULA_TERM_TERM_ID_PK at read committed isolation level using share row 
> locking chosen by the optimizer
>               Number of opens = 1
>               Rows seen = 1
>               Rows filtered = 0
>               Fetch Size = 1
>                       constructor time (milliseconds) = 0
>                       open time (milliseconds) = 0
>                       next time (milliseconds) = 0
>                       close time (milliseconds) = 0
>                       next time in milliseconds/row = 0
>               scan information:
>                       Bit set of columns fetched=All
>                       Number of columns fetched=2
>                       Number of deleted rows visited=0
>                       Number of pages visited=3
>                       Number of rows qualified=1
>                       Number of rows visited=1
>                       Scan type=btree
>                       Tree height=-1
>                       start position:
>                               >= on first 1 column(s).
>                               Ordered null semantics on the following 
> columns: 
>                       stop position:
>                               > on first 1 column(s).
>                               Ordered null semantics on the following 
> columns: 
>                       qualifiers:
>                               None
>                       optimizer estimated row count: 1.00
>                       optimizer estimated cost: 6.59
> [EL Fine]: 2013-01-17 
> 11:01:00.732--ServerSession(1237006689)--Connection(927179828)--Thread(Thread["Initial
>  Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2, 
> ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM 
> KB.FORMULA_TERM WHERE (((TERM_ID = ?) OR (TERM_ID = ?)) OR (TERM_ID = ?))
>       bind => [2251799814033500, 2251799814033501, 2251799814033499]
> Thu Jan 17 11:01:10 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread 
> Group] (XID = 4711078), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2, 
> ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM 
> KB.FORMULA_TERM WHERE (((TERM_ID = ?) OR (TERM_ID = ?)) OR (TERM_ID = ?)) 
> ******* Project-Restrict ResultSet (3):
> Number of opens = 1
> Rows seen = 3
> Rows filtered = 0
> restriction = false
> projection = true
>       constructor time (milliseconds) = 0
>       open time (milliseconds) = 0
>       next time (milliseconds) = 0
>       close time (milliseconds) = 0
>       restriction time (milliseconds) = 0
>       projection time (milliseconds) = 0
>       optimizer estimated row count: 1176730.30
>       optimizer estimated cost: 5931065.54
> Source result set:
>       Project-Restrict ResultSet (2):
>       Number of opens = 1
>       Rows seen = 11767298
>       Rows filtered = 11767295
>       restriction = true
>       projection = false
>               constructor time (milliseconds) = 0
>               open time (milliseconds) = 0
>               next time (milliseconds) = 0
>               close time (milliseconds) = 0
>               restriction time (milliseconds) = 0
>               projection time (milliseconds) = 0
>               optimizer estimated row count: 1176730.30
>               optimizer estimated cost: 5931065.54
>       Source result set:
>               Table Scan ResultSet for FORMULA_TERM at read committed 
> isolation level using instantaneous share row locking chosen by the optimizer
>               Number of opens = 1
>               Rows seen = 11767298
>               Rows filtered = 0
>               Fetch Size = 16
>                       constructor time (milliseconds) = 0
>                       open time (milliseconds) = 0
>                       next time (milliseconds) = 0
>                       close time (milliseconds) = 0
>                       next time in milliseconds/row = 0
>               scan information:
>                       Bit set of columns fetched=All
>                       Number of columns fetched=9
>                       Number of pages visited=34358
>                       Number of rows qualified=11767298
>                       Number of rows visited=11767298
>                       Scan type=heap
>                       start position:
>                               null
>                       stop position:
>                               null
>                       qualifiers:
>                               None
>                       optimizer estimated row count: 1176730.30
>                       optimizer estimated cost: 5931065.54
> [EL Fine]: 2013-01-17 
> 11:27:00.627--ServerSession(1237006689)--Connection(1688096771)--Thread(Thread["Initial
>  Lisp Listener",5,SubL Thread Group])--SELECT TERM_ID, ARG0, ARG1, ARG2, 
> ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM 
> KB.FORMULA_TERM WHERE (TERM_ID IN (?,?,?))
>       bind => [2251799814033500, 2251799814033501, 2251799814033499]
> Thu Jan 17 11:47:26 CST 2013 Thread["Initial Lisp Listener",5,SubL Thread 
> Group] (XID = 4711080), (SESSIONID = 3), SELECT TERM_ID, ARG0, ARG1, ARG2, 
> ARG3, FORMULA_HASH, FORMULA_LENGTH, FORMULA_TYPE, KB_STATUS FROM 
> KB.FORMULA_TERM WHERE (TERM_ID IN (?,?,?)) ******* Project-Restrict ResultSet 
> (3):
> Number of opens = 1
> Rows seen = 3
> Rows filtered = 0
> restriction = false
> projection = true
>       constructor time (milliseconds) = 0
>       open time (milliseconds) = 0
>       next time (milliseconds) = 0
>       close time (milliseconds) = 0
>       restriction time (milliseconds) = 0
>       projection time (milliseconds) = 0
>       optimizer estimated row count: 1176730.30
>       optimizer estimated cost: 5931065.54
> Source result set:
>       Project-Restrict ResultSet (2):
>       Number of opens = 1
>       Rows seen = 11767298
>       Rows filtered = 11767295
>       restriction = true
>       projection = false
>               constructor time (milliseconds) = 0
>               open time (milliseconds) = 0
>               next time (milliseconds) = 0
>               close time (milliseconds) = 0
>               restriction time (milliseconds) = 0
>               projection time (milliseconds) = 0
>               optimizer estimated row count: 1176730.30
>               optimizer estimated cost: 5931065.54
>       Source result set:
>               Table Scan ResultSet for FORMULA_TERM at read committed 
> isolation level using instantaneous share row locking chosen by the optimizer
>               Number of opens = 1
>               Rows seen = 11767298
>               Rows filtered = 0
>               Fetch Size = 16
>                       constructor time (milliseconds) = 0
>                       open time (milliseconds) = 0
>                       next time (milliseconds) = 0
>                       close time (milliseconds) = 0
>                       next time in milliseconds/row = 0
>               scan information:
>                       Bit set of columns fetched=All
>                       Number of columns fetched=9
>                       Number of pages visited=34358
>                       Number of rows qualified=11767298
>                       Number of rows visited=11767298
>                       Scan type=heap
>                       start position:
>                               null
>                       stop position:
>                               null
>                       qualifiers:
>                               None
>                       optimizer estimated row count: 1176730.30
>                       optimizer estimated cost: 5931065.54

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (DERBY-6045) in list multi-probe by primary key not chosen on tables with >256 rows

Reply via email to