Re: Poor query optimizer choices is making Derby unusable for large tables

Rick Hillegas Fri, 11 Nov 2005 14:23:59 -0800

Hi Kevin,

You might also try using a temporary table to split your scan up into aseries of optimzable queries. Regards-Rick


declare global temporary table session.accumulator
(
  ObjectId int NOT NULL,
  WordLocation int NOT NULL
)
not logged;

insert into session.accumulator
   SELECT ObjectId, WordLocation
   FROM tblSearchDictionary
   WHERE Word = 'CONTACT'
;
insert into session.accumulator
   SELECT ObjectId, WordLocation
   FROM tblSearchDictionary
   WHERE Word = 'ADD'
;

SELECT ObjectId, SUM(WordLocation) AS Score
FROM session.accumulator
GROUP BY ObjectId;


Mamta Satoor wrote:

Hi Kevin,

I haven't investigated Derby-47 to know how to fix the problem but Ido have an optimizer overrides patch waiting for review on the derbydeveloper list which will let user specify their own optimizerproperties to help the optimizer pick a specific plan. The JIRA entryfor optimizer overrides is Derby-573. The patch is for Derby 10.2.These optimizer properties can be handy in a case like yours whereuntil the optimizer is fixed, a user can tell the optimizer what planto use. Just an FYI as I know it doesn't help with your particularcase since there is no optimizer overrides support for 10.1.MamtaOn 11/11/05, *Kevin Hore* <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:


    The Derby query optimizer (this is with 10.1.1.0
    <http://10.1.1.0>) decides to perform an
    expensive table scan in certain circumstances, when it could make
    use of
    available indexes. The resulting poor performance is rendering Derby
    useless for our application.

    I believe that this behaviour might be related to DERBY-47
    ( http://issues.apache.org/jira/browse/DERBY-47), which was opened in
    October 2004. However, this seems such a severe and fundamental
    problem
    that I'm surprised firstly that a significant portion of the Derby
    user
    base isn't affected by this and, secondly, that DERBY-47 hasn't been
    considered a critical defect to be fixed with the utmost urgency.

    I've described the problem in detail below, and I'd appreciate any
    assistance. Specifically:

    i) Does anyone have any plans to fix this problem?

    ii) In the meantime, are there any work-arounds? I'd appreciate any
    suggestions that would decrease the execution time of my second query
    below (the one with with two search terms). Likewise, any general
    strategies for avoiding this problem with IN clauses would be
    appreciated.


    ----PROBLEM DESCRIPTION----

    Consider the table:

    CREATE TABLE tblSearchDictionary
    (
    ObjectId int NOT NULL,
    ObjectType int NOT NULL,
    Word VARCHAR(64) NOT NULL,
    WordLocation int NOT NULL,
    CONSTRAINT CONSd0e222 UNIQUE (ObjectId,ObjectType,Word,WordLocation)
    );

    This table has an index on each of the four columns, it also has the
    unique index across all four columns as defined above:

    CREATE INDEX tblSearchDictionaryObjectId ON tblSearchDictionary
    (ObjectId);
    CREATE INDEX tblSearchDictionaryObjectType ON tblSearchDictionary
    (ObjectType);
    CREATE INDEX tblSearchDictionaryWord ON tblSearchDictionary (Word);
    CREATE INDEX tblSearchDictionaryWordLocation ON tblSearchDictionary
    (WordLocation);

    The table contains about 260,000 rows.

    The following query selects all rows that match instances of string in
    the Word column. It sums the WordLocation column having grouped by
    the
    ObjectId column.

    SELECT ObjectId, SUM(WordLocation) AS Score
    FROM tblSearchDictionary
    WHERE Word = 'CONTACT'
    GROUP BY ObjectId;

    On my machine this will usually complete in an acceptable time of
    around
    200ms.

    Now consider the following query which adds a second search term
    on the
    same column.

    SELECT ObjectId, SUM(WordLocation) AS Score
    FROM tblSearchDictionary
    WHERE Word = 'CONTACT' OR Word = 'ADD'
    GROUP BY ObjectId;

    This second query usually takes around 10000ms on my machine. My
    understanding from the Derby optimizer docs and DERBY-47 is that
    this is
    because Derby is re-writing the query along the following lines, and
    then choosing to do a table scan:

    SELECT ObjectId, SUM(WordLocation) AS Score
    FROM tblSearchDictionary
    WHERE
      Word IN ('CONTACT', 'ADD')
      AND Word >= 'ADD'
      AND Word <= 'CONTACT'
    GROUP BY ObjectId;

    The plan for the first query indicates that the
    tblSearchDictionaryWord
    index is used to perform an index scan. However, the plan for the
    second
    query indicates that the majority of the additional time is taken
    performing a table scan over the entire table, instead of making
    use of
    the indexes available. Our application uses IN quite frequently,
    so this
    optimizer behaviour would seem to present a significant problem.

    ---QUERY PLAN FOR FIRST QUERY----

    Statement Name:
        null
    Statement Text:
        SELECT
        ObjectId,
        SUM(WordLocation) AS Score
    FROM tblSearchDictionary
    WHERE
            Word = 'CONTACT'
    GROUP BY
        ObjectId

    Parse Time: 0
    Bind Time: 0
    Optimize Time: 16
    Generate Time: 0
    Compile Time: 16
    Execute Time: 0
    Begin Compilation Timestamp : 2005-11-11 12:28:52.765
    End Compilation Timestamp : 2005-11-11 12:28: 52.781
    Begin Execution Timestamp : 2005-11-11 13:08:15.406
    End Execution Timestamp : 2005-11-11 13:08:15.406
    Statement Execution Plan Text:
    Project-Restrict ResultSet (5):
    Number of opens = 1
    Rows seen = 93
    Rows filtered = 0
    restriction = false
    projection = true
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        restriction time (milliseconds) = 0
        projection time (milliseconds) = 0
        optimizer estimated row count:            1.00
        optimizer estimated cost:          226.00

    Source result set:
        Grouped Aggregate ResultSet:
        Number of opens = 1
        Rows input = 113
        Has distinct aggregate = false
        In sorted order = false
        Sort information:
            Number of rows input=113
            Number of rows output=93
            Sort type=internal
            constructor time (milliseconds) = 0
            open time (milliseconds) = 0
            next time (milliseconds) = 0
            close time (milliseconds) = 0
            optimizer estimated row count:             1.00
            optimizer estimated cost:          226.00

        Source result set:
            Project-Restrict ResultSet (4):
            Number of opens = 1
            Rows seen = 113
            Rows filtered = 0
            restriction = false
            projection = true
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 0
                close time (milliseconds) = 0
                restriction time (milliseconds) = 0
                projection time (milliseconds) = 0
                optimizer estimated row count:          118.00
                optimizer estimated cost:          226.00

            Source result set:
                Index Row to Base Row ResultSet for TBLSEARCHDICTIONARY:
                Number of opens = 1
                Rows seen = 113
                Columns accessed from heap = {0, 3}
                    constructor time (milliseconds) = 0
                    open time (milliseconds) = 0
                    next time (milliseconds) = 0
                    close time (milliseconds) = 0
                    optimizer estimated row count:           118.00
                    optimizer estimated cost:          226.00

                    Index Scan ResultSet for TBLSEARCHDICTIONARY using
    index
    TBLSEARCHDICTIONARYWORD at read committed isolation level using share
    row locking chosen by the optimizer
                    Number of opens = 1
                    Rows seen = 113
                    Rows filtered = 0
                    Fetch Size = 1
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 0
                        close time (milliseconds) = 0
                        next time in milliseconds/row = 0

                    scan information:
                        Bit set of columns fetched=All
                        Number of columns fetched=2
                        Number of deleted rows visited=0
                        Number of pages visited=4
                        Number of rows qualified=113
                        Number of rows visited=114
                        Scan type=btree
                        Tree height=3
                        start position:
        >= on first 1 column(s).
        Ordered null semantics on the following columns:
    0
                        stop position:
        > on first 1 column(s).
        Ordered null semantics on the following columns:
    0
                        qualifiers:
    None
                        optimizer estimated row count:          118.00
                        optimizer estimated cost:          226.00


    ---QUERY PLAN FOR SECOND QUERY----

    Statement Name:
        null
    Statement Text:
        SELECT
        ObjectId,
        SUM(WordLocation) AS Score
    FROM tblSearchDictionary
    WHERE
            Word = 'CONTACT' OR  Word = 'ADD'
    GROUP BY
        ObjectId

    Parse Time: 0
    Bind Time: 0
    Optimize Time: 0
    Generate Time: 15
    Compile Time: 15
    Execute Time: 4250
    Begin Compilation Timestamp : 2005-11-11 13:16:17.578
    End Compilation Timestamp : 2005-11-11 13:16: 17.593
    Begin Execution Timestamp : 2005-11-11 13:16:17.593
    End Execution Timestamp : 2005-11-11 13:16:27.437
    Statement Execution Plan Text:
    Project-Restrict ResultSet (5):
    Number of opens = 1
    Rows seen = 100
    Rows filtered = 0
    restriction = false
    projection = true
        constructor time (milliseconds) = 0
        open time (milliseconds) = 4250
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        restriction time (milliseconds) = 0
        projection time (milliseconds) = 0
        optimizer estimated row count:            1.00
        optimizer estimated cost:        82959.49

    Source result set:
        Grouped Aggregate ResultSet:
        Number of opens = 1
        Rows input = 712
        Has distinct aggregate = false
        In sorted order = false
        Sort information:
            Number of rows input=712
            Number of rows output=593
            Sort type=internal
            constructor time (milliseconds) = 0
            open time (milliseconds) = 4250
            next time (milliseconds) = 0
            close time (milliseconds) = 0
            optimizer estimated row count:             1.00
            optimizer estimated cost:        82959.49

        Source result set:
            Project-Restrict ResultSet (4):
            Number of opens = 1
            Rows seen = 712
            Rows filtered = 0
            restriction = false
            projection = true
                constructor time (milliseconds) = 0
                open time (milliseconds) = 0
                next time (milliseconds) = 4219
                close time (milliseconds) = 15
                restriction time (milliseconds) = 0
                projection time (milliseconds) = 0
                optimizer estimated row count:        19200.45
                optimizer estimated cost:        82959.49

            Source result set:
                Project-Restrict ResultSet (3):
                Number of opens = 1
                Rows seen = 40806
                Rows filtered = 40094
                restriction = true
                projection = false
                    constructor time (milliseconds) = 0
                    open time (milliseconds) = 0
                    next time (milliseconds) = 4219
                    close time (milliseconds) = 15
                    restriction time (milliseconds) = 124
                    projection time (milliseconds) = 0
                    optimizer estimated row count:        19200.45
                    optimizer estimated cost:        82959.49

                Source result set:
                    Table Scan ResultSet for TBLSEARCHDICTIONARY at read
    committed
    isolation level using share row locking chosen by the optimizer
                    Number of opens = 1
                    Rows seen = 40806
                    Rows filtered = 0
                    Fetch Size = 1
                        constructor time (milliseconds) = 0
                        open time (milliseconds) = 0
                        next time (milliseconds) = 4001
                        close time (milliseconds) = 15
                        next time in milliseconds/row = 0

                    scan information:
                        Bit set of columns fetched={0, 2, 3}
                        Number of columns fetched=3
                        Number of pages visited=2978
                        Number of rows qualified=40806
                        Number of rows visited=256001
                        Scan type=heap
                        start position:
    null                    stop position:
    null                    qualifiers:
    Column[0][0] Id: 2
    Operator: <
    Ordered nulls: false
    Unknown return value: true
    Negate comparison result: true
    Column[0][1] Id: 2
    Operator: <=
    Ordered nulls: false
    Unknown return value: false
    Negate comparison result: false

                        optimizer estimated row count:        19200.45
                        optimizer estimated cost:         82959.49

    ----------

    Thanks in advance for any help!

    Kind regards,


    Kevin Hore

Re: Poor query optimizer choices is making Derby unusable for large tables

Reply via email to