Hi,

I'm trying to implement a "streaming" version of the classic "select the latest version of a record" query.


By "streaming" I mean a query that executes by streaming what it needs out of tables and indexes as it needs it rather than using temporary b-trees or materializing anything up front.

I'm looking for a query that I can run and then just consume as many results as I want without worrying about the size of the entire result set.


Here's the schema I'm working with:

-----
CREATE TABLE "entrys" ("log-id" INTEGER NOT NULL , "entry-number" INTEGER NOT NULL , "region" TEXT NOT NULL , "key" TEXT NOT NULL , "timestamp" INTEGER NOT NULL , PRIMARY KEY ("log-id", "entry-number"))

CREATE UNIQUE INDEX "entrys-log-id-region-key-entry-number"
ON "entrys" ("log-id" ASC, "region" ASC, "key" ASC, "entry-number" ASC)
-----

There are only a couple of million rows in "entrys", yet my query takes 2 or 3 seconds of startup time before the first row is returned.



Here's my query:

-----
-- explain query plan
SELECT
"entrys"."log-id"       AS "log-id",
"entrys"."entry-number" AS "entry-number",
"entrys"."region"       AS "region",
"entrys"."key"          AS "key",
"entrys"."timestamp"    AS "timestamp"

FROM
        (SELECT
        MAX("entry-number") AS "entry-number",
        "key"
        FROM "entrys"
        WHERE
        "log-id" = 1 AND
        "region" = "user" AND
        "entry-number" <= 1700108
        AND key > "G"
        GROUP BY "key"
        ORDER BY "key" DESC
        limit 20 -- (1)
        ) AS "specific-entrys"

INNER JOIN "entrys"
ON
1 = "entrys"."log-id" AND
"specific-entrys"."key" = "entrys"."key" AND
"user" = "entrys"."region" AND
"specific-entrys"."entry-number" = "entrys"."entry-number"
AND "entrys"."key" > "G"

WHERE
"entrys"."log-id" = 1

ORDER BY "key" ASC
;
-----

...which has this query plan in SQLite version 3.31.0:

-----
QUERY PLAN
|--MATERIALIZE 1
| `--SEARCH TABLE entrys USING COVERING INDEX entrys-log-id-region-key-entry-number (log-id=? AND region=? AND key<?)
|--SEARCH TABLE entrys USING INDEX entrys-log-id-region-key-entry-number (log-id=? AND region=? AND key<?)
`--SEARCH SUBQUERY 1 AS specific-entrys USING AUTOMATIC COVERING INDEX (key=?)
-----


My problem is with the MATERIALIZE.

The query produces just shy of 2 million rows.

It takes several seconds to start up but is then pretty quick when fetching each row.

What I want to do is get rid of the startup costs so that I can paginate it efficiently.

If I run the subselect on its own then there is no startup cost. The results just get streamed straight out of the index.
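For reference, this is the subselect run standalone (same literals as in the full query above; I've left the LIMIT off here):

-----
SELECT
MAX("entry-number") AS "entry-number",
"key"
FROM "entrys"
WHERE
"log-id" = 1 AND
"region" = 'user' AND
"entry-number" <= 1700108
AND "key" > 'G'
GROUP BY "key"
ORDER BY "key" DESC
;
-----
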
Its query plan is:

-----
QUERY PLAN
`--SEARCH TABLE entrys USING COVERING INDEX entrys-log-id-region-key-entry-number (log-id=? AND region=? AND key<?)
-----


Is there anything I can do to make the original version of the query stream the results in this way?

The best I have come up with is to insert a LIMIT clause at the point denoted with "-- (1)". This keeps the subselect small, so materialising it and generating the automatic covering index becomes cheap.

For pagination I then feed in the key from the last row of the previous batch at the points denoted with "-- (2)" and "-- (3)".

If I do this, it seems equally cheap to access batches at the start and at the end of the complete result set. The per-query cost is determined by the batch size set by the LIMIT clause.
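
Concretely, each batch then looks like the following; the values I change between batches are pulled out as bind parameters here (the parameter numbers are just for illustration):

-----
-- ?1 = "key" from the last row of the previous batch ('G' in the
--      example above); ?2 = batch size
SELECT
"entrys"."log-id"       AS "log-id",
"entrys"."entry-number" AS "entry-number",
"entrys"."region"       AS "region",
"entrys"."key"          AS "key",
"entrys"."timestamp"    AS "timestamp"

FROM
        (SELECT
        MAX("entry-number") AS "entry-number",
        "key"
        FROM "entrys"
        WHERE
        "log-id" = 1 AND
        "region" = 'user' AND
        "entry-number" <= 1700108
        AND "key" > ?1 -- (2)
        GROUP BY "key"
        ORDER BY "key" DESC
        LIMIT ?2 -- (1)
        ) AS "specific-entrys"

INNER JOIN "entrys"
ON
1 = "entrys"."log-id" AND
"specific-entrys"."key" = "entrys"."key" AND
'user' = "entrys"."region" AND
"specific-entrys"."entry-number" = "entrys"."entry-number"
AND "entrys"."key" > ?1 -- (3)

WHERE
"entrys"."log-id" = 1

ORDER BY "key" ASC
;
-----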


However, I have been under the impression that LIMIT is meant as a "debugging" extension to the language and is not recommended in queries that end up in one's program. LIMIT also only hides the latency: it amortises it over each batch, and I still end up with memory requirements in each "client" thread that are larger than I'd like. Ideally I would only hold the current row.



Thanks for any tips!




Best wishes,
@ndy

--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
0x7EBA75FF