> Rowids will be faster than primary keys.

My primary keys are ROWIDs ("INTEGER PRIMARY KEY", actually).

None of the indexes could be exploited for the ORDER BY, and the matched
records in table A are scattered in pages all over the database, so
ordering them in memory tends to "replace" the whole SQLite page cache:
the first time a query is run it is slow, the second time it is fast, but
if you change the condition value (?1) it is slow again, as the cached
pages no longer help (this is very visible in the resource monitor as a
disk access spike).
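For illustration, the symptom shows up with EXPLAIN QUERY PLAN; a rough
sketch below (the index names idx_a_a1 and idx_b_b2 are made up, and the
exact wording of the output varies between SQLite versions):

    EXPLAIN QUERY PLAN
    SELECT A.*, B.*
    FROM A JOIN B ON A.A2 = B.B2
    WHERE A.A1 = ?1
    ORDER BY B.B1
    LIMIT 100;

    -- typically reports something along the lines of:
    --   SEARCH TABLE A USING INDEX idx_a_a1 (A1=?)
    --   SEARCH TABLE B USING INDEX idx_b_b2 (B2=?)
    --   USE TEMP B-TREE FOR ORDER BY
    -- i.e. no index can satisfy the sort, so it happens in memory.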

> You might be able to make the new table a WITHOUT ROWID table and set its
PRIMARY KEY up with the same (or a superset of the) fields of your "fairly
large index" in order to save a bit of space.

I have been experimenting along those lines, and since A1 and B1 should
both fit in 32-bit integers for the foreseeable future, combining them
into a single 64-bit integer is possible: I use (A1 << 32) | B1 as the
"INTEGER PRIMARY KEY" (ROWID). This makes a separate composite index
unnecessary, as the primary key itself becomes the composite index: the
equality condition on A1 becomes a range condition on the rowid, with the
ORDER BY on the rowid as well, both of which are fast and cache-friendly.
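In concrete terms, here is a minimal sketch of the scheme (names are
illustrative, not my actual schema; it assumes A1 and B1 are non-negative
and fit in 32 bits, and that (A1, B1) pairs are unique):

    CREATE TABLE C (
        id   INTEGER PRIMARY KEY,  -- (A1 << 32) | B1
        a_id INTEGER,              -- rowid of the matching row in A
        b_id INTEGER               -- rowid of the matching row in B
    );

    -- "A1 = ?1" becomes a rowid range scan; since the low 32 bits of
    -- id are B1, ordering by id within that range orders by B1:
    SELECT a_id, b_id
    FROM C
    WHERE id BETWEEN (?1 << 32) AND ((?1 << 32) + 4294967295)
    ORDER BY id
    LIMIT 100;

The join back to A & B to fetch the other fields then only touches the
100 selected rowids.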

It reduces disk usage significantly over the previous full-blown C table
plus composite index, but it is still a sort of manually managed, hacky
index, which involves extra queries to maintain. At the moment, though,
it seems to be the "better" solution.

> It might be possible to write a virtual table module that does the same
> as your index on C, but with C being a view.

I had a look in that direction, but AFAICT SQLite cannot create regular
indexes on virtual tables, so I would have to implement the indexing
manually.


On Tue, Mar 3, 2015 at 4:57 PM, Dan Kennedy <danielk1977 at gmail.com> wrote:

> On 03/03/2015 06:10 PM, Eric Grange wrote:
>
>> Hi,
>>
>> I have a problem where I need a "multi-table index" ersatz, or maybe a
>> better data structure :-)
>>
>> The problem is as follows:
>>
>>     - Table A: some fields plus fields A1 & A2
>>     - Table B: some fields plus fields B1 & B2
>>
>> Both tables have several dozen million rows, and both are accessed
>> independently of each other by various queries; their current structure
>> poses no performance issues for those queries.
>>
>> However I have a new query which is like
>>
>> select ...some fields of A & B...
>> from A join B on A.A2 = B.B2
>> where A.A1 = ?1
>> order by B.B1
>> limit 100
>>
>>
>> Without the limit there can be tens of thousands of resulting rows;
>> without the A1 condition, there can be millions of resulting rows.
>>
>> With indexes on A & B, the performance of the above is not very good:
>> indexing A1 alone is not enough, and indexing B1 alone is not enough
>> either, so no query plan is satisfactory.
>>
>> I can make the query instantaneous by duplicating the A1 & B1 fields in
>> a dedicated C table (along with the primary keys of A & B), indexing
>> that table, and then joining back to A & B to get the other fields.
>>
>> However this results in a fairly large table of duplicated data, whose
>> sole purpose is to allow the creation of a fairly large index, which
>> gets me the performance.
>>
>
> You might be able to make the new table a WITHOUT ROWID table and set its
> PRIMARY KEY up with the same (or a superset of the) fields of your "fairly
> large index" in order to save a bit of space.
>
>> Note that if the fields A1 & B1 are removed from their tables and kept
>> only in C, this has massive performance implications for other queries
>> running only against A & B, as those fields are leveraged in other
>> composite indexes.
>>
>> Is there a better way that would not involve duplicating the data?
>>
>> Eric