Re: [sqlite] index for a group by

John Crenshaw Tue, 20 Oct 2009 23:55:43 -0700

Actually, I thought exactly what you said when I saw the question. When
I saw your answer though I realized I'd been wrong, there are ways I
could slow indexing down, and therefore, ways to speed it up.


Splitting across transactions is about the insertion of data, not the
creation of the index. This is for the case where you can't insert first
and create the index later (maybe you're inserting a lot of data into a
table that already has data for example.) The recommendation in this
case is to wrap the whole batch of inserts in a transaction, but to
commit the transaction at regular intervals, breaking the process into
multiple pieces so that you don't spill over the memory cache.
SUPPOSEDLY this positively impacts indexing performance, but I've not
personally tested that claim. The more significant impact in this case
is actually the individual transactions you avoid, which makes a huge
difference.

John

-----Original Message-----
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov
Sent: Tuesday, October 20, 2009 2:47 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] index for a group by

I want to notice, John, that my words are in context "I have table
with a lot of data, I want to create a particular index on it, how can
I do it quickly". In this context only your 5 bullet is applicable, I
admit I've forgot about that. And I don't understand how can one split
creating of index across several transactions.

Pavel

On Tue, Oct 20, 2009 at 2:15 PM, John Crenshaw
<johncrens...@priacta.com> wrote:
>> Nothing in this process can be sped up.
>
> Actually, that isn't entirely true. While it always requires a full
data
> scan, Some things can make the indexing part of the process faster.
> Since indexing is done basically using a comparative sort, anything
that
> would speed up the sort, will speed up the indexing.
> 1. It is faster to sort 1000 data points, than to insert 1000
datapoints
> into a constantly sorted list. Creating the index after all inserts is
> faster than creating the index, then inserting.
> 2. If possible, avoid indexes on long data strings, since the compares
> can be time consuming.
> 3. If you have a field that stores one of several strings (as an
"enum")
> consider using integers instead. Integers have lower overhead, and can
> be compared (and sorted) more quickly than strings.
> 4. If you are feeling really gutsy, you could mod the code and
implement
> a radix sort or something similar for integer values. I'm not really
> recommending this, just saying, inserts and lookups in a radix index
are
> faster than a btree.
> 5. Make sure the memory cache is large enough for the sort. Writing
data
> to disk is very costly, compared to sorting in memory. Default is 2000
> pages (2MB) worth of btree data. If you are about to build an index
that
> will require more btree than that, increase the size, or split across
> several transactions.
>
> John
>
> -----Original Message-----
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov
> Sent: Tuesday, October 20, 2009 7:35 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] index for a group by
>
>> please could you let me know which index could be better or faster?
>
> For this particular query index on (t,a,b,c) or even on (t,a,b,c,d)
> would be better and cause the query to execute faster (of course if by
> conditions t>x1 and t<x2 you don't select almost all rows in the
> table).
>
>> also do you know by chance how to speed up the index creation?
>
> There's no way to do that. SQLite have to scan the whole table, read
> data from all rows and put necessary information into the index.
> Nothing in this process can be sped up.
>
> Pavel
>
> On Mon, Oct 19, 2009 at 5:50 PM, Sylvain Pointeau
> <sylvain.point...@gmail.com> wrote:
>> hello,
>> I have a table T (a,b,c,d,t)
>> where c is a value
>> a,b,c some dimensions
>> and t the time
>>
>> I need to make a subset with a "group by"
>> like
>>
>> select a,b,c,sum(d)
>> from T
>> where t>x1 and t<x2
>> group by a,b,c
>>
>> I created an index on a,b,c
>> but this table is large and the index creation is time consuming (few
> hours)
>>
>> please could you let me know which index could be better or faster?
>> also do you know by chance how to speed up the index creation?
>>
>> Best regards,
>> Sylvain
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] index for a group by

Reply via email to