Re: [sqlite] Performance Issue on Large Table

2020-02-24 Thread Vikas Gupta
I guess this slowness is because of the triggers (most probably), amplified
by the extra indexes.

Let's discuss the indexes first:
If most of your queries use BibleID as a SARG along with Book & Chapter, then
INX_Verses_BID and INX_Verses_BID_B_C are not required; they only add
slowness to write activities.
Keep minimal indexes.

*Indexes:*
CREATE INDEX `INX_Verses_BID` ON `Verses` (`BibleID`)
CREATE INDEX `INX_Verses_BID_B_C` ON `Verses` (`BibleID`,`Book`,`Chapter`)
CREATE INDEX `INX_Verses_BI_B_C_V` ON `Verses`
(`BibleID`,`Book`,`Chapter`,`VerseNumber`)
CREATE INDEX `INX_Verses_B_C_V` ON `Verses` (`Book`,`Chapter`,`VerseNumber`)
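
Dropping the redundant ones is then a one-off change - a sketch, using the
index names from the list above:

DROP INDEX INX_Verses_BID;
DROP INDEX INX_Verses_BID_B_C;
-- INX_Verses_BI_B_C_V still serves BibleID-only and
-- BibleID+Book+Chapter lookups, since they are prefixes of it.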


Now let's discuss the triggers; these, I think, are the prime bottleneck for
your write activity. In fact, for such bulk activity Sybase ASE has a
separate "Disable Trigger" command. However, this is not present in SQLite.
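
One workaround is to drop the triggers for the duration of the bulk job and
rebuild the FTS contents once at the end. A sketch (trigger and table names
are the ones from the schema below; the single-pass rebuild assumes the FTS5
table mirrors Verses exactly, so test before relying on it):

BEGIN;
DROP TRIGGER after_verses_delete;
DROP TRIGGER after_verses_insert;
DROP TRIGGER after_verses_update;
-- ... bulk DELETE / INSERT on Verses here ...
DELETE FROM scriptures;            -- then resync the FTS5 table in one pass
INSERT INTO scriptures (ID, Scripture)
SELECT ID, Scripture FROM Verses;
-- recreate the three triggers exactly as listed below
COMMIT;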

Triggers on table:

CREATE TRIGGER after_verses_delete AFTER DELETE ON Verses
BEGIN
  -- the scriptures table is the FTS5 table
  DELETE FROM scriptures WHERE ID = old.ID;
END

CREATE TRIGGER after_verses_insert AFTER INSERT ON Verses
BEGIN
  -- the scriptures table is the FTS5 table
  INSERT INTO scriptures (ID, Scripture) VALUES(new.ID, new.Scripture);
END

CREATE TRIGGER after_verses_update AFTER UPDATE OF Scripture ON Verses
BEGIN
  -- the scriptures table is the FTS5 table
  UPDATE scriptures SET Scripture = new.Scripture WHERE ID = old.ID;
END



Thanks,
Vikas


On Mon, 24 Feb 2020 at 23:38, Olaf Schmidt  wrote:

> On 23.02.2020 at 20:23, Richard Damon wrote:
>
> >> An amount of 140 tables in such a "BibleVersions.db" is not
> >> uncommon and can be managed by SQLite with good performance.
> >
> > I'm not sure that form of division would be good. One basic rule of
> > database normalization is that you don't break-up data based on the
> > value of one of the fields ...
>
> Sure, but FTS-"Tables" are in a "special category" (IMO)...
>
> At least, I'd separate them "by language", because it does
> not really make sense to me, to stuff e.g. the 31102 verses
> of a japanese Bible-version into the same FTS-index,
> where already one (or more) english versions reside.
>
> Between japanese and english that's obvious already at the
> "Unicode(Point)-level" - but even among "Latin-versions"
> (e.g. english and german ones) there might be different
> stemmer-algos to consider, to give optimal search-results.
>
> Olaf
>


Re: [sqlite] Performance Issue on Large Table

2020-02-24 Thread Olaf Schmidt

On 23.02.2020 at 20:23, Richard Damon wrote:
>> An amount of 140 tables in such a "BibleVersions.db" is not
>> uncommon and can be managed by SQLite with good performance.
>
> I'm not sure that form of division would be good. One basic rule of
> database normalization is that you don't break-up data based on the
> value of one of the fields ...


Sure, but FTS-"Tables" are in a "special category" (IMO)...

At least, I'd separate them "by language", because it does
not really make sense to me, to stuff e.g. the 31102 verses
of a japanese Bible-version into the same FTS-index,
where already one (or more) english versions reside.

Between japanese and english that's obvious already at the
"Unicode(Point)-level" - but even among "Latin-versions"
(e.g. english and german ones) there might be different
stemmer-algos to consider, to give optimal search-results.

Olaf



Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread Richard Damon

On 2/23/20 3:06 PM, R.Smith wrote:


> On 2020/02/23 21:23, Richard Damon wrote:
>> On 2/23/20 8:31 AM, Olaf Schmidt wrote:
>>> An amount of 140 tables in such a "BibleVersions.db" is not
>>> uncommon and can be managed by SQLite with good performance.
>>
>> I'm not sure that form of division would be good. One basic rule of
>> database normalization is that you don't break-up data based on the
>> value of one of the fields because you can't make the table to lookup
>> data from be taken from a field you get in a query.
>
> It is far less of a bad design than having an AFTER DELETE trigger
> doing trivial processing when you expect to delete 30K records at a
> time due to a "well-designed" 1NF schema.
>
> Firstly, good design guides are only guides.
> Secondly, having said that, either embrace the full idealism or none
> of it; doing some things the good way and others not hampers the
> "goodness" of the overall system.
> Lastly, we usually use ON DELETE CASCADE Foreign Key relations to do
> what you are doing with the trigger, but of course you are using FTS5
> and I do not know if it can be done the FK way, or indeed how much the
> FK way will be better considering the same Indexes remain.
>
> Note - I don't have a good solution here. If it was me I would
> immediately kill those triggers, see if it can be done smarter (or is
> needed at all), or even if it can be done in code - OR - get rid of
> the 1NF schema and go with the suggested Bible-per-Table method and
> keep the triggers. (This will help nothing if the deletes are going to
> require deleting 1 record (and its trigger) each through 30K tables.
> It's only useful if it brings down the delete count.)
>
> If you cannot live without the triggers, nor can bring down the
> delete-count, maybe do it in code, or one quick way I can think of is
> using the ON DELETE to add those IDs to a "to_be_deleted" table and
> simply delete everything from the second table with "WHERE ID IN
> (SELECT ID FROM to_be_deleted)" before committing the transaction,
> perhaps combined with dropping and re-making the Indexes - but of
> course this needs testing and might not be suitable for reasons I am
> unaware of in your use case.
>
> Either way, you are probably in for some lengthy processing.
>
> Best of luck!
> Ryan


My thought is that a modification (making multiple tables, one per
version) that drops significant functionality (the version is now hard
coded as part of the query, and can't be determined as part of a query)
is not something to do lightly.  It also isn't that likely to improve
things a lot unless it allows the massive delete cycle to be replaced
with a drop table.


One big question is why a version apparently gets deleted wholesale, and,
if that is really needed, whether a better schema might support the
operations better.


--
Richard Damon



Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread Simon Slavin
On 23 Feb 2020, at 7:23pm, Richard Damon  wrote:

> I'm not sure that form of division would be good. One basic rule of database 
> normalization is that you don't break-up data based on the value of one of 
> the fields because you can't make the table to lookup data from be taken from 
> a field you get in a query.

People who study the bible often want to look up different versions of the same 
verse at once, so they can compare the translations and wording.  This suggests 
that splitting up the text by different versions would be a bad idea.

I would rather figure out the underlying problem, but I made a decision some 
time ago not to understand the FTS parts of SQLite (because I never needed it 
for my own work).  So I'm sorry I can't help on the performance issues.  The 
timings just seem too long for a SQLite database which isn't using FTS.


Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread R.Smith


On 2020/02/23 21:23, Richard Damon wrote:
> On 2/23/20 8:31 AM, Olaf Schmidt wrote:
>> An amount of 140 tables in such a "BibleVersions.db" is not
>> uncommon and can be managed by SQLite with good performance.
>
> I'm not sure that form of division would be good. One basic rule of
> database normalization is that you don't break-up data based on the
> value of one of the fields because you can't make the table to lookup
> data from be taken from a field you get in a query.

It is far less of a bad design than having an AFTER DELETE trigger
doing trivial processing when you expect to delete 30K records at a time
due to a "well-designed" 1NF schema.

Firstly, good design guides are only guides.
Secondly, having said that, either embrace the full idealism or none of
it; doing some things the good way and others not hampers the
"goodness" of the overall system.
Lastly, we usually use ON DELETE CASCADE Foreign Key relations to do
what you are doing with the trigger, but of course you are using FTS5
and I do not know if it can be done the FK way, or indeed how much the
FK way will be better considering the same Indexes remain.

Note - I don't have a good solution here. If it was me I would
immediately kill those triggers, see if it can be done smarter (or is
needed at all), or even if it can be done in code - OR - get rid of the
1NF schema and go with the suggested Bible-per-Table method and keep the
triggers. (This will help nothing if the deletes are going to require
deleting 1 record (and its trigger) each through 30K tables. It's only
useful if it brings down the delete count.)

If you cannot live without the triggers, nor can bring down the
delete-count, maybe do it in code, or one quick way I can think of is
using the ON DELETE to add those IDs to a "to_be_deleted" table and
simply delete everything from the second table with "WHERE ID IN
(SELECT ID FROM to_be_deleted)" before committing the transaction,
perhaps combined with dropping and re-making the Indexes - but of course
this needs testing and might not be suitable for reasons I am unaware of
in your use case.
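
A sketch of that idea (it assumes the existing AFTER DELETE trigger is
dropped for the duration of the batch; table names as elsewhere in this
thread):

CREATE TEMP TABLE to_be_deleted (ID INTEGER PRIMARY KEY);
CREATE TEMP TRIGGER verses_queue_delete AFTER DELETE ON Verses
BEGIN
  INSERT INTO to_be_deleted VALUES (old.ID);  -- just record the ID
END;
-- ... bulk DELETE on Verses inside the transaction ...
DELETE FROM scriptures WHERE ID IN (SELECT ID FROM to_be_deleted);
DROP TABLE to_be_deleted;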



Either way, you are probably in for some lengthy processing.

Best of luck!
Ryan



Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread Dominique Pellé
Chip Beaulieu  wrote:


> I also recently vacuumed the file. The issue really showed up after
> the FTS5 was set up on the table. I suspect it's got something to do
> with the triggers more than the indexes. I am definitely open to
> suggestions. I've not been able to find much info on the internet on
> optimizing updates to tables with FTS.

If the performance issue is about updating the FTS index, then maybe
tweaking the 'automerge' feature can help. See:
https://www.sqlite.org/fts5.html#the_automerge_configuration_option

If you don't need NEAR or phrase queries, then perhaps setting
detail=column can help speed up re-indexing (at least
it reduces the index size). See:
https://www.sqlite.org/fts5.html#the_detail_option
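
For instance (a sketch - the value is illustrative, and the FTS5 table is
assumed to be the "scriptures" table from earlier in this thread):

-- automerge is set through the special config INSERT:
INSERT INTO scriptures(scriptures, rank) VALUES('automerge', 2);

-- detail=column has to be chosen when the FTS5 table is created:
CREATE VIRTUAL TABLE scriptures_new USING fts5(Scripture, detail=column);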

Regards
Dominique


Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread Richard Damon

On 2/23/20 8:31 AM, Olaf Schmidt wrote:
> On 21.02.2020 at 02:24, Chip Beaulieu wrote:
>> I have a table with 4.5 million records with full text indexing.
>> Reads are very fast, but deleting / inserting / updating takes on
>> average about 50 seconds per record. I often do batches of 30,000
>> deletes / inserts at a time. The last batch took 10 hours to
>> complete.
>
> These 30,000 deletes+inserts indicate that these were the
> verse-records of a complete Bible-Content, right?
>
> And your 4.5Mio records total indicate that you have about
> 140 different Bible-versions in your DB?
>
>> I suspect it's got something to do with the triggers more than the
>> indexes.
>
> Could be - but my guess is that your primary DB-changes are caused
> by "deleting or inserting verses of whole bible-contents".
>
> So why not handle (hold) each bible in a separate table?
>
> An amount of 140 tables in such a "BibleVersions.db" is not
> uncommon and can be managed by SQLite with good performance.

I'm not sure that form of division would be good. One basic rule of
database normalization is that you don't break-up data based on the
value of one of the fields because you can't make the table to lookup
data from be taken from a field you get in a query.


--
Richard Damon



Re: [sqlite] Performance Issue on Large Table

2020-02-23 Thread Olaf Schmidt

On 21.02.2020 at 02:24, Chip Beaulieu wrote:
> I have a table with 4.5 million records with full text indexing.
> Reads are very fast, but deleting / inserting / updating
> takes on average about 50 seconds per record.
> I often do batches of 30,000 deletes / inserts at a time.
> The last batch took 10 hours to complete.

These 30,000 deletes+inserts indicate that these were the
verse-records of a complete Bible-Content, right?

And your 4.5Mio records total indicate that you have about
140 different Bible-versions in your DB?

> I suspect it's got something to do with the triggers more than the
> indexes.

Could be - but my guess is that your primary DB-changes are caused
by "deleting or inserting verses of whole bible-contents".

So why not handle (hold) each bible in a separate table?

An amount of 140 tables in such a "BibleVersions.db" is not
uncommon and can be managed by SQLite with good performance.

Inserting "your next Bible-Version" into it would be fast,
because you're creating separate Tables for that job.
Same thing for "deleting Bible-versions" (just 2 "Drop Table"-calls).

You might argue that the performance will suffer - especially
when you're trying to find "differences between two versions"...

But that is only a matter of a Join (via VerseID) on
the two Tables you want to "get a Verse-Diff for".

I've played that through with two different "Bibles":
- Bible12.txt and Bible13.txt (each one about 5MB unzipped)
- downloadable here: http://www.gutenberg.org/files/30/old/

The 31,102 "verse-records" each of the two text-files contains
can be parsed and inserted into separate FTS-Tables in about:
- 0.5 sec each (incl. text-parsing)

So, the total import-time for both (Bible12 and Bible13) was ~ 1sec.

That's far below your timings for "inserting a single verse-record".

FWIW, here is the SQLite-DB, my imports have produced (about 7MB)
http://vbRichClient.com/Downloads/BibleVersions.zip

It contains 4 Tables now (after importing the 2 text-files):
- Books_Bible12 (66 records) + FTS_Bible12 (31,102 records)
- Books_Bible13 (66 records) + FTS_Bible13 (31,102 records)

The schema-defs for those "Books_xxx" and "FTS_xxx" tables are:
"Create Table  (BookID Integer Primary Key, Title Text)"
"Create Virtual Table  Using FTS5(Scripture, tokenize=porter)"

No extra-indexes are needed... (IMO)

So, whilst the Books-Table was defined "quite normally" -
the FTS-table only contains a single Field ("Scripture")...
which raises the question, how to efficiently join 2 FTS-tables
(e.g. to find "differences in the Scripture-fields").

Well, in the recent Bible-TextFiles (from project Gutenberg),
there's already a "verse- or scripture-id" which is unique:
01:001:001 (two digits for BookNr, 3 for ChapterNr and 3 for VerseNr)

This Text-based (unique) ID can be easily transformed
to an Int-Value of the form: 01001001 - and that's what
I did whilst inserting into the FTS-Tables (filling their RowID-fields)

So, a later Join, to find scripture-differences between:
- FTS_Bible12 and FTS_Bible13 contents
becomes possible - e.g. this way:

Select T1.RowID, T1.Scripture, T2.Scripture
From FTS_Bible12 T1 Join FTS_Bible13 T2 On T1.RowID=T2.RowID
Where T1.Scripture <> T2.Scripture

It will (on the given example-DB above) find 8 records,
where scripture-content differs (after about 50msec or so).


For those interested, below is the import-Code I've used - which
will run as either VBScript - but also within VBA or VB5/6...
(it reads like "pseudo-code", so porting to other languages is easy):

'**depends on the COM-wrapper for SQLite (available at vbRichClient.com)
Function ImportInto(oMemDB, sTblNameBooks, sTblNameFTS, sBibleContent)
  On Error Resume Next

  With oMemDB
    .BeginTrans
    .Exec "Create Table " & sTblNameBooks & " (BookID Integer Primary Key, Title Text)"
    .Exec "Create Virtual Table " & sTblNameFTS & " Using FTS5(Scripture, tokenize=porter)"

    Dim sCmdB: sCmdB = "Insert Into " & sTblNameBooks & " (BookID, Title) Values(?,?)"
    Dim sCmdV: sCmdV = "Insert Into " & sTblNameFTS & "(RowID, Scripture) Values(?,?)"

    Dim L, VID, S
    For Each L In Split(sBibleContent, vbCrLf) 'enumerate the Lines (in L)

      Select Case InStr(L, " ") 'check the Pos of the first Space-Char in L

        Case 1  'we are probably still in a verse-block...
          If Len(VID) Then S = S & Mid(L, 11) 'Ok, concat that Verse-Part

        Case 11 'a potential Verse-Idx-Line
          If InStr(L, ":") = 3 Then VID = Left(L, 10): S = Mid(L, 12)

        Case 0  'probably an empty Line (as in the end of a Verse-Block)
          If Len(VID) Then
            .ExecCmd CStr(sCmdV), CLng(Replace(VID, ":", "")), S 'RowID + Scripture
            VID = "" 'reset the VID for the next verse
          End If

        Case 5  'it's a potential Book-Line
          If Left(L, 4) = "Book" And Len(L) < 40 Then
            .ExecCmd CStr(sCmdB), CLng(Mid(L, 6, 2)), Trim(Mid(L, 10)) 'BookID + Title
          End If
      End Select
    Next
    .CommitTrans 'assumed counterpart of .BeginTrans above
  End With
End Function

Re: [sqlite] Performance Issue on Large Table

2020-02-21 Thread Jens Alfke

> On Feb 20, 2020, at 5:24 PM, Chip Beaulieu  wrote:
> 
> CREATE INDEX `INX_Verses_BID` ON `Verses` (`BibleID`)
> CREATE INDEX `INX_Verses_BID_B_C` ON `Verses` (`BibleID`,`Book`,`Chapter`)
> CREATE INDEX `INX_Verses_BI_B_C_V` ON `Verses` 
> (`BibleID`,`Book`,`Chapter`,`VerseNumber`)

I believe the first two of those are unnecessary, since their column-lists are 
prefixes of the third. So removing them would buy you some performance.

(Disclaimer: not a guru.)

But you’ve said most of the problem is with FTS. When you update the database, 
does all of the indexed text really change? Or do many of the 30,000 new 
records contain the same text as their deleted counterparts? If the latter, you 
could optimize by not touching those rows.
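
A sketch of that (the "incoming" staging table is hypothetical; it stands
for wherever the new batch of 30,000 rows comes from):

UPDATE Verses
SET Scripture = (SELECT i.Scripture FROM incoming i WHERE i.ID = Verses.ID)
WHERE EXISTS (SELECT 1 FROM incoming i
              WHERE i.ID = Verses.ID AND i.Scripture <> Verses.Scripture);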

It’s also possible that dropping the FTS table before the update, and 
re-creating it afterwards, would be faster than incrementally changing it.
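
That could look like this (a sketch; it assumes the triggers are dropped
first, and that the FTS5 table carries an explicit ID column as the
triggers in this thread suggest):

DROP TABLE scriptures;
-- ... bulk DELETE / INSERT on Verses ...
CREATE VIRTUAL TABLE scriptures USING fts5(ID UNINDEXED, Scripture);
INSERT INTO scriptures (ID, Scripture)
SELECT ID, Scripture FROM Verses;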

—Jens


Re: [sqlite] Performance Issue on Large Table

2020-02-21 Thread Luuk


On 21-2-2020 02:24, Chip Beaulieu wrote:

> I have a table with 4.5 million records with full text indexing. Reads
> are very fast, but deleting / inserting / updating takes on average
> about 50 seconds per record. I often do batches of 30,000 deletes /
> inserts at a time. The last batch took 10 hours to complete.
>
> Here are the details:
>
> Table Schema:
>
> CREATE TABLE `Verses` (
> `ID`            integer PRIMARY KEY AUTOINCREMENT,
> `BibleID`       integer NOT NULL,
> `Book`          integer NOT NULL,
> `Chapter`       integer NOT NULL,
> `VerseNumber`   integer NOT NULL,
> `Scripture`     text,
> `Comments`      text,


I would recommend not storing `Comments` in the same table as
`Scripture` - or are you trying to re-write the Bible?

That would avoid deleting unchanged data only to re-insert it later
(when only `Comments` changed?).
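
A sketch of that split (the table name "VerseComments" is made up here):

CREATE TABLE VerseComments (
`VerseID`   integer PRIMARY KEY REFERENCES `Verses`(`ID`) ON DELETE CASCADE,
`Comments`  text
);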



--
Luuk



[sqlite] Performance Issue on Large Table

2020-02-20 Thread Chip Beaulieu
I have a table with 4.5 million records with full text indexing. Reads are very 
fast, but deleting / inserting  / updating takes on average about 50 seconds 
per record. I often do batches of 30,000 deletes / inserts at a time. The last 
batch took 10 hours to complete.

Here are the details:

Table Schema:

CREATE TABLE `Verses` (
`ID`integer PRIMARY KEY AUTOINCREMENT,
`BibleID`   integer NOT NULL,
`Book`  integer NOT NULL,
`Chapter`   integer NOT NULL,
`VerseNumber`   integer NOT NULL,
`Scripture` text,
`Comments`  text,
FOREIGN KEY(`BibleID`) REFERENCES `Bibles`(`BibleID`) ON DELETE CASCADE,
FOREIGN KEY(`Book`) REFERENCES `Books`(`ID`)
)

Indexes:

CREATE INDEX `INX_Verses_BID` ON `Verses` (`BibleID`)
CREATE INDEX `INX_Verses_BID_B_C` ON `Verses` (`BibleID`,`Book`,`Chapter`)
CREATE INDEX `INX_Verses_BI_B_C_V` ON `Verses` 
(`BibleID`,`Book`,`Chapter`,`VerseNumber`)
CREATE INDEX `INX_Verses_B_C_V` ON `Verses` (`Book`,`Chapter`,`VerseNumber`)

Triggers on table:

CREATE TRIGGER after_verses_delete AFTER DELETE ON Verses
BEGIN
  -- the scriptures table is the FTS5 table
  DELETE FROM scriptures WHERE ID = old.ID;
END

CREATE TRIGGER after_verses_insert AFTER INSERT ON Verses
BEGIN
  -- the scriptures table is the FTS5 table
  INSERT INTO scriptures (ID, Scripture) VALUES(new.ID, new.Scripture);
END

CREATE TRIGGER after_verses_update AFTER UPDATE OF Scripture ON Verses
BEGIN
  -- the scriptures table is the FTS5 table
  UPDATE scriptures SET Scripture = new.Scripture WHERE ID = old.ID;
END

I run the set of transactions from the command line on a Mac using the .read
command. A common scenario deletes 30,000 records, then inserts a new set of
30,000 records. Here are the parameters I set up for the transaction in the
.read file. To be honest, I've not seen much of an improvement with these,
and since the batch takes so long to complete, I was concerned that tweaking
other pragma statements might introduce a risk of corruption on crashes or
power failure.

select time('now','-5 hours');
pragma temp_store=2;  --memory
pragma journal_mode=TRUNCATE; --default is DELETE
pragma locking_mode=EXCLUSIVE; --default is NORMAL
BEGIN TRANSACTION;
DELETE FROM Verses WHERE BibleID=38;
INSERT INTO VERSES (BibleID, Book, Chapter, VerseNumber, Scripture)
VALUES(38,1,1,1,'In the beginning God created the heavens and the earth.');

… repeat 30,000 times with other records

COMMIT;
select time('now','-5 hours');

I also recently vacuumed the file. The issue really showed up after the FTS5
was set up on the table. I suspect it's got something to do with the triggers
more than the indexes. I am definitely open to suggestions. I've not been able
to find much info on the internet on optimizing updates to tables with FTS.



Re: [sqlite] Performance issue with left join in 3.24 compared with 3.22

2018-06-26 Thread Eric Grange
Thanks Richard!

Changing the inner join to a cross join works as well in that case, though
is it enough to always disable the left join optimization?

I have other variants of the query with different/more left joined
tables/subqueries, and varying filtering conditions, as the query
is dynamically generated from user options and filters (which can indeed
lead to SQL that is not really "optimal").

Is having a cross join somewhere among the joins enough to "disable" the
left join strength reduction for other joins?

On Tue, Jun 26, 2018 at 5:58 PM, Richard Hipp  wrote:

> On 6/26/18, Eric Grange  wrote:
> > I am experiencing a massive performance issue on a query with a left join
> > in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
> > milliseconds.
> > The problematic query looks like
> >
> >   select d.key_field, count(*) nb
> >   from low_volume_table b
> >   join mid_volume_table c on c.key_b = b.id
> >   left join high_volume_table d on d.key_c = c.id
> >   where b.id >= $1
> >   and b.filter1 = 0
> >   and c.filter2 > 0
> >   and d.filter3 > 0
> >   and d.filter4 = 1
> >   group by d.key_field
> >   order by nb desc
>
> Dan bisected and found the speed reduction coincides with the
> introduction of the LEFT JOIN strength reduction optimization
> (https://sqlite.org/optoverview.html#leftjoinreduction) in version
> 3.23.0.  (Dan bisected to the specific check-in
> https://sqlite.org/src/info/dd568c27).
>
> Your work-around is to change the LEFT JOIN into CROSS JOIN, thus
> forcing the query planner to preserve the same join order as it did
> before the strength reduction optimization.
>
> We (the SQLite devs) will take an action to try to improve the query
> planner so that it picks a better plan for your case.
>
>
> --
> D. Richard Hipp
> d...@sqlite.org


Re: [sqlite] Performance issue with left join in 3.24 compared with 3.22

2018-06-26 Thread Richard Hipp
On 6/26/18, Eric Grange  wrote:
> I am experiencing a massive performance issue on a query with a left join
> in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
> milliseconds.
> The problematic query looks like
>
>   select d.key_field, count(*) nb
>   from low_volume_table b
>   join mid_volume_table c on c.key_b = b.id
>   left join high_volume_table d on d.key_c = c.id
>   where b.id >= $1
>   and b.filter1 = 0
>   and c.filter2 > 0
>   and d.filter3 > 0
>   and d.filter4 = 1
>   group by d.key_field
>   order by nb desc

Dan bisected and found the speed reduction coincides with the
introduction of the LEFT JOIN strength reduction optimization
(https://sqlite.org/optoverview.html#leftjoinreduction) in version
3.23.0.  (Dan bisected to the specific check-in
https://sqlite.org/src/info/dd568c27).

Your work-around is to change the LEFT JOIN into CROSS JOIN, thus
forcing the query planner to preserve the same join order as it did
before the strength reduction optimization.
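
Applied to the query above, the workaround looks like this (a sketch;
CROSS JOIN is only safe here because the WHERE clause filters on the d
columns, which is exactly what the strength reduction relies on):

  select d.key_field, count(*) nb
  from low_volume_table b
  join mid_volume_table c on c.key_b = b.id
  cross join high_volume_table d on d.key_c = c.id
  where b.id >= $1
  and b.filter1 = 0
  and c.filter2 > 0
  and d.filter3 > 0
  and d.filter4 = 1
  group by d.key_field
  order by nb desc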

We (the SQLite devs) will take an action to try to improve the query
planner so that it picks a better plan for your case.


-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Performance issue with left join in 3.24 compared with 3.22

2018-06-26 Thread Eric Grange
Also tried a few hints to "force" the query plan, but with limited success:

- the "indexed by" clause does not result in the optimizer using the index
first, it just uses the indexes in the later steps of the query plan.
- using "not indexed" still results in the same table scan of
high_volume_table first, just without the index.
- using the unary "+" on the d table filters has no effect on the query
plan (as these are not indexed in the first place I guess)

Using unlikely() on the d table filters seems to be the only option that
works.
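
Concretely, that variant looks like this (a sketch of the only option that
worked):

  select d.key_field, count(*) nb
  from low_volume_table b
  join mid_volume_table c on c.key_b = b.id
  left join high_volume_table d on d.key_c = c.id
  where b.id >= $1
  and b.filter1 = 0
  and c.filter2 > 0
  and unlikely(d.filter3 > 0)
  and unlikely(d.filter4 = 1)
  group by d.key_field
  order by nb desc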


On Tue, Jun 26, 2018 at 10:02 AM, Eric Grange  wrote:

> Hi,
>
> I am experiencing a massive performance issue on a query with a left join
> in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
> milliseconds.
> The problematic query looks like
>
>   select d.key_field, count(*) nb
>   from low_volume_table b
>   join mid_volume_table c on c.key_b = b.id
>   left join high_volume_table d on d.key_c = c.id
>   where b.id >= $1
>   and b.filter1 = 0
>   and c.filter2 > 0
>   and d.filter3 > 0
>   and d.filter4 = 1
>   group by d.key_field
>   order by nb desc
>
> The filter fields on it are relatively non-discriminant (and
> non_indexed), however the key_field is indexed.
>
> The most discriminating conditions in this query are those on the
> low_volume and mid_volume tables, but the optimizer
> selects as first action:
>
> SCAN TABLE high_volume_table USING INDEX key_field_idx
>
> which leads to a huge number of iterations.
>
> If on the other hand, just one of the d filter conditions is removed, then
> the optimizer goes (like 3.22) first for
>
>SEARCH TABLE low_volume_table AS b USING COVERING INDEX
> low_volume_table_id_idx (b.filter1=? AND rowid>?)
>
> This happens after running ANALYZE; the sqlite_stat1 entry for the
> high_volume table and key_field_idx is
>
>  5855234 6
>
> while for the  low_volume_table_filter1_idx it is
>
>  1976628 988314
>
> While the  low_volume_table_filter1_idx does not look very selective, as
> it is coupled with rowid filtering, it is actually very effective
> as it combines rowid & filter1, so there are just thousands of rows
> being considered in the "group by", while when starting from
> a key_field_idx, there are millions of rows being considered, the
> overwhelming majority not fulfilling the conditions.
>
> The above table names and fields have been anonymized, if someone from the
> SQLite team want to have a look at the actual data,
> I can provide a database download (it's about 1.7 GB)
>
> Thanks!
>
>
>


[sqlite] Performance issue with left join in 3.24 compared with 3.22

2018-06-26 Thread Eric Grange
Hi,

I am experiencing a massive performance issue on a query with a left join
in 3.24, now taking 40 seconds, whereas in 3.22 it was just a few
milliseconds.
The problematic query looks like

  select d.key_field, count(*) nb
  from low_volume_table b
  join mid_volume_table c on c.key_b = b.id
  left join high_volume_table d on d.key_c = c.id
  where b.id >= $1
  and b.filter1 = 0
  and c.filter2 > 0
  and d.filter3 > 0
  and d.filter4 = 1
  group by d.key_field
  order by nb desc

The filter fields on it are relatively non-discriminant (and
non_indexed), however the key_field is indexed.

The most discriminating conditions in this query are those on the
low_volume and mid_volume tables, but the optimizer
selects as first action:

SCAN TABLE high_volume_table USING INDEX key_field_idx

which leads to a huge number of iterations.

If on the other hand, just one of the d filter conditions is removed, then
the optimizer goes (like 3.22) first for

   SEARCH TABLE low_volume_table AS b USING COVERING INDEX
low_volume_table_id_idx (b.filter1=? AND rowid>?)

This happens after running ANALYZE; the sqlite_stat1 entry for the
high_volume table and key_field_idx is

 5855234 6

while for the  low_volume_table_filter1_idx it is

 1976628 988314

While the  low_volume_table_filter1_idx does not look very selective, as it
is coupled with rowid filtering, it is actually very effective
as it combines rowid & filter1, so there are just thousands of rows being
considered in the "group by", while when starting from
a key_field_idx, there are millions of rows being considered, the
overwhelming majority not fulfilling the conditions.

The above table names and fields have been anonymized, if someone from the
SQLite team want to have a look at the actual data,
I can provide a database download (it's about 1.7 GB)

Thanks!


Re: [sqlite] Performance issue in different versions

2018-02-01 Thread Nick
I realized that the amount of memory used for the page cache is different.
And I found that is the root cause.
Sorry for my careless mistake.

Thank you.





Re: [sqlite] Performance issue in different versions

2018-02-01 Thread Nick
Yup, absolutely you are right.
I just ran a new test using the same upper bound on the amount of memory
used for the page cache, then I found a reasonable result.

Thank you, Dan.
I did notice the cache_size change before but you made me realize it.

Thanks a lot.





Re: [sqlite] Performance issue in different versions

2018-02-01 Thread Dan Kennedy

On 02/01/2018 04:05 PM, Nick wrote:
> I updated sqlite in my Android mobile phone from 3.9.2 to 3.16.2.
> And I find the default page_size in 3.9.2 is 1024 while in 3.16.2 it
> is 4096 (changed since 3.12.0).
> I think SQLITE_DEFAULT_PAGE_SIZE has a great effect on the
> performance, so I used speedtest1.c to test it.
>
> There are many test cases in speedtest1.c and case 270 is a DELETE
> case which is the most time-consuming one.
> Here is a result (different version + different page_size):
>
>             3.16.2+4096   3.16.2+1024   3.9.2+4096   3.9.2+1024
> Case 270:   5.695s        5.908s        2.307s       6.130s
> TOTAL       75.182s       79.811s       58.723s      81.732s
>
> It is easy to see that 3.9.2+4096 is much faster than the others.
> And page_size has a great effect on 3.9.2 but only a small effect on
> 3.16.2.
> But why?
>
> I think 3.16.2 should be faster than 3.9.2 according to the
> measurements of sqlite (https://www.sqlite.org/cpu.html) but I guess
> 3.9.2 used 1024 at that time.
>
> Could someone give me any explanations? The result is strange but I
> think it is reproducible if we set SQLITE_DEFAULT_PAGE_SIZE=4096 in
> 3.9.2.


Are you using the same sized cache in all tests?

For 3.9.2, the default cache was 2000 pages - 2MB with 1024 byte pages 
and 8MB with 4096 byte pages. But for 3.16.2 the default cache was 
changed to "-2000" - which means 2MB of cache regardless of page size. 
See the third paragraph of this:


  http://sqlite.org/pgszchng2016.html
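
So to compare like with like, pin the cache to the same byte size in both
versions before running the test - a sketch:

PRAGMA cache_size = -8000;  -- negative = KiB (~8 MB here), independent
                            -- of the page size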

Dan.




Re: [sqlite] Performance issue in different versions

2018-02-01 Thread Richard Hipp
On 2/1/18, Nick  wrote:
> I updated sqlite in my Android mobile phone from 3.9.2 to 3.16.2.
>
> There are many test cases in speedtest1.c and case 270 is a DELETE case
> which is the most time-consuming one.
> Here is a result (different version + different page_size):
>
>             3.16.2+4096   3.16.2+1024   3.9.2+4096   3.9.2+1024
> Case 270:   5.695s        5.908s        2.307s       6.130s

We get very different results when comparing the performance of 3.9.2
to 3.16.2.  See the graph at https://www.sqlite.org/cpu.html for
example.

Since in your case, the 3.9.2 was compiled by Google and you compiled
3.16.2 yourself, I suggest that there might be some difference in your
compiler or compiler settings.
-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Performance issue in different versions

2018-02-01 Thread Nick
Um, I am an OS application developer and we just upgraded the source code
in our development engine.

I am sure I used the same compile-options. 
SQLITE_SECURE_DELETE is not set.





Re: [sqlite] Performance issue in different versions

2018-02-01 Thread Clemens Ladisch
Nick wrote:
> I update sqlite in my Android mobile phone from 3.9.2 to 3.16.2.

How?  Your own copy, or the system one?
Did you use the same configuration?  Especially SQLITE_SECURE_DELETE?


Regards,
Clemens


[sqlite] Performance issue in different versions

2018-02-01 Thread Nick
I updated sqlite in my Android mobile phone from 3.9.2 to 3.16.2.
And I find the default page_size in 3.9.2 is 1024 while in 3.16.2 it is
4096 (changed since 3.12.0).
I think SQLITE_DEFAULT_PAGE_SIZE has a great effect on the performance,
so I used speedtest1.c to test it.

There are many test cases in speedtest1.c and case 270 is a DELETE case
which is the most time-consuming one.
Here is a result (different version + different page_size):

            3.16.2+4096   3.16.2+1024   3.9.2+4096   3.9.2+1024
Case 270:   5.695s        5.908s        2.307s       6.130s
TOTAL       75.182s       79.811s       58.723s      81.732s

It is easy to see that 3.9.2+4096 is much faster than the others.
And page_size has a great effect on 3.9.2 but only a small effect on
3.16.2.
But why?


I think 3.16.2 should be faster than 3.9.2 according to the measurements of
sqlite (https://www.sqlite.org/cpu.html) but I guess 3.9.2 used 1024 at that
time.

Could someone give me any explanations? The result is strange but I think it
is reproducible if we set SQLITE_DEFAULT_PAGE_SIZE=4096 in 3.9.2. 

Thank you.







Re: [sqlite] Performance issue for version 3.9.2 compared with 3.16.2

2017-10-30 Thread Richard Hipp
On 10/17/17, advancenOO  wrote:
> In order to profile sqlite 3.9.2 and sqlite 3.16.2, I run speedtest1.c on my
> mobile phone.
> It is weird to find that 3.9.2 is better than 3.16.2, especially in the
> following case:
> "270, 10000 DELETEs, numeric BETWEEN, indexed"
> 3.9.2 uses 0.7s while 3.16.2 uses 1.8s;
>
> My relevant compilation options are shown below:
> -DSQLITE_THREADSAFE=1
> -DSQLITE_DEFAULT_AUTOVACUUM=1
> -DSQLITE_DEFAULT_PAGE_SIZE=4096
> -DSQLITE_DEFAULT_JOURNAL_SIZE_LIMIT=1048576
>
> Furthermore, 3.16.2 performs better when I decrease the delete count
> from 10000 to 1000.
> So is it expected? If not, then what might be the possible reason?

It is not expected.  In fact, it directly contradicts our own
measurements shown here: https://www.sqlite.org/cpu.html

I have no explanation for your anomalous result.

Aside:  This email originally went into Spam.  I only discovered it,
moments ago, while searching for a different message that had gone to
spam.
-- 
D. Richard Hipp
d...@sqlite.org


[sqlite] Performance issue for version 3.9.2 compared with 3.16.2

2017-10-17 Thread advancenOO
In order to profile sqlite 3.9.2 and sqlite 3.16.2, I run speedtest1.c on my
mobile phone.
It is weird to find that 3.9.2 is better than 3.16.2, especially in the
following case:
"270, 10000 DELETEs, numeric BETWEEN, indexed"
3.9.2 uses 0.7s while 3.16.2 uses 1.8s;

My relevant compilation options are shown below:
-DSQLITE_THREADSAFE=1
-DSQLITE_DEFAULT_AUTOVACUUM=1
-DSQLITE_DEFAULT_PAGE_SIZE=4096
-DSQLITE_DEFAULT_JOURNAL_SIZE_LIMIT=1048576

Furthermore, 3.16.2 performs better when I decrease the delete count from
10000 to 1000.
So is it expected? If not, then what might be the possible reason? 

Thanks for your help.





Re: [sqlite] performance issue on a read only database

2017-06-14 Thread Hervé Gauthier
Hi all, I tried the suggestion made by Clemens of using PRAGMA
locking_mode = EXCLUSIVE.

This works fine for me.

Thanks a lot.

2017-06-13 20:46 GMT+02:00 David Raymond :

> "Also please note that SQLite does a 'lazy open'.  When you create your
> connection to the database file, SQLite doesn’t actually open the file.
> Instead the file handling is done the first time SQLite needs the data from
> the file.  So the first SELECT after a new connection is made takes longer
> than the others.
>
> Simon."
>
>
> I think this is the big thing. "Opening" a database doesn't actually do
> much. That 14ms is probably from parsing the database schema. That won't
> happen until your first select.
>
> When doing a "begin transaction" the default is a deferred begin, which
> won't lock the database or read the schema until you run a
> select/update/etc. Once it does lock the file with that first select after
> the begin, then the connection doesn't have to parse anything again since
> no other process could have changed it.
>
> With individual selects though, the connection has to check every time if
> the schema has been changed on it while it was sitting idle. Remember that
> a "read only connection" only means that "I can't change it." It doesn't
> mean "no one else can change it either."


Re: [sqlite] performance issue on a read only database

2017-06-13 Thread David Raymond
"Also please note that SQLite does a 'lazy open'.  When you create your 
connection to the database file, SQLite doesn’t actually open the file.  
Instead the file handling is done the first time SQLite needs the data from the 
file.  So the first SELECT after a new connection is made takes longer than the 
others.

Simon."


I think this is the big thing. "Opening" a database doesn't actually do much. 
That 14ms is probably from parsing the database schema. That won't happen until 
your first select.

When doing a "begin transaction" the default is a deferred begin, which won't 
lock the database or read the schema until you run a select/update/etc. Once it 
does lock the file with that first select after the begin, then the connection 
doesn't have to parse anything again since no other process could have changed 
it.

With individual selects though, the connection has to check every time if the 
schema has been changed on it while it was sitting idle. Remember that a "read 
only connection" only means that "I can't change it." It doesn't mean "no one 
else can change it either."


Re: [sqlite] performance issue on a read only database

2017-06-13 Thread Simon Slavin


On 13 Jun 2017, at 10:53am, rv.gauth...@free.fr wrote:

> Is there a way (pragma, compile flags, ...) to gain these 14 ms for the first 
> query ?

In SQLite, it is transactions which take all the time.  Individual commands 
such as SELECT are fast.  What takes the time is the locking, journal access, 
database updating, etc. that happens at the end of a transaction.

The rules for SQL state that file access must be done within a transaction.  
You would expect that if you did SELECT without having first done BEGIN, you 
would get an error message.  However, SQLite is very nice to you and if it 
thinks you forgot to make a transaction it will make one for you, by wrapping 
your SELECT as BEGIN ... SELECT ... END .

This explains why a SELECT by itself takes a lot of time, whereas a SELECT 
inside a transaction is faster.
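
In other words (a sketch; the table name is hypothetical):

BEGIN;                          -- one explicit transaction
SELECT count(*) FROM mytable;   -- pays the ~14 ms setup cost once
SELECT count(*) FROM mytable;   -- near 0 ms
COMMIT;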

Also please note that SQLite does a 'lazy open'.  When you create your 
connection to the database file, SQLite doesn’t actually open the file.  
Instead the file handling is done the first time SQLite needs the data from the 
file.  So the first SELECT after a new connection is made takes longer than the 
others.

Simon.


Re: [sqlite] performance issue on a read only database

2017-06-13 Thread Eduardo Morras
On Tue, 13 Jun 2017 11:53:05 +0200
rv.gauth...@free.fr wrote:

> Hi all,
> 
> we are using sqlite3 on an embedded application.
> The database is opened in read only mode and can't be modified by the 
> application.
> 
> I noticed that the first SELECT statement after a BEGIN TRANSACTION 
> takes at least 14 ms.
> All subsequent queries in the same transaction are taking near 0 ms.

The first SELECT makes SQLite do all the work needed to make your
query ACID-compliant (acquire locks, malloc RAM, generate structs,
open files, etc...)

> 
> If I omit the BEGIN TRANSACTION, all queries are taking at least 14
> ms.

Because each query is automatically wrapped with BEGIN TRANSACTION /
query / END TRANSACTION, and the work to make your query ACID is done
on every query, not just on the first one.

> I tried also to do a unique BEGIN TRANSACTION after opening the 
> database. This fixes the timing issue, but my memory increases 
> constantly.

... Do you know what a transaction is? In your case it's a read-only
DB and I don't think there could be a problem, but in normal use cases
it could be catastrophic.

> Is there a way (pragma, compile flags, ...) to gain these 14 ms for
> the first query ?

Try opening with a URI and check the 'immutable' parameter
(https://www.sqlite.org/draft/c3ref/open.html); perhaps you can gain
some ms, but it's dangerous, I warn you.
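
A sketch of that (the file name is a placeholder; URI filenames must be
enabled, as they are by default in the sqlite3 shell):

-- attach the file as an immutable, read-only database:
ATTACH 'file:data.db?immutable=1&mode=ro' AS ro;
SELECT count(*) FROM ro.sometable;  -- hypothetical table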

> 
> Thanks.
> 
> 
> Hervé Gauthier.
> 
> 


---   ---
Eduardo Morras 


Re: [sqlite] performance issue on a read only database

2017-06-13 Thread Clemens Ladisch
rv.gauth...@free.fr wrote:
> I noticed that the first SELECT statement after a BEGIN TRANSACTION takes at 
> least 14 ms.
> All subsequent queries in the same transaction are taking near 0 ms.
>
> If I omit the BEGIN TRANSACTION, all queries are taking at least 14 ms.

Because then you get an automatic transaction for every query.

> I tried also to do a unique BEGIN TRANSACTION after opening the database. 
> This fixes the timing issue, but my memory increases constantly.

Probably just the cache.

> Is there a way (pragma, compile flags, ...) to gain these 14 ms for the first 
> query ?

Try PRAGMA locking_mode = EXCLUSIVE:
http://www.sqlite.org/pragma.html#pragma_locking_mode
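
A minimal sketch:

PRAGMA locking_mode = EXCLUSIVE;
-- the lock is taken on the next access and never released, so later
-- SELECTs skip the per-query locking and schema re-check:
SELECT count(*) FROM sometable;  -- hypothetical table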


Regards,
Clemens


[sqlite] performance issue on a read only database

2017-06-13 Thread rv . gauthier

Hi all,

we are using sqlite3 on an embedded application.
The database is opened in read only mode and can't be modified by the 
application.


I noticed that the first SELECT statement after a BEGIN TRANSACTION 
takes at least 14 ms.

All subsequent queries in the same transaction are taking near 0 ms.

If I omit the BEGIN TRANSACTION, all queries are taking at least 14 ms.

I tried also to do a unique BEGIN TRANSACTION after opening the 
database. This fixes the timing issue, but my memory increases 
constantly.


Is there a way (pragma, compile flags, ...) to gain these 14 ms for the 
first query ?


Thanks.


Hervé Gauthier.




Re: [sqlite] performance issue through Dll upgrade

2016-12-07 Thread Simon Slavin

On 1 Dec 2016, at 7:24am, Stephan Stauber  wrote:

> SQLite 3.8.5:  to INSERT 380.000 records into an in-memory database
> takes 10 seconds
> SQLite 3.10.0: to INSERT 380.000 records into an in-memory database
> takes 35 seconds

Are you doing those INSERTs in transactions (either one big one or perhaps 
batches of 1000) or separately ?

Can you use a timing facility to find out whether your process is using
more CPU time as well as more wall time?  If you can write the commands
to a text file you can use the ".timer" command to do it.

I know this doesn’t solve the problem but it adds useful data.

Simon.


[sqlite] performance issue through Dll upgrade

2016-12-07 Thread Stephan Stauber
sorry for the delayed response.

with 3.15.2 it is even worse (it takes 1246s to INSERT 380.000 records);
with 3.8.5 it took only 10s

Best regards

Stephan




Re: [sqlite] performance issue through Dll upgrade

2016-12-01 Thread Dan Kennedy

On 12/01/2016 02:24 PM, Stephan Stauber wrote:

> Hello,
>
> we have the following performance issue since we upgraded from SQLite
> 3.8.5 to SQLite 3.10.0:
>
> SQLite 3.8.5:  to INSERT 380.000 records into an in-memory database
> takes 10 seconds
> SQLite 3.10.0: to INSERT 380.000 records into an in-memory database
> takes 35 seconds
>
> Everything else is identical. Same configuration of the SQLite DB and
> same data structure.
> (the INSERT statement is pre-compiled)


Does 3.15.2 have the same problem?

Dan.



[sqlite] performance issue through Dll upgrade

2016-11-30 Thread Stephan Stauber
Hello,

we have the following performance issue since we upgraded from SQLite
3.8.5 to SQLite 3.10.0:

SQLite 3.8.5:  to INSERT 380.000 records into an in-memory database
takes 10 seconds
SQLite 3.10.0: to INSERT 380.000 records into an in-memory database
takes 35 seconds

Everything else is identical. Same configuration of the SQLite DB and
same data structure.
(the INSERT statement is pre-compiled)

Does anybody have an idea what is causing the performance problems with
the newer SQLite version?
In the change log I could not find anything related to this issue.

Any ideas?

Best regards

Stephan


Re: [sqlite] performance issue, v.3.10 compared to v.3.14 (inner joins)

2016-08-16 Thread Richard Hipp
On 8/16/16, Raja Kajiev  wrote:
>
> *The output of .fullschema command:*
> sqlite> .fullschema -indent
> CREATE TABLE 'CLIPPED_UPDATE'(
> CREATE TABLE 'DATASET'(
> CREATE TABLE 'EXSET'(
> CREATE TABLE 'EXSET_DATASET'(
> /* No STAT tables available */

No indexes.  No PRIMARY KEYs.

I wonder if exset_id is actually the PRIMARY KEY of the exset_dataset
table.  Just adding a "PRIMARY KEY" declaration to that column in the
CREATE TABLE statement might fix all of your problems.

But to really do this right, you should consider adding appropriate
PRIMARY KEYs and/or UNIQUE constraints and/or indexes on all of your
tables, including the "trtu" TEMP table.
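
For example (a sketch; the column choices are guesses based on the query
quoted elsewhere in this thread):

CREATE INDEX INX_exset_dataset ON EXSET_DATASET(exset_id, name, edition);
CREATE INDEX INX_clipped_update ON CLIPPED_UPDATE(name, edition);
CREATE INDEX INX_trtu ON trtu(name, edition);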

-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] performance issue, v.3.10 compared to v.3.14 (inner joins)

2016-08-16 Thread Raja Kajiev
I can send whatever required else if the following bits are not enough.


*The output of .fullschema command:*
sqlite> .fullschema -indent
CREATE TABLE 'CLIPPED_UPDATE'(
  'name' TEXT,
  'edition' INTEGER NOT NULL,
  'coverage_id' INTEGER NOT NULL,
  'update_no' INTEGER NOT NULL,
  'data' BLOB NOT NULL
);
CREATE TABLE 'DATASET'(
  'name' TEXT NOT NULL,
  'edition' INTEGER NOT NULL,
  'lcdate' TEXT NOT NULL
);
CREATE TABLE 'EXSET'(
  'id' INTEGER NOT NULL,
  'name' TEXT NOT NULL,
  'issue' INTEGER NOT NULL,
  PRIMARY KEY(id)
);
CREATE TABLE 'EXSET_DATASET'(
  'exset_id' INTEGER NOT NULL,
  'name' TEXT NOT NULL,
  'edition' INTEGER NOT NULL,
  'chart_code' TEXT NOT NULL,
  'update_no' INTEGER NOT NULL,
  'coverage_id' INTEGER NOT NULL
);
/* No STAT tables available */


*The SQL statement:*

SELECT cu.name name, cu.edition edition, cu.update_no update_no, t.pos pos,
cu.data data
FROM CLIPPED_UPDATE cu
 INNER JOIN temp.trtu t ON cu.name=t.name AND cu.edition=t.edition
 INNER JOIN EXSET_DATASET ed ON ed.name=t.name AND ed.edition=t.edition AND
ed.coverage_id=cu.coverage_id
 WHERE ed.exset_id=%d AND cu.update_no>t.update_no
ORDER BY name, edition, update_no

where %d is some integer (substituted before passing this string to
sqlite3_prepare_v2() function),
and temp.trtu table is a temp table, created with statement "CREATE TEMP
TABLE temp.trtu (name TEXT, edition INTEGER, update_no INTEGER, pos INTEGER)
"

dixi

On 11 August 2016 at 11:48, Richard Hipp  wrote:

> On 8/10/16, Raja Kajiev  wrote:
> > The issue is: one of requests performed in my app was executed in ~7
> > seconds in v.3.10, but in v.3.14 execution time is ~13-15 seconds. That's
> > really itching in my case.
> > The sql request in question includes "inner join" constructs.
> > I also remember that in version prior to 3.10 the execution time for that
> > particular request was much larger than 10 seconds.
> > The full code of the routine is below. I can put on share somewhere a
> copy
> > of the db which is used (it is not a secret data, but it is a little
> large
> > - about 290 MB).
>
> Please send:
>
> (1) The output of the ".fullschema -indent" command as run from the
> command-line shell.
> (2) The text of the SQL statement that is running slowly.
>
> --
> D. Richard Hipp
> d...@sqlite.org
>



-- 
LinkedIn: http://www.linkedin.com/pub/aleksey-smirnov/13/598/3b3


Re: [sqlite] performance issue, v.3.10 compared to v.3.14 (inner joins)

2016-08-11 Thread Richard Hipp
On 8/10/16, Raja Kajiev  wrote:
> The issue is: one of requests performed in my app was executed in ~7
> seconds in v.3.10, but in v.3.14 execution time is ~13-15 seconds. That's
> really itching in my case.
> The sql request in question includes "inner join" constructs.
> I also remember that in version prior to 3.10 the execution time for that
> particular request was much larger than 10 seconds.
> The full code of the routine is below. I can put on share somewhere a copy
> of the db which is used (it is not a secret data, but it is a little large
> - about 290 MB).

Please send:

(1) The output of the ".fullschema -indent" command as run from the
command-line shell.
(2) The text of the SQL statement that is running slowly.

-- 
D. Richard Hipp
d...@sqlite.org


[sqlite] performance issue, v.3.10 compared to v.3.14 (inner joins)

2016-08-11 Thread Raja Kajiev
The issue is: one of requests performed in my app was executed in ~7
seconds in v.3.10, but in v.3.14 execution time is ~13-15 seconds. That's
really itching in my case.
The sql request in question includes "inner join" constructs.
I also remember that in version prior to 3.10 the execution time for that
particular request was much larger than 10 seconds.
The full code of the routine is below. I can put on share somewhere a copy
of the db which is used (it is not a secret data, but it is a little large
- about 290 MB).
I would appreciate any hint or advice on any detail of the implementation.


int ClippedUpdatesData::prepareClippedUpdates_sqlite(cMem& m_memUpdtReqSrv,
                                                     int exSetID,
                                                     CComQIPtr& m_pICrypt,
                                                     cMem& m_memBufNames3,
                                                     cMem& m_memBufData,
                                                     long m_MaxLengthData,
                                                     const char* m_path)
{
int processedFiles = 0;

    const std::string sql_createTempTable = "CREATE TEMP TABLE temp.trtu (name TEXT, edition INTEGER, update_no INTEGER, pos INTEGER)";
    const std::string sql_dropTempTable = "DROP TABLE temp.trtu";
    const std::string sql_insertTemp = "INSERT INTO temp.trtu VALUES (@name, @edition, @update_no, @pos)";

int dbResult;
sqlite3_stmt *pStmt;
    if(sqlite3_prepare_v2(m_sqlite_pDb, sql_createTempTable.c_str(),
                          sql_createTempTable.length(), &pStmt, NULL) != SQLITE_OK) {
        throw std::runtime_error(sqlite3_errmsg(m_sqlite_pDb));
    }
dbResult = sqlite3_step(pStmt);
sqlite3_finalize(pStmt);

sqlite3_exec(m_sqlite_pDb, "BEGIN TRANSACTION", NULL, NULL, NULL);
    if(sqlite3_prepare_v2(m_sqlite_pDb, sql_insertTemp.c_str(),
                          sql_insertTemp.length(), &pStmt, NULL) != SQLITE_OK) {
        throw std::runtime_error(sqlite3_errmsg(m_sqlite_pDb));
    }

    LOG4CPLUS_DEBUG(m_logger, "prepareClippedUpdates_sqlite - fill temporary table");
    int tempDataCounter = 0;
    int nDatasets = m_memUpdtReqSrv.GetStructNumbers();
    UPDATE_REC_SRV *aDataset = (UPDATE_REC_SRV *)m_memUpdtReqSrv.GetStructPointer();
for (int i = 0; i < nDatasets; ++i) {
const UPDATE_REC_SRV& theDataset = aDataset[i];
if (theDataset.numbUpdt != 0) {
tempDataCounter += 1;
sqlite3_bind_text(pStmt, 1, theDataset.name, 8, SQLITE_STATIC);
sqlite3_bind_int(pStmt, 2, theDataset.numbEdit);
sqlite3_bind_int(pStmt, 3, (theDataset.numbUpdt - 1));
sqlite3_bind_int(pStmt, 4, i);

dbResult = sqlite3_step(pStmt);
sqlite3_reset(pStmt);
}
}
sqlite3_exec(m_sqlite_pDb, "END TRANSACTION", NULL, NULL, NULL);
sqlite3_finalize(pStmt);

if(tempDataCounter > 0) {
LOG4CPLUS_DEBUG(m_logger, "getting datasets info from the DB...");
char szSql[1024];
sprintf(szSql,
"SELECT cu.name name, cu.edition edition, cu.update_no
update_no, t.pos pos, cu.data data "
"FROM CLIPPED_UPDATE cu "
" INNER JOIN temp.trtu t ON cu.name=t.name AND
cu.edition=t.edition "
" INNER JOIN EXSET_DATASET ed ON ed.name=t.name AND
ed.edition=t.edition AND ed.coverage_id=cu.coverage_id "
" WHERE ed.exset_id=%d AND cu.update_no>t.update_no "
"ORDER BY name, edition, update_no", exSetID);
if(sqlite3_prepare_v2(m_sqlite_pDb, szSql, strlen(szSql), ,
NULL) != SQLITE_OK) {
throw std::runtime_error(sqlite3_errmsg(m_sqlite_pDb));
}

int nDataset = 0; // last processed dataset
while(true) {
dbResult = sqlite3_step(pStmt);
if(dbResult == SQLITE_ROW) {
int nEdition = sqlite3_column_int(pStmt, 1);
nDataset = sqlite3_column_int(pStmt, 3);
const void* blobBuf = sqlite3_column_blob(pStmt, 4);
int blobSize = sqlite3_column_bytes(pStmt, 4);

CComVariant vUpdate;
if((HRESULT)m_pICrypt->Decompress((BYTE*)blobBuf, blobSize,
) < 0) {
throw std::runtime_error("'prepareClippedUpdates_mssql'
- crypt problem");
}
BEG_HEAD_UPDT_FILE *pBegHeadUpdt = (BEG_HEAD_UPDT_FILE
*)vUpdate.parray->pvData;
if (pBegHeadUpdt->nTotalNumberOfRecords == 1) {
// possible termination
if(pBegHeadUpdt->wEditionNumber != nEdition  //
empty update
&& pBegHeadUpdt->wEditionNumber != nEdition + 1  //
new edition available
&& pBegHeadUpdt->wEditionNumber != 0 //
terminated
) {
throw
std::runtime_error("'prepareClippedUpdates_mssql' - data 

[sqlite] Performance issue with CTE

2015-10-05 Thread Philippe Riand
Just to let you know, the solution using SELECT * FROM (query with 
offset/limit) works perfectly well. Thanks a lot for the suggestion!


[sqlite] Performance issue with CTE

2015-10-02 Thread Philippe Riand
I think the doc is right. I overcame the problem by using a construct like:
SELECT field1, field2 ... WHERE PKEY IN (SELECT PKEY ... WHERE ... OFFSET n LIMIT m)
That executes a subquery.

But your solution actually looks better, as it is:
SELECT * FROM (SELECT field1, field2 ... WHERE ... OFFSET n LIMIT m)

I'll try it. Thanks for the suggestion.


[sqlite] Performance issue with CTE

2015-10-02 Thread Philippe Riand
Thanks. I know about the technique you mentioned, but the point is not about
the use of OFFSET or not. The same issue will happen even when using a key.
See my other reply above.


[sqlite] Performance issue with CTE

2015-10-01 Thread Kees Nuyt
On Thu, 1 Oct 2015 13:40:23 +0200, Clemens Ladisch
 wrote:

> OFFSET is inefficient because the database still has to compute all the
> rows before skipping over them.
>
> To do paging, remember the first and last date values on the page, and
> for the previous/next page, just continue from there:
>
>  SELECT ...
>  FROM MyTable
>  WHERE date > :LastDateOnPreviousPage
>  ORDER BY date
>  LIMIT 25;
>
> (If dates are not unique, you have to use more columns.)

Indeed. More info on this technique:
http://sqlite.org/cvstrac/wiki?p=ScrollingCursor 
(page is obsolete but still works)

-- 
Regards,

Kees Nuyt


[sqlite] Performance issue with CTE

2015-10-01 Thread Clemens Ladisch
Philippe Riand wrote:
> I have a table with 500,000+ records. The table has a date column,
> that I?m using to sort my queries (the columns has an index). Simple
> queries on the table work very well, using ORDER BY, LIMIT & OFFSET.
> I?m actually extracting ?pages? of rows that I?m displaying in a web
> page.

OFFSET is inefficient because the database still has to compute all the
rows before skipping over them.

To do paging, remember the first and last date values on the page, and
for the previous/next page, just continue from there:

  SELECT ...
  FROM MyTable
  WHERE date > :LastDateOnPreviousPage
  ORDER BY date
  LIMIT 25;

(If dates are not unique, you have to use more columns.)
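For example (a hedged sketch; the unique "id" tie-breaker column is
hypothetical), carrying both values over from the previous page keeps the
paging stable even when dates repeat:

  SELECT ...
  FROM MyTable
  WHERE date > :LastDateOnPreviousPage
     OR (date = :LastDateOnPreviousPage AND id > :LastIdOnPreviousPage)
  ORDER BY date, id
  LIMIT 25;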

> Now, instead of a simple select, I?d like to execute a recursive query
> using CTE, because the rows are organized in an hierarchy. And I?m only
> interested by a page, let's say the first n root records.

What exactly do you mean by the last sentence?  Are you implying that
the page contains all children, regardless of how many there are?


Regards,
Clemens


[sqlite] Performance issue with CTE

2015-10-01 Thread E.Pasma
On 1 Oct 2015, at 04:10, Philippe Riand wrote:

> I have a table with 500,000+ records. The table has a date column
> that I'm using to sort my queries (the column has an index). Simple
> queries on the table work very well, using ORDER BY, LIMIT & OFFSET.
> I'm actually extracting "pages" of rows that I'm displaying in a web
> page. Great!
>
> Now, instead of a simple select, I'd like to execute a recursive
> query using a CTE, because the rows are organized in a hierarchy. And
> I'm only interested in a page, let's say the first n root records.
> Unfortunately, the doc says that the "initial-select" in a recursive
> common table expression may not include ORDER BY, LIMIT or OFFSET. As
> a result SQLite probably scans the whole table, which leads to very
> poor performance. With other databases (like PostgreSQL), I don't
> have the problem because they accept ORDER BY, LIMIT and OFFSET on
> the initial-select, which limits the scan for the initial-select.
>
> What would be the proper solution with SQLite? One would involve a
> first query that selects the initial row ids, and then passes them as
> a condition to the initial-select in the recursive part. But does
> anyone have a better proposal?

Is the documentation really right? Yes, it is a syntax error to write  
ORDER BY or LIMIT before UNION.
Semantically, however, there is no limitation. The ORDER BY must be
inside a subquery (an inline view):

select * from (select * from t order by k)
union ...
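For instance, a minimal sketch of that shape for paging the root records
of a recursive CTE (this needs a CTE-capable SQLite, 3.8.3 or later; the
items table and its parent/created columns are hypothetical):

  WITH RECURSIVE tree(id, parent) AS (
    SELECT id, parent
      FROM (SELECT id, parent FROM items
             WHERE parent IS NULL
             ORDER BY created LIMIT 25)   -- the "page" of root records
    UNION ALL
    SELECT i.id, i.parent
      FROM items i JOIN tree t ON i.parent = t.id
  )
  SELECT * FROM tree;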

My SQLite version is 3.6.11


[sqlite] Performance issue with CTE

2015-09-30 Thread Philippe Riand
I have a table with 500,000+ records. The table has a date column that I'm
using to sort my queries (the column has an index). Simple queries on the
table work very well, using ORDER BY, LIMIT & OFFSET. I'm actually extracting
"pages" of rows that I'm displaying in a web page. Great!

Now, instead of a simple select, I'd like to execute a recursive query using a
CTE, because the rows are organized in a hierarchy. And I'm only interested in
a page, let's say the first n root records.
Unfortunately, the doc says that the "initial-select" in a recursive common
table expression may not include ORDER BY, LIMIT or OFFSET. As a result SQLite
probably scans the whole table, which leads to very poor performance. With
other databases (like PostgreSQL), I don't have the problem because they
accept ORDER BY, LIMIT and OFFSET on the initial-select, which limits the
scan for the initial-select.

What would be the proper solution with SQLite? One would involve a first query
that selects the initial row ids, and then passes them as a condition to the
initial-select in the recursive part. But does anyone have a better proposal?


[sqlite] Performance issue

2015-04-13 Thread R.Smith

On 2015-04-13 09:49 AM, Jeff Roux wrote:
> Hi everyone,
>
> I have copied the original database on my personal website in a tbz
> archive here:
>
> http://nice-waterpolo.com/misc/db/
>
> There is only one index on timestamp,protocol.

Hi Jeff,

I am not sure what is wrong on your side, but with the table as I
downloaded it, the query in question (including all rows) ran in just
over 2 seconds for me - and that's from a spinning disk.

Good news is I found some improvements. I've included the schema in the
script below to show I'm not cheating - I dropped the indexes and made
only one index, on timestamp - and then for the large query made it
ignore the index so it was faster through a table scan (but this changes
when you constrain the selection to a subset, as the last two queries in
the output show).

Your Query runs on the full set in sub 1-second times with this setup.

This ran in Windows in SQLitespeed while the GUI was up, so if I pushed 
it through the sqlite3.exe command-line shell, another few milliseconds 
would be shaved off.

After the output below I list the pragma settings too in case they 
differ from your setup. I'm hoping the answer presents itself among all 
this. (and I hope the e-mail word-wrapping doesn't mess it up too much).


   -- Processing SQL in: D:\Documents\SQLiteAutoScript.sql
   -- Script Items: 9  Parameter Count: 0
   -- 2015-04-13 14:31:53.280  |  [Info]  Script Initialized, Started executing...
   -- ------------------------------------------------------------


SELECT sqlite_version() AS Ver;

   -- Ver
   -- ---
   -- 3.8.9


SELECT * FROM sqlite_master;

   -- type  | name                | tbl_name     | rootpage | sql
   -- ----- | ------------------- | ------------ | -------- | ---------------------------------------------------------
   -- table | flows               | flows        | 2        | CREATE TABLE flows (idx INTEGER PRIMARY KEY,
   --       |                     |              |          |   ipLan INTEGER, ipWan INTEGER, flags INTEGER,
   --       |                     |              |          |   portLan INTEGER, portWan INTEGER, tos INTEGER,
   --       |                     |              |          |   protocol INTEGER, wanTcpFlags INTEGER,
   --       |                     |              |          |   nbBytesDecompOut INTEGER, nbBytesCompIn INTEGER,
   --       |                     |              |          |   duration INTEGER, applId INTEGER, CST INTEGER,
   --       |                     |              |          |   SRT INTEGER, RTT INTEGER, lan_retransmission INTEGER,
   --       |                     |              |          |   wan_retransmission INTEGER, nbPktDecompIn INTEGER,
   --       |                     |              |          |   nbPktCompOut INTEGER, nbBytesDecompIn INTEGER,
   --       |                     |              |          |   nbBytesCompOut INTEGER, timestamp INTEGER,
   --       |                     |              |          |   rtpTypeLAN INTEGER, rtpPktsLossLAN INTEGER,
   --       |                     |              |          |   rtpJitterLAN INTEGER, rtpFactorRLAN INTEGER,
   --       |                     |              |          |   rtpTypeWAN INTEGER, rtpPktsLossWAN INTEGER,
   --       |                     |              |          |   rtpJitterWAN INTEGER, rtpFactorWLAN INTEGER,
   --       |                     |              |          |   nbBytesDbcDecompOut INTEGER, nbBytesDbcCompIn INTEGER,
   --       |                     |              |          |   nbBytesDefDecompOut INTEGER, nbBytesDefCompIn INTEGER,
   --       |                     |              |          |   nbPktDecompOut INTEGER, nbPktCompIn INTEGER,
   --       |                     |              |          |   nbBytesDbcDecompIn INTEGER, nbBytesDbcCompOut INTEGER,
   --       |                     |              |          |   nbBytesDefDecompIn INTEGER, nbBytesDefCompOut INTEGER)
   -- table | application         | application  | 4        | CREATE TABLE application (id INTEGER CONSTRAINT applpk
   --       |                     |              |          |   PRIMARY KEY, shortname VARCHAR(64), name VARCHAR(256))
   -- table | sqlite_stat1        | sqlite_stat1 | 6        | CREATE TABLE sqlite_stat1(tbl,idx,stat)
   -- index | Idx_flows_timestamp | flows        | 7        | CREATE INDEX Idx_flows_timestamp ON flows (timestamp)


SELECT * FROM sqlite_stat1;

   -- tbl         | idx                 | stat
   -- ----------- | ------------------- | -----------
   -- application |                     | 108
   -- flows       | Idx_flows_timestamp | 1000000 992

-- All-rows Query using the Index
EXPLAIN QUERY PLAN
SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan as item
   FROM flows
  WHERE timestamp BETWEEN 1383770600 AND 1384770600 AND protocol IN (17, 6)
  GROUP BY portLan
  ORDER BY vol DESC LIMIT 6;

   -- selectid | order | from | detail
   -- -------- | ----- | ---- | ---------------------------------------------------
   -- 0        | 0     | 0    | SEARCH TABLE flows USING INDEX Idx_flows_timestamp

[sqlite] Performance issue

2015-04-13 Thread Jeff Roux
Hi everyone,

I have copied the original database on my personal website in a tbz
archive here:

http://nice-waterpolo.com/misc/db/

There is only one index on timestamp,protocol.

Thanks.

2015-04-08 14:38 GMT+02:00 R.Smith :

>
>
> On 2015-04-08 01:57 PM, Dominique Devienne wrote:
>
>>> No Dominique, it's not that -
>>>
>> Perhaps. But that still doesn't get to my point. With a LIMIT clause, in
>> such a GROUP BY ORDER BY returning a large result set, would SQLite:
>> 1) sort the whole result-set and then keep only the first top-N rows?
>> 2) or instead do a partial-sort of the first top-N rows only, as in
>> http://www.cplusplus.com/reference/algorithm/partial_sort/?
>>
>> I'm interested in finding out for sure. Perhaps that's highjacking this
>> thread a bit, but in case of SQLite doing #1, and MySQL doing #2, it could
>> explain some of the difference. (although sorting a 1M array is so fast
>> nowadays, I doubt it.).
>>
>
> I think the partial sort algorithm only finds the first N items and then
> stops sorting, but for that to be possible the result set must be present
> in full and finished calculating in full already. The partial sort itself
> might save a millisecond or two from a complete sort in large lists. Either
> way, SQLite is more clever than that as Richard pointed out.
>
>
>  His rows are "fatter", since he mentioned 41 columns. Which might make it
>> go over some threshold(s) (page cache?) slowing things down once past it.
>>
>> But indeed, sharing the DB (if not sensitive data) would be the way to go.
>>
> No no, we know his rows' fatness exactly, he did send the schema, they are
> 41 integer values, i.e. it doesn't matter.
>
> So yeah, there must be some trivial thing which the OP (and I) are
> missing. Even an obscured values DB that still causes the slow query will
> work...
>
>
>


[sqlite] Performance issue

2015-04-08 Thread R.Smith


On 2015-04-08 01:57 PM, Dominique Devienne wrote:
>> No Dominique, it's not that -
>>
> Perhaps. But that still doesn't get to my point. With a LIMIT clause, in
> such a GROUP BY ORDER BY returning a large result set, would SQLite:
> 1) sort the whole result-set and then keep only the first top-N rows?
> 2) or instead do a partial-sort of the first top-N rows only, as in
> http://www.cplusplus.com/reference/algorithm/partial_sort/?
>
> I'm interested in finding out for sure. Perhaps that's highjacking this
> thread a bit, but in case of SQLite doing #1, and MySQL doing #2, it could
> explain some of the difference. (although sorting a 1M array is so fast
> nowadays, I doubt it.).

I think the partial sort algorithm only finds the first N items and then 
stops sorting, but for that to be possible the result set must be 
present in full and finished calculating in full already. The partial 
sort itself might save a millisecond or two from a complete sort in 
large lists. Either way, SQLite is more clever than that as Richard 
pointed out.


> His rows are "fatter", since he mentioned 41 columns. Which might make it
> go over some threshold(s) (page cache?) slowing things down once past it.
>
> But indeed, sharing the DB (if not sensitive data) would be the way to go.
No no, we know his rows' fatness exactly, he did send the schema, they 
are 41 integer values, i.e. it doesn't matter.

So yeah, there must be some trivial thing which the OP (and I) are 
missing. Even an obscured values DB that still causes the slow query 
will work...




[sqlite] Performance issue

2015-04-08 Thread Dominique Devienne
On Wed, Apr 8, 2015 at 2:09 PM, Richard Hipp  wrote:

> On 4/8/15, Dominique Devienne  wrote:
> >  With a LIMIT clause, in
> > such a GROUP BY ORDER BY returning a large result set, would SQLite:
> > 1) sort the whole result-set and then keep only the first top-N rows?
> > 2) or instead do a partial-sort of the first top-N rows only,
>
> SQLite must examine all rows of output, obviously.  But it only keeps
> the top-N in memory and only sorts the top-N.  If there are a total of
> M candidate rows and only the top-N are to be displayed, then the
> algorithm is O(M*logN) in time and O(N) in space.
>

Thank you Richard. --DD


[sqlite] Performance issue

2015-04-08 Thread Dominique Devienne
On Wed, Apr 8, 2015 at 1:24 PM, R.Smith  wrote:

> On 2015-04-08 11:35 AM, Dominique Devienne wrote:
>
>> On Wed, Apr 8, 2015 at 11:16 AM, Jeff Roux  wrote:
>>
>>> time echo 'SELECT ... FROM flows WHERE timestamp>=1383770600 AND \
>>>   timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \
>>>   ORDER BY vol DESC LIMIT 6;' |
>>>   mysql testperf
>>>
>>

>> If you dropped the LIMIT 6 from both, are the results
>> significantly different?
>>
>

> No Dominique, it's not that -
>

Perhaps. But that still doesn't get to my point. With a LIMIT clause, in
such a GROUP BY ORDER BY returning a large result set, would SQLite:
1) sort the whole result-set and then keep only the first top-N rows?
2) or instead do a partial-sort of the first top-N rows only, as in
http://www.cplusplus.com/reference/algorithm/partial_sort/?

I'm interested in finding out for sure. Perhaps that's highjacking this
thread a bit, but in case of SQLite doing #1, and MySQL doing #2, it could
explain some of the difference. (although sorting a 1M array is so fast
nowadays, I doubt it.).


> there must be another thing wrong with his
> setup. (He might not see my replies because he uses gmail).
>

Sure. I don't dispute that.


> In fact, the entire script, making the table, adding the index, populating
> it with a million rows (with computed values no less) and then doing the
> query plus posting the output - ALL of it takes less than 4 seconds
> together: (Try it yourself)
>

His rows are "fatter", since he mentioned 41 columns. Which might make it
go over some threshold(s) (page cache?) slowing things down once past it.

But indeed, sharing the DB (if not sensitive data) would be the way to go.
--DD


[sqlite] Performance issue

2015-04-08 Thread R.Smith


On 2015-04-08 11:35 AM, Dominique Devienne wrote:
> On Wed, Apr 8, 2015 at 11:16 AM, Jeff Roux  wrote:
>
>> # For mysql, I use:
>> /etc/init.d/mysql stop; /etc/init.d/mysql start; \
>> time echo 'SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan \
>>   as item FROM flows WHERE timestamp>=1383770600 AND \
>>   timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \
>>   ORDER BY vol DESC LIMIT 6;' |
>>   mysql testperf
>>
> If you dropped the LIMIT 6 from both, are the results significantly
> different?
>
> I.e. does actually getting the full result set result in closer numbers
> between SQLite and MySQL?
>
> I'm not sure SQLite's implementation of LIMIT prevents the whole result-set
> from being sorted before being truncated to 6 rows. Avoiding LIMIT would be
> one round-about way to find out. --DD

No Dominique, it's not that - there must be another thing wrong with his 
setup. (He might not see my replies because he uses gmail).

We will need an actual DB file from the OP with the 1 mil records to 
test and understand the problem because there is something that isn't 
clear or not being said (probably deemed unimportant).

Again, here is a script that makes a similar table, populates it with 1
million rows, adds the correct index and then does the aggregate query on
all of them. It takes less than 500 milliseconds - faster than MySQL
sans caching. (Obviously my pragmas/compile options may make a
difference). Even if there were much more data per row in the table, it
couldn't cause an order-of-magnitude increase in time - and that doesn't
seem to be the case from the schema the OP posted.

In fact, the entire script, making the table, adding the index, 
populating it with a million rows (with computed values no less) and 
then doing the query plus posting the output - ALL of it takes less than 
4 seconds together:
(Try it yourself)

Important to note here is that I adapted the query to actually include
the entire dataset (all 1 mil rows), so the index is of little value. It
only matters where the WHERE clause refers to fewer rows - in which case
the time decreases linearly.

   -- 


DROP TABLE IF EXISTS flows;
CREATE TABLE flows(
   id INTEGER PRIMARY KEY,
   ipLan TEXT,
   ipWan TEXT,
   portLan INT,
   portWan INT,
   protocol INT,
   nbBytesDecompOut INT,
   nbBytesCompIn INT,
   tos INT,
   timestamp INT
);

WITH acc(x,mx8,dx8,mxK,dxK,rK) AS (
 SELECT 0,0,0,0,0,100
   UNION ALL
 SELECT x+1, (x%8), CAST(round(x/8,0) AS INT),
        (x%1024), CAST(round(x/1024,0) AS INT),
        CAST(abs(round(random()/10240,0)) AS INT)
   FROM acc
  WHERE (x<1000000)  -- Testing 1 million rows
)
INSERT INTO flows SELECT
   x, '192.168.1.'||mx8, '55.66.77.'||(dx8%256),  -- ipLan, ipWan
   1000+mx8, 5000+mx8, (x%18),                    -- portLan, portWan, protocol
   64+(rk%1024000),                               -- nbBytesDecompOut
   1024+(rk%1024000),                             -- nbBytesCompIn
   (dx8%3), (1380000000+x)                        -- tos, timestamp
   FROM acc;

CREATE INDEX idxFlowTimeProt ON flows (timestamp, protocol);
SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item,
       count(*) as 'No.'
  FROM flows
 WHERE timestamp>=1373770600 AND timestamp<=1484770600
   AND protocol IN (17, 6)
 GROUP BY portLan ORDER BY vol DESC
 LIMIT 6;

   -- vol         | item
   -- ----------- | ----
   -- 14020551134 | 1001
   -- 13964537508 | 1000
   -- 13940128330 | 1003
   -- 13938134232 | 1002
   -- 13930861704 | 1005
   -- 13913803752 | 1006

   --   Item Stats:  Item No: 5             Query Size (Chars): 232
   --                Result Columns: 3      Result Rows: 6
   --                VM Work Steps: 8833543 Rows Modified: 0
   --                Sort Operations: 2     Table-Scan Steps: 0
   --                Prepare Time:    -- --- --- --- --.
   --                Query Run:       0d 00h 00m and 00.408s
   --                Full Query Time: 0d 00h 00m and 00.408s
   --                Query Result:    Success.
   -- ------------------------------------------------------------


   --   Script Stats: Total Script Execution Time: 0d 00h 00m and 03.742s
   --                 Total Script Query Time:     0d 00h 00m and 03.620s
   --                 Total Database Rows Changed: 1000001
   --                 Total Virtual-Machine Steps: 76833701
   --                 Last executed Item Index:    5
   --                 Last Script Error:
   -- ------------------------------------------------------------








[sqlite] Performance issue

2015-04-08 Thread Simon Slavin

On 8 Apr 2015, at 10:16am, Jeff Roux  wrote:

> SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan \
> as item FROM flows WHERE timestamp>=1383770600 AND \
> timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \
> ORDER BY vol DESC LIMIT 6

If you want us to understand what's happening please run ANALYZE then give us 
the output of

EXPLAIN QUERY PLAN 

Simon.


[sqlite] Performance issue

2015-04-08 Thread Dominique Devienne
On Wed, Apr 8, 2015 at 11:16 AM, Jeff Roux  wrote:

> # For mysql, I use:
> /etc/init.d/mysql stop; /etc/init.d/mysql start; \
> time echo 'SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan \
>  as item FROM flows WHERE timestamp>=1383770600 AND \
>  timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \
>  ORDER BY vol DESC LIMIT 6;' |
>  mysql testperf
>

If you dropped the LIMIT 6 from both, are the results significantly
different?

I.e. does actually getting the full result set result in closer numbers
between SQLite and MySQL?

I'm not sure SQLite's implementation of LIMIT prevents the whole result-set
from being sorted before being truncated to 6 rows. Avoiding LIMIT would be
one round-about way to find out. --DD


[sqlite] Performance issue

2015-04-08 Thread Jeff Roux
Thanks everyone for your answers,

I made some changes to the database according to the information you
gave me. It improved the performance of the query by about 20% (the
request now takes 4 seconds instead of 5).

Here are some more information, regarding all the suggestions I
received:

  - The version of SQLite I used is the one provided by Debian (current
stable: wheezy).

  - I need the database to be opened in Read Only mode, so I did not
use the WAL mode.

  - All the existing indexes cannot be removed because they are used
by other queries.

  - I however removed unused indexes for the following tests results

Note that my benchmarks are run in batch, with sqlite3 as with
mysql. I stop and start the mysql daemon to avoid most caching (I
hope).

# For mysql, I use:
/etc/init.d/mysql stop; /etc/init.d/mysql start; \
time echo 'SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan \
 as item FROM flows WHERE timestamp>=1383770600 AND \
 timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan \
 ORDER BY vol DESC LIMIT 6;' |
 mysql testperf

2783898050  33722
1374153827  33736
886842830   39155
655809252   51800
363040479   53153
358988337   59757

real0m1.067s
user0m0.000s
sys 0m0.000s


# For sqlite, I use:
time echo 'SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as vol, portLan
 as item FROM flows WHERE timestamp>=1383770600 AND
 timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan
 ORDER BY vol DESC LIMIT 6;' |
 sqlite3 /var/db/udcast/flow_stats_OA_1M.db

2783898050|33722
1374153827|33736
886842830|39155
655809252|51800
363040479|53153
358988337|59757

real0m4.405s
user0m1.812s
sys 0m2.580s

Here is the time spent in the query according to the number of lines
matching the where clause (ANALYZE has been run before):

PERIOD (s)  MIN TS       MAX TS       LINES     TIME
----------------------------------------------------
3600        1384767000   1384770600   35113     0:00.06
7200        1384763400   1384770600   67611     0:00.11
21600       1384749000   1384770600   154592    0:00.69
43200       1384727400   1384770600   270728    0:01.18
86400       1384684200   1384770600   501871    0:02.20
all         1383770600   1384770600   1000000   0:04.44


The 20% improvement is nice, but mysql (even without caching) is still
far ahead for the moment (4 times faster). Other ideas are welcome.

Thanks again!


2015-04-01 12:52 GMT+02:00 GB :

> In case of SELECTing "all available" I recommend invoking a different
> statement without the timestamp-part instead of providing some min and max
> values for timestamp. This avoids tricking the query planner into some
> wrong decisions (if you have an index with protocol as the first column).
>
> And how about WAL mode? If concurrency is of any concern for you, this
> definitely is something worth a try.
>
> -- GB
>
>
> Jeff Roux wrote on 31.03.2015 at 12:48:
>
>> Thanks everyone for the answers.
>>
>> I won't be able to run tests today, I will come back to you soon with
>> additional information. Just to say that, in the worst case, the WHERE
>> clause selects the entire data, i.e. 1000000 entries. The user can select
>> a time range of 1 hour, 3 hours, 1 day, and "all available". Note: before
>> being added to the database, the time stamps are aggregated on a
>> 180-second period and a lot of rows have the same time stamp (~ one row
>> per TCP session).
>>
>> All the columns are defined as INTEGER. There are 41 columns in total in
>> the flows table. If I remember correctly, there is no primary key defined
>> for this table.
>>
>> 2015-03-31 8:32 GMT+02:00 GB :
>>
>>   From what I see, I assume that timestamp gives the highest selectivity.
>>> Taking into account that protocol is SELECTed for and portLan is GROUPed
>>> BY, I'd try an index (timestamp, protocol, portLan) (not sure if portLan
>>> helps here, but it's worth a try, I think). Don't forget to ANALYZE, of
>>> course. Are your columns of INTEGER affinity? If they are of TEXT, they will
>>> store anything as TEXT. May make a difference in both space consumption
>>> and
>>> speed. Is your SQLite lib built with SQLITE_ENABLE_STAT4 enabled? If not,
>>> give it a try. It sometimes makes a big difference.
>>>
>>> Is it possible that data collection and retrieval happen at the same
>>> time?
>>> If so, try running the database in WAL mode, it should help with
>>> concurrency issues.
>>>
>>> -- GB
>>>
>>>
>>> Jeff Roux wrote on 30.03.2015 at 11:46:
>>>
>>>  Hi everyone,

 I have a daemon that collects information and stores it in a SQLite
 database. The table has 1 million rows.

 This daemon is 

[sqlite] Performance issue

2015-04-08 Thread Richard Hipp
On 4/8/15, Dominique Devienne  wrote:
>  With a LIMIT clause, in
> such a GROUP BY ORDER BY returning a large result set, would SQLite:
> 1) sort the whole result-set and then keep only the first top-N rows?
> 2) or instead do a partial-sort of the first top-N rows only,

SQLite must examine all rows of output, obviously.  But it only keeps
the top-N in memory and only sorts the top-N.  If there are a total of
M candidate rows and only the top-N are to be displayed, then the
algorithm is O(M*logN) in time and O(N) in space.
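(Illustrative numbers, for scale: with M = 1,000,000 candidate rows and
N = 6, that is on the order of 1,000,000 * log2(6), roughly 2.6 million
comparisons, while the sorter never holds more than 6 rows at a time.)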
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Performance issue

2015-04-01 Thread GB
In case of SELECTing "all available" I recommend invoking a different 
statement without the timestamp-part instead of providing some min and 
max values for timestamp. This avoids tricking the query planner into 
some wrong decisions (if you have an index with protocol as the first 
column).

And how about WAL mode? If concurrency is of any concern for you, this 
definitely is something worth a try.

-- GB

Jeff Roux wrote on 31.03.2015 at 12:48:
> Thanks everyone for the answers.
>
> I won't be able to run tests today, I will come back to you soon with
> additional information. Just to say that, in the worst case, the WHERE
> clause selects the entire data, i.e. 1000000 entries. The user can select a
> time range of 1 hour, 3 hours, 1 day, and "all available". Note: before
> being added to the database, the time stamps are aggregated on a 180-second
> period and a lot of rows have the same time stamp (~ one row per TCP
> session).
>
> All the columns are defined as INTEGER. There are 41 columns in total in
> the flows table. If I remember correctly, there is no primary key defined
> for this table.
>
> 2015-03-31 8:32 GMT+02:00 GB :
>
>>  From what I see, I assume that timestamp gives the highest selectivity.
>> Taking into account that protocol is SELECTed for and portLan is GROUPed
>> BY, I'd try an index (timestamp, protocol, portLan) (not sure if portLan
>> helps here, but it's worth a try, I think). Don't forget to ANALYZE, of
>> course. Are your columns of INTEGER affinity? If they are of TEXT, they will
>> store anything as TEXT. May make a difference in both space consumption and
>> speed. Is your SQLite lib built with SQLITE_ENABLE_STAT4 enabled? If not,
>> give it a try. It sometimes makes a big difference.
>>
>> Is it possible that data collection and retrieval happen at the same time?
>> If so, try running the database in WAL mode, it should help with
>> concurrency issues.
>>
>> -- GB
>>
>>
>> Jeff Roux wrote on 30.03.2015 at 11:46:
>>
>>> Hi everyone,
>>>
>>> I have a daemon that collects information and stores it in a SQLite
>>> database. The table has 1 million rows.
>>>
>>> This daemon is running on a HP server with 12 cores, 32 GB of RAM,
>>> and a SSD drive. I have performance issues with some requests. For
>>> instance, the following request takes more than 5 seconds to
>>> accomplish with SQlite3 (in that particular case, the WHERE clause
>>> selects all the data in the database, i.e. 1000000 rows):
>>>
>>> SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
>>> FROM  flows
>>> WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN
>>> (17, 6)
>>> GROUP BY portLan ORDER BY vol DESC LIMIT 6;
>>>
>>> I have done some tests with or without "INDEXED BY" clauses and got
>>> nearly the same results.
>>>
>>> I compared the performance with a mySQL and the same request takes
>>> less than 1 second to accomplish.
>>>
>>> Could you give me some directions to optimize this kind of request
>>> with SQlite3 when there is a big amount of data in the table ? I
>>> need to increase 3 times the number of rows and the performance will
>>> become unacceptable for my application.
>>>
>>> Thanks in advance.



[sqlite] Performance issue

2015-04-01 Thread R.Smith


On 2015-04-01 10:20 AM, Jeff Roux wrote:
> Here is the complete schema:
>
> sqlite> .schema
> CREATE TABLE application (id INTEGER CONSTRAINT applpk PRIMARY KEY,
> shortname VARCHAR(64), name VARCHAR(256));
>
> CREATE TABLE flows (idx INTEGER PRIMARY KEY, ipLan INTEGER, ipWan INTEGER,
> flags INTEGER, portLan INTEGER, portWan INTEGER, tos INTEGER, protocol
> INTEGER, wanTcpFlags INTEGER, nbBytesDecompOut INTEGER, nbBytesCompIn
> INTEGER, duration INTEGER, applId INTEGER, CST INTEGER, SRT INTEGER, RTT
> INTEGER, lan_retransmission INTEGER, wan_retransmission INTEGER,
> nbPktDecompIn INTEGER, nbPktCompOut INTEGER, nbBytesDecompIn INTEGER,
> nbBytesCompOut INTEGER, timestamp INTEGER, rtpTypeLAN INTEGER,
> rtpPktsLossLAN INTEGER, rtpJitterLAN INTEGER, rtpFactorRLAN INTEGER,
> rtpTypeWAN INTEGER, rtpPktsLossWAN INTEGER, rtpJitterWAN INTEGER,
> rtpFactorWLAN INTEGER, nbBytesDbcDecompOut INTEGER, nbBytesDbcCompIn
> INTEGER, nbBytesDefDecompOut INTEGER, nbBytesDefCompIn INTEGER,
> nbPktDecompOut INTEGER, nbPktCompIn INTEGER, nbBytesDbcDecompIn INTEGER,
> nbBytesDbcCompOut INTEGER, nbBytesDefDecompIn INTEGER, nbBytesDefCompOut
> INTEGER);
>
> CREATE INDEX idxApp ON flows(applId);
> CREATE INDEX idxLanWan ON flows(ipLan, ipWan, portWan, portLan);
> CREATE INDEX idxProto ON flows(protocol);
> CREATE INDEX idxTos ON flows(tos);
> CREATE INDEX idxTsLanWan ON flows(timestamp, ipLan, ipWan, portWan,
> portLan);
> CREATE INDEX tsindex ON flows(timestamp);

You may remove all those indices (unless you need them for something
else) and, for this specific query, just use:

CREATE INDEX tsIdxProt ON flows(timestamp,protocol);

Which, as done in the test scripts for 3 million rows I've sent earlier,
will produce very fast results for your query (sub-50 milliseconds for
12K rows).
The actual time will greatly depend on the number of rows qualified by
the WHERE clause, but the times will be comparable to MySQL or any other
SQL service (uncached).




[sqlite] Performance issue

2015-04-01 Thread Jeff Roux
Here is the complete schema:

sqlite> .schema
CREATE TABLE application (id INTEGER CONSTRAINT applpk PRIMARY KEY,
shortname VARCHAR(64), name VARCHAR(256));

CREATE TABLE flows (idx INTEGER PRIMARY KEY, ipLan INTEGER, ipWan INTEGER,
flags INTEGER, portLan INTEGER, portWan INTEGER, tos INTEGER, protocol
INTEGER, wanTcpFlags INTEGER, nbBytesDecompOut INTEGER, nbBytesCompIn
INTEGER, duration INTEGER, applId INTEGER, CST INTEGER, SRT INTEGER, RTT
INTEGER, lan_retransmission INTEGER, wan_retransmission INTEGER,
nbPktDecompIn INTEGER, nbPktCompOut INTEGER, nbBytesDecompIn INTEGER,
nbBytesCompOut INTEGER, timestamp INTEGER, rtpTypeLAN INTEGER,
rtpPktsLossLAN INTEGER, rtpJitterLAN INTEGER, rtpFactorRLAN INTEGER,
rtpTypeWAN INTEGER, rtpPktsLossWAN INTEGER, rtpJitterWAN INTEGER,
rtpFactorWLAN INTEGER, nbBytesDbcDecompOut INTEGER, nbBytesDbcCompIn
INTEGER, nbBytesDefDecompOut INTEGER, nbBytesDefCompIn INTEGER,
nbPktDecompOut INTEGER, nbPktCompIn INTEGER, nbBytesDbcDecompIn INTEGER,
nbBytesDbcCompOut INTEGER, nbBytesDefDecompIn INTEGER, nbBytesDefCompOut
INTEGER);

CREATE INDEX idxApp ON flows(applId);
CREATE INDEX idxLanWan ON flows(ipLan, ipWan, portWan, portLan);
CREATE INDEX idxProto ON flows(protocol);
CREATE INDEX idxTos ON flows(tos);
CREATE INDEX idxTsLanWan ON flows(timestamp, ipLan, ipWan, portWan,
portLan);
CREATE INDEX tsindex ON flows(timestamp);


2015-03-30 12:44 GMT+02:00 Jeff Roux :

> Simon,
>
> here is the list of the indexes that were already defined on the table:
> CREATE INDEX idxLanWan ON flows(ipLan, ipWan, portWan, portLan);
> CREATE INDEX idxProto ON flows(protocol);
> CREATE INDEX idxTos ON flows(tos);
> CREATE INDEX idxTsLanWan ON flows(timestamp, ipLan, ipWan, portWan,
> portLan);
> CREATE INDEX tsindex ON flows(timestamp);
>
> I added the ones you gave me:
> CREATE INDEX ts1 ON flows (portLan, protocol, timestamp);
> CREATE INDEX ts2 ON flows (portLan, timestamp);
> CREATE INDEX ts3 ON flows (protocol, portLan, timestamp);
> CREATE INDEX ts4 ON flows (protocol, timestamp);
>
>  and it appears it now uses the ts4 index, but the time spent is still ~ 5s
>
>
> sqlite> EXPLAIN QUERY PLAN SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as
> vol, portLan as item FROM  flows  WHERE timestamp>=1383770600 AND
> timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan ORDER BY vol
> DESC LIMIT 6;
> 0|0|0|SEARCH TABLE flows USING INDEX ts4 (protocol=? AND timestamp>? AND timestamp<?)
> 0|0|0|EXECUTE LIST SUBQUERY 1
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>
>
>
> 2015-03-30 12:20 GMT+02:00 Simon Slavin :
>
>>
>> On 30 Mar 2015, at 10:46am, Jeff Roux  wrote:
>>
>> > This daemon is running on a HP server with 12 cores, 32 GB of RAM,
>> > and a SSD drive. I have performance issues with some requests. For
>> > instance, the following request takes more than 5 seconds to
>> > accomplish with SQlite3 (in that particular case, the WHERE clause
> > selects all the data in the database, i.e. 1000000 rows):
>> >
>> > SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
>> > FROM  flows
>> > WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN
>> (17, 6)
>> > GROUP BY portLan ORDER BY vol DESC LIMIT 6;
>>
>> What indexes do you have on that table ?  I would expect to get best
>> results from one of the following indexes
>>
>> (portLan, protocol, timestamp)
>> (portLan, timestamp)
>> (protocol, portLan, timestamp)
>> (protocol, timestamp)
>>
>> For best results, CREATE all of those indexes and any others that strike
>> your fancy, run ANALYZE, then execute your SELECT.  If this gives
>> satisfactory results, use
>>
>> EXPLAIN QUERY PLAN [your SELECT statement]
>>
>> and see which index SQLite decided to use.  You can then delete the other
>> unused indexes and run ANALYZE one final time for a little more
>> optimization.
>>
>> It's worth noting that SQLite, since it has no server, has no
>> server-level caching.  If you're doing repeated queries using mySQL, after
>> the first one most of the index is cached so the times you're seeing are
>> faster than SQLite can ever produce.  Two disadvantages of mySQL are the
>> memory usage and the work involved in administration.
>>
>> Simon.
>>
>
>


[sqlite] Performance issue

2015-03-31 Thread Jeff Roux
Thanks everyone for the answers.

I won't be able to run tests today, I will come back to you soon with
additional information. Just to say that, in the worst case, the WHERE
clause selects the entire data, i.e. 1000000 entries. The user can select a
time range of 1 hour, 3 hours, 1 day, and "all available". Note: before
being added to the database, the time stamps are aggregated on a 180-second
period and a lot of rows have the same time stamp (~ one row per TCP
session).

All the columns are defined as INTEGER. There are 41 columns in total in
the flows table. If I remember correctly, there is no primary key defined
for this table.

2015-03-31 8:32 GMT+02:00 GB :

> From what I see, I assume that timestamp gives the highest selectivity.
> Taking into account that protocol is SELECTed for and portLan is GROUPed
> BY, I'd try an index (timestamp, protocol, portLan) (not sure if portLan
> helps here, but it's worth a try, I think). Don't forget to ANALYZE, of
>> course. Are your columns of INTEGER affinity? If they are of TEXT, they will
> store anything as TEXT. May make a difference in both space consumption and
> speed. Is your SQLite lib built with SQLITE_ENABLE_STAT4 enabled? If not,
> give it a try. It sometimes makes a big difference.
>
> Is it possible that data collection and retrieval happen at the same time?
> If so, try running the database in WAL mode, it should help with
> concurrency issues.
>
> -- GB
>
>
> Jeff Roux wrote on 30.03.2015 at 11:46:
>
>> Hi everyone,
>>
>> I have a daemon that collects information and stores it in a SQLite
>> database. The table has 1 million rows.
>>
>> This daemon is running on a HP server with 12 cores, 32 GB of RAM,
>> and a SSD drive. I have performance issues with some requests. For
>> instance, the following request takes more than 5 seconds to
>> accomplish with SQlite3 (in that particular case, the WHERE clause
>> selects all the data in the database, i.e. 1000000 rows):
>>
>> SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
>> FROM  flows
>> WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN
>> (17, 6)
>> GROUP BY portLan ORDER BY vol DESC LIMIT 6;
>>
>> I have done some tests with or without "INDEXED BY" clauses and got
>> nearly the same results.
>>
>> I compared the performance with a mySQL and the same request takes
>> less than 1 second to accomplish.
>>
>> Could you give me some directions to optimize this kind of request
>> with SQlite3 when there is a big amount of data in the table ? I
>> need to increase 3 times the number of rows and the performance will
>> become unacceptable for my application.
>>
>> Thanks in advance.
>


[sqlite] Performance issue

2015-03-31 Thread GB
 From what I see, I assume that timestamp gives the highest selectivity.
Taking into account that protocol is SELECTed for and portLan is GROUPed
BY, I'd try an index (timestamp, protocol, portLan) (not sure if portLan
helps here, but it's worth a try, I think). Don't forget to ANALYZE, of
course. Are your columns of INTEGER affinity? If they are of TEXT, they
will store anything as TEXT. That may make a difference in both space
consumption and speed. Is your SQLite lib built with SQLITE_ENABLE_STAT4
enabled? If not, give it a try. It sometimes makes a big difference.

Is it possible that data collection and retrieval happen at the same 
time? If so, try running the database in WAL mode, it should help with 
concurrency issues.
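For reference, a minimal sketch of the suggestions above (the index name
is made up here):

CREATE INDEX idxTsProtoPort ON flows(timestamp, protocol, portLan);
ANALYZE;
PRAGMA journal_mode=WAL;  -- only if collection and retrieval overlap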

-- GB

Jeff Roux wrote on 30.03.2015 at 11:46:
> Hi everyone,
>
> I have a daemon that collects information and stores it in a SQLite
> database. The table has 1 million rows.
>
> This daemon is running on a HP server with 12 cores, 32 GB of RAM,
> and a SSD drive. I have performance issues with some requests. For
> instance, the following request takes more than 5 seconds to
> accomplish with SQlite3 (in that particular case, the WHERE clause
> selects all the data in the database, i.e. 1000000 rows):
>
> SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
> FROM  flows
> WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN (17, 6)
> GROUP BY portLan ORDER BY vol DESC LIMIT 6;
>
> I have done some tests with or without "INDEXED BY" clauses and got
> nearly the same results.
>
> I compared the performance with a mySQL and the same request takes
> less than 1 second to accomplish.
>
> Could you give me some directions to optimize this kind of request
> with SQlite3 when there is a big amount of data in the table ? I
> need to increase 3 times the number of rows and the performance will
> become unacceptable for my application.
>
> Thanks in advance.
>



[sqlite] Performance issue

2015-03-30 Thread R.Smith


On 2015-03-30 11:46 AM, Jeff Roux wrote:
> Hi everyone,
>
> I have a daemon that collects information and stores it in a SQLite
> database. The table has 1 million rows.
>
> This daemon is running on a HP server with 12 cores, 32 GB of RAM,
> and a SSD drive. I have performance issues with some requests. For
> instance, the following request takes more than 5 seconds to
> accomplish with SQlite3 (in that particular case, the WHERE clause
> selects all the data in the database, i.e. 1000000 rows):
>
> SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
> FROM  flows
> WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN (17, 6)
> GROUP BY portLan ORDER BY vol DESC LIMIT 6;
>
> I have done some tests with or without "INDEXED BY" clauses and got
> nearly the same results.
>
> I compared the performance with a mySQL and the same request takes
> less than 1 second to accomplish.
>
> Could you give me some directions to optimize this kind of request
> with SQlite3 when there is a big amount of data in the table ? I
> need to increase 3 times the number of rows and the performance will
> become unacceptable for my application.

I think something else is wrong; we might need your schema to judge
better. The reason I say something else is wrong is that I made this
quick script to reproduce what, from your writings, I could establish
the schema might look like, and then added a single index and tried
the query.

I deliberately added a primary key and then did not use it at all to 
avoid those advantages as it may not suit your needs.

You will see I used 3 million rows instead of 1 million (because you
said your data may grow three times bigger) and some random values in
the appropriate ranges to fake the data (where I had an idea, and
guessed the rest). The query takes less than 50 milliseconds, even over
3 million rows. In fact the entire script - creating the DB, inserting 3
million rows, creating an index, and then doing 3 queries - took only
18 seconds in total (without running ANALYZE). This will even go a bit
faster if I turn stats off, but I left them on for clarity.

If this does not enlighten your quest, kindly provide your full schema 
and example data shapes which might give us a clue.

NOTE: In my data it works out that about ~12K rows satisfy the WHERE
clause - if this is significantly more in your case it might change the
timing a bit, but nothing like your suggested times.
See results below:

   -- Processing Script for File: D:\Documents\TestScript3.txt
   -- Script Items: 7  Parameter Count: 0

   -- 2015-03-30 16:16:11.795  |  [Success]  Script Started...
   -- ------------------------------------------------------------


DROP TABLE IF EXISTS flows;

   --   Item Stats:  Item No: 1         Query Size (Chars): 29
   --                VM Work Steps: 92  Rows Modified: 0
   --                Full Query Time: 0d 00h 00m and 00.485s
   --                Query Result: Success.
   -- ------------------------------------------------------------


CREATE TABLE flows(
   id INTEGER PRIMARY KEY,
   ipLan TEXT,
   ipWan TEXT,
   portLan INT,
   portWan INT,
   protocol INT,
   nbBytesDecompOut INT,
   nbBytesCompIn INT,
   tos INT,
   timestamp INT
);

   --   Item Stats:  Item No: 2         Query Size (Chars): 207
   --                VM Work Steps: 31  Rows Modified: 0
   --                Full Query Time: -- --- --- --- --.
   --                Query Result: Success.
   -- ------------------------------------------------------------


/* POPULATE the table with random + scaled data to emulate an actual dataset of 3M rows */

WITH acc(x,mx8,dx8,mxK,dxK,rK) AS (
 SELECT 0,0,0,0,0,100
   UNION ALL
 SELECT x+1, (x%8), CAST(round(x/8,0) AS INT),
        (x%1024), CAST(round(x/1024,0) AS INT),
        CAST(abs(round(random()/10240,0)) AS INT)
   FROM acc
  WHERE (x<3000000)  -- Testing 3 million rows
)
INSERT INTO flows SELECT
   x, '192.168.1.'||mx8, '55.66.77.'||(dx8%256),  -- ipLan, ipWan
   1000+mx8, 5000+mx8, (x%18),                    -- portLan, portWan, protocol
   64+(rk%1024000),                               -- nbBytesDecompOut
   1024+(rk%1024000),                             -- nbBytesCompIn
   (dx8%3), (1380000000+(x*10))                   -- tos, timestamp
   FROM acc;


   --   Item Stats:  Item No: 3            Query Size (Chars): 691
   --                Result Columns: 0     Result Rows: 0
   --                VM Work Steps: 18087  Rows Modified: 3000001
   --                Full Query Time: 0d 00h 00m and 12.849s
   --                Query Result: Success.
   -- ------------------------------------------------------------

[sqlite] Performance issue

2015-03-30 Thread Jeff Roux
Simon,

here is the list of the indexes that were already defined on the table:
CREATE INDEX idxLanWan ON flows(ipLan, ipWan, portWan, portLan);
CREATE INDEX idxProto ON flows(protocol);
CREATE INDEX idxTos ON flows(tos);
CREATE INDEX idxTsLanWan ON flows(timestamp, ipLan, ipWan, portWan,
portLan);
CREATE INDEX tsindex ON flows(timestamp);

I added the ones you gave me:
CREATE INDEX ts1 ON flows (portLan, protocol, timestamp);
CREATE INDEX ts2 ON flows (portLan, timestamp);
CREATE INDEX ts3 ON flows (protocol, portLan, timestamp);
CREATE INDEX ts4 ON flows (protocol, timestamp);

 and it appears it now uses the ts4 index, but the time spent is still ~ 5s


sqlite> EXPLAIN QUERY PLAN SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as
vol, portLan as item FROM  flows  WHERE timestamp>=1383770600 AND
timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan ORDER BY vol
DESC LIMIT 6;
0|0|0|SEARCH TABLE flows USING INDEX ts4 (protocol=? AND timestamp>? AND timestamp<?)
0|0|0|EXECUTE LIST SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY

2015-03-30 12:20 GMT+02:00 Simon Slavin:

>
> On 30 Mar 2015, at 10:46am, Jeff Roux  wrote:
>
> > This daemon is running on a HP server with 12 cores, 32 GB of RAM,
> > and a SSD drive. I have performance issues with some requests. For
> > instance, the following request takes more than 5 seconds to
> > accomplish with SQlite3 (in that particular case, the WHERE clause
> > selects all the data in the database, i.e. 1000000 rows):
> >
> > SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
> > FROM  flows
> > WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN
> (17, 6)
> > GROUP BY portLan ORDER BY vol DESC LIMIT 6;
>
> What indexes do you have on that table ?  I would expect to get best
> results from one of the following indexes
>
> (portLan, protocol, timestamp)
> (portLan, timestamp)
> (protocol, portLan, timestamp)
> (protocol, timestamp)
>
> For best results, CREATE all of those indexes and any others that strike
> your fancy, run ANALYZE, then execute your SELECT.  If this gives
> satisfactory results, use
>
> EXPLAIN QUERY PLAN [your SELECT statement]
>
> and see which index SQLite decided to use.  You can then delete the other
> unused indexes and run ANALYZE one final time for a little more
> optimization.
>
> It's worth noting that SQLite, since it has no server, has no server-level
> caching.  If you're doing repeated queries using mySQL, after the first one
> most of the index is cached so the times you're seeing are faster than
> SQLite can ever produce.  Two disadvantages of mySQL are the memory usage
> and the work involved in administration.
>
> Simon.
>


[sqlite] Performance issue

2015-03-30 Thread Jeff Roux
Hi everyone,

I have a daemon that collects information and stores it in a SQLite
database. The table has 1 million rows.

This daemon is running on a HP server with 12 cores, 32 GB of RAM,
and a SSD drive. I have performance issues with some requests. For
instance, the following request takes more than 5 seconds to
accomplish with SQlite3 (in that particular case, the WHERE clause
selects all the data in the database, i.e. 1000000 rows):

SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
FROM  flows
WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN (17, 6)
GROUP BY portLan ORDER BY vol DESC LIMIT 6;

I have done some tests with or without "INDEXED BY" clauses and got
nearly the same results.

I compared the performance with a mySQL and the same request takes
less than 1 second to accomplish.

Could you give me some directions to optimize this kind of request
with SQLite3 when there is a large amount of data in the table? I
need to increase the number of rows threefold, and then the performance
will become unacceptable for my application.

Thanks in advance.


[sqlite] Performance issue

2015-03-30 Thread Simon Slavin

On 30 Mar 2015, at 10:46am, Jeff Roux  wrote:

> This daemon is running on a HP server with 12 cores, 32 GB of RAM,
> and a SSD drive. I have performance issues with some requests. For
> instance, the following request takes more than 5 seconds to
> accomplish with SQlite3 (in that particular case, the WHERE clause
> selects all the data in the database, i.e. 1000000 rows):
> 
> SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
> FROM  flows
> WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN (17, 6)
> GROUP BY portLan ORDER BY vol DESC LIMIT 6;

What indexes do you have on that table ?  I would expect to get best results 
from one of the following indexes

(portLan, protocol, timestamp)
(portLan, timestamp)
(protocol, portLan, timestamp)
(protocol, timestamp)

For best results, CREATE all of those indexes and any others that strike your 
fancy, run ANALYZE, then execute your SELECT.  If this gives satisfactory 
results, use

EXPLAIN QUERY PLAN [your SELECT statement]

and see which index SQLite decided to use.  You can then delete the other 
unused indexes and run ANALYZE one final time for a little more optimization.

It's worth noting that SQLite, since it has no server, has no server-level 
caching.  If you're doing repeated queries using mySQL, after the first one 
most of the index is cached so the times you're seeing are faster than SQLite 
can ever produce.  Two disadvantages of mySQL are the memory usage and the work 
involved in administration.

Simon.


[sqlite] Performance issue

2015-03-30 Thread Richard Hipp
On Mon, Mar 30, 2015 at 6:44 AM, Jeff Roux  wrote:

> Simon,
>
> here is the list of the indexes that were already defined on the table:
> CREATE INDEX idxLanWan ON flows(ipLan, ipWan, portWan, portLan);
> CREATE INDEX idxProto ON flows(protocol);
> CREATE INDEX idxTos ON flows(tos);
> CREATE INDEX idxTsLanWan ON flows(timestamp, ipLan, ipWan, portWan,
> portLan);
> CREATE INDEX tsindex ON flows(timestamp);
>

Recommend that you drop tsindex.  idxTsLanWan will work just as well.


>
> I added the ones you gave me:
> CREATE INDEX ts1 ON flows (portLan, protocol, timestamp);
> CREATE INDEX ts2 ON flows (portLan, timestamp);
> CREATE INDEX ts3 ON flows (protocol, portLan, timestamp);
> CREATE INDEX ts4 ON flows (protocol, timestamp);
>

If you decide to keep ts3 and/or ts4, then recommend that you drop idxProto.
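For reference, the suggested cleanup as SQL (with an ANALYZE afterwards,
as advised earlier in the thread):

DROP INDEX tsindex;   -- idxTsLanWan serves the same lookups
DROP INDEX idxProto;  -- redundant if ts3/ts4 are kept
ANALYZE;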


>
>  and it appears it now uses the ts4 index, but the time spent is still ~ 5s
>

How many rows satisfy the WHERE clause?

You might get some relief using a covering index:

CREATE INDEX ts5 ON
flows(protocol,timestamp,portLan,nbBytesDecompOut,nbBytesCompIn);
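If you adopt it, one way to confirm the planner actually treats ts5 as
covering is to re-run the plan and look for "USING COVERING INDEX" in
the output:

EXPLAIN QUERY PLAN
SELECT SUM(nbBytesDecompOut+nbBytesCompIn) AS vol, portLan AS item
  FROM flows
 WHERE timestamp>=1383770600 AND timestamp<=1384770600
   AND protocol IN (17, 6)
 GROUP BY portLan ORDER BY vol DESC LIMIT 6;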


>
>
> sqlite> EXPLAIN QUERY PLAN SELECT SUM(nbBytesDecompOut+nbBytesCompIn) as
> vol, portLan as item FROM  flows  WHERE timestamp>=1383770600 AND
> timestamp<=1384770600 AND protocol IN (17, 6) GROUP BY portLan ORDER BY vol
> DESC LIMIT 6;
> 0|0|0|SEARCH TABLE flows USING INDEX ts4 (protocol=? AND timestamp>? AND timestamp<?)
> 0|0|0|EXECUTE LIST SUBQUERY 1
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>
>
>
> 2015-03-30 12:20 GMT+02:00 Simon Slavin :
>
> >
> > On 30 Mar 2015, at 10:46am, Jeff Roux  wrote:
> >
> > > This daemon is running on a HP server with 12 cores, 32 GB of RAM,
> > > and a SSD drive. I have performance issues with some requests. For
> > > instance, the following request takes more than 5 seconds to
> > > accomplish with SQlite3 (in that particular case, the WHERE clause
> > > selects all the data in the database, i.e. 1000000 rows):
> > >
> > > SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
> > > FROM  flows
> > > WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN
> > (17, 6)
> > > GROUP BY portLan ORDER BY vol DESC LIMIT 6;
> >
> > What indexes do you have on that table ?  I would expect to get best
> > results from one of the following indexes
> >
> > (portLan, protocol, timestamp)
> > (portLan, timestamp)
> > (protocol, portLan, timestamp)
> > (protocol, timestamp)
> >
> > For best results, CREATE all of those indexes and any others that strike
> > your fancy, run ANALYZE, then execute your SELECT.  If this gives
> > satisfactory results, use
> >
> > EXPLAIN QUERY PLAN [your SELECT statement]
> >
> > and see which index SQLite decided to use.  You can then delete the other
> > unused indexes and run ANALYZE one final time for a little more
> > optimization.
> >
> > It's worth noting that SQLite, since it has no server, has no
> server-level
> > caching.  If you're doing repeated queries using mySQL, after the first
> one
> > most of the index is cached so the times you're seeing are faster than
> > SQLite can ever produce.  Two disadvantages of mySQL are the memory usage
> > and the work involved in administration.
> >
> > Simon.
>



-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Performance issue

2015-03-30 Thread Keith Medcalf

What cache_size have you specified?
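For reference, a hedged example of raising it (the value is purely
illustrative, not a recommendation):

PRAGMA cache_size = -200000;  -- negative values are in KiB, i.e. ~200 MB of page cache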

---
Theory is when you know everything but nothing works.  Practice is when 
everything works but no one knows why.  Sometimes theory and practice are 
combined:  nothing works and no one knows why.

>-Original Message-
>From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
>bounces at mailinglists.sqlite.org] On Behalf Of Jeff Roux
>Sent: Monday, 30 March, 2015 03:46
>To: sqlite-users at mailinglists.sqlite.org
>Subject: [sqlite] Performance issue
>
>Hi everyone,
>
>I have a daemon that collects information and stores it in a SQLite
>database. The table has 1 million rows.
>
>This daemon is running on a HP server with 12 cores, 32 GB of RAM,
>and a SSD drive. I have performance issues with some requests. For
>instance, the following request takes more than 5 seconds to
>accomplish with SQlite3 (in that particular case, the WHERE clause
>selects all the data in the database, i.e. 1,000,000 rows):
>
>SELECT SUM(nbBytesDecompOut + nbBytesCompIn) as vol, portLan as item
>FROM  flows
>WHERE timestamp>=1383770600 AND timestamp<=1384770600 AND protocol IN
>(17, 6)
>GROUP BY portLan ORDER BY vol DESC LIMIT 6;
>
>I have done some tests with or without "INDEXED BY" clauses and got
>nearly the same results.
>
>I compared the performance with a mySQL and the same request takes
>less than 1 second to accomplish.
>
>Could you give me some directions to optimize this kind of request
>with SQlite3 when there is a big amount of data in the table? I
>need to triple the number of rows, and the performance will
>become unacceptable for my application.
>
>Thanks in advance.
>___
>sqlite-users mailing list
>sqlite-users at mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users





Re: [sqlite] Performance issue when copying data from one sqlite database to another

2014-10-15 Thread Pontus Bergsten
Thanks for your suggestions.
@Mikael I do believe that write caching was disabled on target. However, 
enabling lazy data write and data caching didn't make much of a difference.
@Ketil I specified the pragmas before copying by executing the sql statements 
"PRAGMA Dest.journal_mode = OFF" and "PRAGMA synchronous = OFF" on the database 
connection containing the "Main" and the attached "Dest" database. The queries 
seemed to be accepted by sqlite. However, it didn't make much of a difference.

@Richard The embedded operating system is RTOS-32 (including RTFiles-32 and 
RTUSB-32) from On Time.
I find it strange that enabling the file data cache, and the setting of the 
synchronous pragma didn't affect performance. However, from the thread 
profiling, it can be seen that an "INSERT INTO ... SELECT ... WHERE" statement 
usually has a couple of continuous segments with fairly good CPU utilization 
for about 10 to 50 ms, that is followed by long periods, 100 to 200 ms, 
containing only sporadic activity ~10 us triggered by the USB driver.
In the current implementation the data is transferred in chunks using separate 
"INSERT INTO ... FROM  ... WHERE" statements. The reason for this was to 
display some progress to the user. Next, I will try to transfer data data in 
larger chunks and see it makes any difference.
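
For the record, a minimal sketch of one such larger chunk wrapped in an
explicit transaction (table and field names are placeholders for the real
schema, and the time window is illustrative):

BEGIN;
INSERT INTO Dest.TheTable (field1, field2)
  SELECT field1, field2 FROM Main.TheTable
  WHERE time BETWEEN 1413237600 AND 1413324000;
COMMIT;
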
Any other ideas are most welcome!
Regards,
/Pontus
 



 From: Richard Hipp <d...@sqlite.org>
 To: Pontus Bergsten <pontus_bergs...@yahoo.se>; General Discussion of SQLite 
Database <sqlite-users@sqlite.org> 
 Sent: Wednesday, 15 October 2014 0:29
 Subject: Re: [sqlite] Performance issue when copying data from one sqlite 
database to another
   




On Tue, Oct 14, 2014 at 5:24 PM, Pontus Bergsten <pontus_bergs...@yahoo.se> 
wrote:

 When the application is executed on Windows on a desktop computer, the copying 
works fine and the performance is fairly ok, even when saving to USB. However, 
when the same code is executed on the embedded system, the copying of data is 
extremely slow, even though the CPU load is very moderate.



That sounds like a file-system problem to me.  What is your embedded OS?

-- 
D. Richard Hipp
d...@sqlite.org

  
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue when copying data from one sqlite database to another

2014-10-14 Thread Kees Nuyt
On Wed, 15 Oct 2014 01:26:10 +0200, Kees Nuyt 
wrote:

> http://knuyt.demon.nl/sqlite.org/faq.html#q19

Oops, make that 
http://www.sqlite.org/faq.html#q19

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue when copying data from one sqlite database to another

2014-10-14 Thread Kees Nuyt
On Tue, 14 Oct 2014 21:24:40 + (UTC), Pontus Bergsten
 wrote:

>Hi,
>I have the following problem setup:

[...]

> INSERT INTO Dest.TheTable (field1, ..., fieldN) SELECT field1, ..., fieldN
> FROM Main.TheTable WHERE time BETWEEN t1 AND t2
>
> Is there any technique that can be used for tuning the performance of sqlite3 
> in this scenario?

I suspect this is a case of 
http://knuyt.demon.nl/sqlite.org/faq.html#q19


-- 
Regards,

Kees Nuyt


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue when copying data from one sqlite database to another

2014-10-14 Thread Richard Hipp
On Tue, Oct 14, 2014 at 5:24 PM, Pontus Bergsten 
wrote:

>  When the application is executed on Windows on a desktop computer, the
> copying works fine and the performance is fairly ok, even when saving to
> USB. However, when the same code is executed on the embedded system, the
> copying of data is extremely slow, even though the CPU load is very
> moderate.
>


That sounds like a file-system problem to me.  What is your embedded OS?

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue when copying data from one sqlite database to another

2014-10-14 Thread Simon Slavin

On 14 Oct 2014, at 10:24pm, Pontus Bergsten  wrote:

> INSERT INTO Dest.TheTable (field1, ..., fieldN) SELECT field1, ..., fieldN
> FROM Main.TheTable WHERE time BETWEEN t1 AND t2
> And here is the performance problem: When the application is executed on 
> Windows on a desktop computer, the copying works fine and the performance is 
> fairly ok, even when saving to USB. However, when the same code is executed 
> on the embedded system, the copying of data is extremely slow, even though 
> the CPU load is very moderate. Profiling the thread that executes the 
> sql-statements above, reveals that the thread is active in many very small 
> steps, while waiting for the USB driver for very long time (compared to the 
> active time), in between. During profiling the copy-thread only did useful 
> work for about 5% of the total time, the rest was waiting.
> 
> Is there any technique that can be used for tuning the performance of sqlite3 
> in this scenario? For example, writing larger chunks of data to the "Dest" 
> database?

Your description makes perfect sense, bearing in mind that cheap USB drives are 
slow.  A relatively cheap piece of research might be to see if you can find a 
(more expensive) fast USB drive and see whether that makes your operation 
faster.  Other than that, two possibilities occur to me:

(A) Create your new database file on main storage, and create the Dest table 
there.  Once it is complete, close the database, then copy the database file to 
the USB drive using file copy commands rather than SQLite commands.  That 
should give you the fastest possible way of getting that data onto the drive.
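
In SQL terms, the first half of (A) is just attaching the new file from fast
local storage instead of the USB drive (the path here is hypothetical); the
copy to USB afterwards is an ordinary file copy outside SQLite:

ATTACH DATABASE '/local/tmp/export.db' AS Dest;  -- main storage, not USB
-- create the empty tables, run the INSERT ... SELECT statements, then:
DETACH DATABASE Dest;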

(B) Write your Dest table to memory, then use the SQLite backup API to copy 
that to a file on the USB drive.  Copying the entire database page by page 
should be faster than copying the data row by row.



I do not know that either of these will definitely help you.  It depends too 
much on the relative speed of various components of your embedded system and on 
the width of your data bottleneck.  But they might be worth exploring.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue when copying data from one sqlite database to another

2014-10-14 Thread Mikael
On the embedded unit write caching disabled?


2014-10-14 23:24 GMT+02:00 Pontus Bergsten :

> Hi,
> I have the following problem setup:
> We use sqlite3 in an embedded signal logger application. The "Main"
> in-memory signal database consists of some minor signal definition
> tables + two large tables (shards) with the actual signal data. The
> sharding technique is used in order to implement an efficient ringbuffer in
> sqlite.
>
> Now, from time to time in the application, it is desired to extract some
> signals in some specified time window from the "Main" database, and save
> the selected signals to another smaller "Dest" database on USB memory. The
> "Dest" database will have the same signal definition tables as "Main", but
> only one signal data table. No ringbuffer functionality, and hence no
> sharding, is needed for the "Dest" database.
> The actual copying is done by first creating the "Dest" database file with
> the required empty tables on USB, and then attach it to the "Main"
> in-memory database. Then the signal definitions and data are copied using a
> series of statements looking much like
>
> INSERT INTO Dest.TheTable (field1, ..., fieldN) SELECT field1, ..., fieldN
> FROM Main.TheTable WHERE time BETWEEN t1 AND t2
> And here is the performance problem: When the application is executed on
> Windows on a desktop computer, the copying works fine and the performance
> is fairly ok, even when saving to USB. However, when the same code is
> executed on the embedded system, the copying of data is extremely slow,
> even though the CPU load is very moderate. Profiling the thread that
> executes the sql-statements above, reveals that the thread is active in
> many very small steps, while waiting for the USB driver for very long time
> (compared to the active time), in between. During profiling the copy-thread
> only did useful work for about 5% of the total time, the rest was
> waiting.
>
> Is there any technique that can be used for tuning the performance of
> sqlite3 in this scenario? For example, writing larger chunks of data to the
> "Dest" database?
>
> Regards,
> Pontus Bergsten
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue when copying data from one sqlite database to another

2014-10-14 Thread Ketil Froyn
Depends on how safe/robust you want the copying to be, but if you can
simply rerun if something goes wrong, you might look into stuff like:

pragma journal_mode = MEMORY;
pragma synchronous = off;

But make sure you understand the consequences first by reading about these
commands:

http://www.sqlite.org/pragma.html

Cheers, Ketil
On 14 Oct 2014 23:25, "Pontus Bergsten"  wrote:

> Hi,
> I have the following problem setup:
> We use sqlite3 in an embedded signal logger application. The "Main"
> in-memory signal signal database consists of some minor signal definition
> tables + two large tables (shards) with the actual signal data. The
> sharding technique is used in order to implement an efficient ringbuffer in
> sqlite.
>
> Now, from time to time in the application, it is desired to extract some
> signals in some specified time window from the "Main" database, and save
> the selected signals to another smaller "Dest" database on USB memory. The
> "Dest" database will have the same signal definition tables as "Main", but
> only one signal data table. No ringbuffer functionality, and hence no
> sharding, is needed for the "Dest" database.
> The actual copying is done by first creating the "Dest" database file with
> the required empty tables on USB, and then attach it to the "Main"
> in-memory database. Then the signal definitions and data are copied using a
> series of statements looking much like
>
> INSERT INTO Dest.TheTable (field1, ..., fieldN) SELECT field1, ..., fieldN
> FROM Main.TheTable WHERE time BETWEEN t1 AND t2
> And here is the performance problem: When the application is executed on
> Windows on a desktop computer, the copying works fine and the performance
> is fairly ok, even when saving to USB. However, when the same code is
> executed on the embedded system, the copying of data is extremely slow,
> even though the CPU load is very moderate. Profiling the thread that
> executes the sql-statements above, reveals that the thread is active in
> many very small steps, while waiting for the USB driver for very long time
> (compared to the active time), in between. During profiling the copy-thread
> only did useful work for about 5% of the total time, the rest was
> waiting.
>
> Is there any technique that can be used for tuning the performance of
> sqlite3 in this scenario? For example, writing larger chunks of data to the
> "Dest" database?
>
> Regards,
> Pontus Bergsten
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Performance issue when copying data from one sqlite database to another

2014-10-14 Thread Pontus Bergsten
Hi,
I have the following problem setup:
We use sqlite3 in an embedded signal logger application. The "Main" in-memory 
signal database consists of some minor signal definition tables + two 
large tables (shards) with the actual signal data. The sharding technique is 
used in order to implement an efficient ringbuffer in sqlite.

Now, from time to time in the application, it is desired to extract some 
signals in some specified time window from the "Main" database, and save the 
selected signals to another smaller "Dest" database on USB memory. The "Dest" 
database will have the same signal definition tables as "Main", but only one 
signal data table. No ringbuffer functionality, and hence no sharding, is 
needed for the "Dest" database.
The actual copying is done by first creating the "Dest" database file with the 
required empty tables on USB, and then attaching it to the "Main" in-memory 
database. Then the signal definitions and data are copied using a series of 
statements looking much like

INSERT INTO Dest.TheTable (field1, ..., fieldN) SELECT field1, ..., fieldN 
FROM Main.TheTable WHERE time BETWEEN t1 AND t2
And here is the performance problem: When the application is executed on 
Windows on a desktop computer, the copying works fine and the performance is 
fairly ok, even when saving to USB. However, when the same code is executed on 
the embedded system, the copying of data is extremely slow, even though the CPU 
load is very moderate. Profiling the thread that executes the sql-statements 
above, reveals that the thread is active in many very small steps, while 
waiting for the USB driver for a very long time (compared to the active time), in 
between. During profiling the copy-thread only did useful work for about 5% of 
the total time, the rest was waiting.

Is there any technique that can be used for tuning the performance of sqlite3 
in this scenario? For example, writing larger chunks of data to the "Dest" 
database?

Regards,
Pontus Bergsten

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue using SQLite

2014-06-22 Thread lyx
Sorry, the first sql statement in my previous mail is incorrect. 
It is not " sprintf (sql_str, "select ctrl_no from T_CTRL_CLRRULE where 
CTRL_NO='%s' and ((PRODUCT_ID='%s' and OPERATE_TYPE='%s') or 
(PRODUCT_ID='AA' and OPERATE_TYPE='%s') or (PRODUCT_ID='%s' and 
OPERATE_TYPE='AA')) and CTRL_FLAG='%s'", sCtrlNo, sProductId, sOperType, 
sOperType, sProductId, sCtrlFlag);"


It should be " sprintf (sql_str, "select count(*) from T_CTRL_CLRRULE where 
CTRL_NO='%s' and ((PRODUCT_ID='%s' and OPERATE_TYPE='%s') or 
(PRODUCT_ID='AA' and OPERATE_TYPE='%s') or (PRODUCT_ID='%s' and 
OPERATE_TYPE='AA')) and CTRL_FLAG='%s'", sCtrlNo, sProductId, sOperType, 
sOperType, sProductId, sCtrlFlag);"


Thanks for Clemens' correction.


Hi Clemens,
I will try your advice and feed back the test result tomorrow.






>Message: 5
>Date: Sat, 21 Jun 2014 21:53:58 +0800
>From: 163 <sdu...@163.com>
>To: "sqlite-users@sqlite.org" <sqlite-users@sqlite.org>
>Subject: [sqlite] Performance issue using SQLite
>Message-ID: <790e34bd-c627-4dd6-872e-2b358a6d1...@163.com>
>Content-Type: text/plain;  charset=us-ascii
>
>
>> Hi Experts,
>> I met a performance issue when using SQLite 3.8.4.3. I found it would be 
>> quite slow trying to select count(*) using a where statement with several OR 
>> conditions. For example:
>>  
>> sprintf (sql_str, "select ctrl_no from T_CTRL_CLRRULE where CTRL_NO='%s' and 
>> ((PRODUCT_ID='%s' and OPERATE_TYPE='%s') or (PRODUCT_ID='AA' and 
>> OPERATE_TYPE='%s') or (PRODUCT_ID='%s' and OPERATE_TYPE='AA')) and 
>> CTRL_FLAG='%s'", sCtrlNo, sProductId, sOperType, sOperType, sProductId, 
>> sCtrlFlag);
>>  
>> The sql to create T_CTRL_CLRRULE is as follows:
>> CREATE TABLE T_CTRL_CLRRULE (CTRL_NO TEXT,CTRL_NAME TEXT,CTRL_SYS 
>> TEXT,PRODUCT_ID TEXT,OPERATE_TYPE TEXT,CTRL_FLAG TEXT);
>> CREATE UNIQUE INDEX UIDX_T_CTRL_CLRRULE on T_CTRL_CLRRULE 
>> (CTRL_NO,PRODUCT_ID,OPERATE_TYPE,CTRL_FLAG);
>> CREATE INDEX I_T_CTRL_CLRRULE1 on T_CTRL_CLRRULE (CTRL_NO,CTRL_FLAG);
>>  
>> We can see there is a unique key UIDX_T_CTRL_CLRRULE on t_ctrl_clrrule. The 
>> select statement to get count(*) is based on unique index. There are 
>> 2,000,000 data in t_ctrl_clrrule table. I try to change ctrl_no, product_id, 
>> operate_type, ctrl_flag in above SQL statement to test the efficiency. The 
>> result is that it will take 1.24ms to get the count result using above sql. 
>> Then I split above sql to 3 seperate sql and test the efficiency again. The 
>> 3 sql is as following:
>>  
>> 1.   sprintf (sql_str, "select count(*) from T_CTRL_CLRRULE where 
>> CTRL_NO='%s' and PRODUCT_ID='%s' and OPERATE_TYPE='%s' and CTRL_FLAG='%s'", 
>> sCtrlNo, sProductId, sOperType, sCtrlFlag);
>> 2.   sprintf (sql_str, "select count(*) from T_CTRL_CLRRULE where 
>> CTRL_NO='%s' and PRODUCT_ID='AA' and OPERATE_TYPE='%s' and 
>> CTRL_FLAG='%s'", sCtrlNo, sOperType, sCtrlFlag);
>> 3.   sprintf (sql_str, "select count(*) from T_CTRL_CLRRULE where 
>> CTRL_NO='%s' and PRODUCT_ID='%s' and OPERATE_TYPE='AA' and 
>> CTRL_FLAG='%s'", sCtrlNo, sProductId, sCtrlFlag);
>>  
>> I wrote another program which will execute all above 3 sql every time and I 
>> find it will only take 0.27ms on average executing all above 3 sql.
>>  
>> we can see the sum of the results of the above 3 sql statements is logically equal to the sql 
>> statement with OR. But the efficiency is quite different. I want to know whether 
>> this is a performance issue, or should I split the where statement manually 
>> every time I meet this requirement?
>
>
>
>--
>
>Message: 6
>Date: Sat, 21 Jun 2014 16:12:13 +0200
>From: Clemens Ladisch <clem...@ladisch.de>
>To: sqlite-users@sqlite.org
>Subject: Re: [sqlite] Performance issue using SQLite
>Message-ID: <53a592bd.5020...@ladisch.de>
>Content-Type: text/plain; charset=UTF-8
>
>163 wrote:
>> I met a performance issue when using SQLite 3.8.4.3. I found it would
>> be quite slow trying to select count(*) using a where statement with
>> several OR conditions. For example:
>>
>> select ctrl_no from T_CTRL_CLRRULE where CTRL_NO='%s' and ((PRODUCT_ID='%s' 
>> and OPERATE_TYPE='%s') or (PRODUCT_ID='AA' and OPERATE_TYPE='%s') or 
>> (PRODUCT_ID='%s' and OPERATE_TYPE='AA')) and CTRL_FLAG='%s'
>
>This query does not select count(*).
>
>Anyway, its EXPLAIN QUERY PLAN output is:
>
>  0|0|0|SEARCH TABLE T_CTRL_CLRRULE USING INDEX I_T_CTRL_CLRRULE1 (CTRL_NO=? 
> AND CTRL_FLAG=?)
>
>> it will

Re: [sqlite] Performance issue using SQLite

2014-06-21 Thread Clemens Ladisch
163 wrote:
> I met a performance issue when using SQLite 3.8.4.3. I found it would
> be quite slow trying to select count(*) using a where statement with
> several OR conditions. For example:
>
> select ctrl_no from T_CTRL_CLRRULE where CTRL_NO='%s' and ((PRODUCT_ID='%s' 
> and OPERATE_TYPE='%s') or (PRODUCT_ID='AA' and OPERATE_TYPE='%s') or 
> (PRODUCT_ID='%s' and OPERATE_TYPE='AA')) and CTRL_FLAG='%s'

This query does not select count(*).

Anyway, its EXPLAIN QUERY PLAN output is:

  0|0|0|SEARCH TABLE T_CTRL_CLRRULE USING INDEX I_T_CTRL_CLRRULE1 (CTRL_NO=? 
AND CTRL_FLAG=?)

> it will take 1.24ms to get the count result using above sql.
> Then I split the above sql into 3 separate sql statements and test the efficiency again.
>
> 1.  select count(*) from T_CTRL_CLRRULE where CTRL_NO='%s' and 
> PRODUCT_ID='%s' and OPERATE_TYPE='%s' and CTRL_FLAG='%s'
> 2.  select count(*) from T_CTRL_CLRRULE where CTRL_NO='%s' and 
> PRODUCT_ID='AA' and OPERATE_TYPE='%s' and CTRL_FLAG='%s'
> 3.  select count(*) from T_CTRL_CLRRULE where CTRL_NO='%s' and 
> PRODUCT_ID='%s' and OPERATE_TYPE='AA' and CTRL_FLAG='%s'

The EXPLAIN QUERY PLAN output is the same for all three:

  0|0|0|SEARCH TABLE T_CTRL_CLRRULE USING COVERING INDEX UIDX_T_CTRL_CLRRULE 
(CTRL_NO=? AND PRODUCT_ID=? AND OPERATE_TYPE=? AND CTRL_FLAG=?)

> I find it will only take 0.27ms on average executing all above 3 sql.

The combined OR conditions are too complex for the query optimizer, so
it does index lookup only on the other columns.

> should I split the where statement manually every time I meet this 
> requirement?

You could combine the three queries like this:

  select (select count(*) ...) + (select count(*) ...) + (select count(*) ...)

But better try this instead:

  select count(*)
  from T_CTRL_CLRRULE
  where CTRL_NO='%s'
    and PRODUCT_ID in ('%s', 'AA')
    and OPERATE_TYPE in ('%s', 'AA')
    and (PRODUCT_ID != 'AA' or OPERATE_TYPE != 'AA')
    and CTRL_FLAG='%s'
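
The reason the rewrite helps: IN constraints, unlike the OR tree, can drive
the four-column unique index.  A quick check (a sketch; the literal values are
placeholders, and the comment shows the plan line one would hope for, not
guaranteed output):

  EXPLAIN QUERY PLAN
  select count(*)
  from T_CTRL_CLRRULE
  where CTRL_NO='X'
    and PRODUCT_ID in ('Y', 'AA')
    and OPERATE_TYPE in ('Z', 'AA')
    and (PRODUCT_ID != 'AA' or OPERATE_TYPE != 'AA')
    and CTRL_FLAG='F';
  -- hoped for: SEARCH TABLE T_CTRL_CLRRULE USING COVERING INDEX UIDX_T_CTRL_CLRRULE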


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Performance issue using SQLite

2014-06-21 Thread 163

> Hi Experts,
> I met a performance issue when using SQLite 3.8.4.3. I found it would be 
> quite slow trying to select count(*) using a where statement with several OR 
> conditions. For example:
>  
> sprintf (sql_str, "select ctrl_no from T_CTRL_CLRRULE where CTRL_NO='%s' and 
> ((PRODUCT_ID='%s' and OPERATE_TYPE='%s') or (PRODUCT_ID='AA' and 
> OPERATE_TYPE='%s') or (PRODUCT_ID='%s' and OPERATE_TYPE='AA')) and 
> CTRL_FLAG='%s'", sCtrlNo, sProductId, sOperType, sOperType, sProductId, 
> sCtrlFlag);
>  
> The sql to create T_CTRL_CLRRULE is as follows:
> CREATE TABLE T_CTRL_CLRRULE (CTRL_NO TEXT,CTRL_NAME TEXT,CTRL_SYS 
> TEXT,PRODUCT_ID TEXT,OPERATE_TYPE TEXT,CTRL_FLAG TEXT);
> CREATE UNIQUE INDEX UIDX_T_CTRL_CLRRULE on T_CTRL_CLRRULE 
> (CTRL_NO,PRODUCT_ID,OPERATE_TYPE,CTRL_FLAG);
> CREATE INDEX I_T_CTRL_CLRRULE1 on T_CTRL_CLRRULE (CTRL_NO,CTRL_FLAG);
>  
> We can see there is a unique key UIDX_T_CTRL_CLRRULE on t_ctrl_clrrule. The 
> select statement to get count(*) is based on unique index. There are 
> 2,000,000 rows in the t_ctrl_clrrule table. I try to change ctrl_no, product_id, 
> operate_type, ctrl_flag in above SQL statement to test the efficiency. The 
> result is that it will take 1.24ms to get the count result using above sql. 
> Then I split the above sql into 3 separate sql statements and test the efficiency again. The 3 
> sql statements are as follows:
>  
> 1.   sprintf (sql_str, "select count(*) from T_CTRL_CLRRULE where 
> CTRL_NO='%s' and PRODUCT_ID='%s' and OPERATE_TYPE='%s' and CTRL_FLAG='%s'", 
> sCtrlNo, sProductId, sOperType, sCtrlFlag);
> 2.   sprintf (sql_str, "select count(*) from T_CTRL_CLRRULE where 
> CTRL_NO='%s' and PRODUCT_ID='AA' and OPERATE_TYPE='%s' and 
> CTRL_FLAG='%s'", sCtrlNo, sOperType, sCtrlFlag);
> 3.   sprintf (sql_str, "select count(*) from T_CTRL_CLRRULE where 
> CTRL_NO='%s' and PRODUCT_ID='%s' and OPERATE_TYPE='AA' and 
> CTRL_FLAG='%s'", sCtrlNo, sProductId, sCtrlFlag);
>  
> I wrote another program which will execute all above 3 sql every time and I 
> find it will only take 0.27ms on average executing all above 3 sql.
>  
> we can see the sum of the results of the above 3 sql statements is logically equal to the sql 
> statement with OR. But the efficiency is quite different. I want to know whether 
> this is a performance issue, or should I split the where statement manually 
> every time I meet this requirement?

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue with JOIN and large IN operator

2013-05-01 Thread Richard Hipp
On Wed, May 1, 2013 at 2:11 PM, jic  wrote:

> "Richard Hipp" wrote...
>
> Dr. Hipp,
>
> will this fix break the work-around you provided,
>
> "
> to put a "+" sign in front of the "elements.id" identifier in the ON
> clause:
>
>  SELECT count(*) FROM elements JOIN tags ON +elements.id = tags.element_id
>WHERE elements.id IN (<list>);
> "
> or will it also work?
>

They should both work.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue with JOIN and large IN operator

2013-05-01 Thread jic

"Richard Hipp" wrote...


On Wed, May 1, 2013 at 11:24 AM, Richard Hipp  wrote:




On Wed, May 1, 2013 at 8:30 AM, Martin Altmayer <
martin.altma...@googlemail.com> wrote:


Hi,

I have a query that runs more than 400x slower in 3.7.16.2 than in 
3.7.11.




This seems to be caused by the use of transitive constraints in version
3.7.16.  Your work-around (until an official fix is available in SQLite) is
to put a "+" sign in front of the "elements.id" identifier in the ON
clause:

  SELECT count(*) FROM elements JOIN tags ON +elements.id =
tags.element_id

> WHERE elements.id IN (<list>);

Thank you for the trouble report.



I think the problem is fixed with 
http://www.sqlite.org/src/info/faedaeace9


Dr. Hipp,

will this fix break the work-around you provided,
"
to put a "+" sign in front of the "elements.id" identifier in the ON clause:

>  SELECT count(*) FROM elements JOIN tags ON +elements.id = tags.element_id
>    WHERE elements.id IN (<list>);
"
or will it also work?

Thanks. 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue with JOIN and large IN operator

2013-05-01 Thread Richard Hipp
On Wed, May 1, 2013 at 11:24 AM, Richard Hipp  wrote:

>
>
> On Wed, May 1, 2013 at 8:30 AM, Martin Altmayer <
> martin.altma...@googlemail.com> wrote:
>
>> Hi,
>>
>> I have a query that runs more than 400x slower in 3.7.16.2 than in 3.7.11.
>>
>
> This seems to be caused by the use of transitive constraints in version
> 3.7.16.  Your work-around (until an official fix is available in SQLite) is
> to put a "+" sign in front of the "elements.id" identifier in the ON
> clause:
>
>   SELECT count(*) FROM elements JOIN tags ON +elements.id =
> tags.element_id
>
> WHERE elements.id IN (<list>);
>
> Thank you for the trouble report.
>

I think the problem is fixed with http://www.sqlite.org/src/info/faedaeace9



>
>
>
>> Instead of posting the original query, I post a simplified version which
>> still experiences the problem with a factor of over 100x:
>>
>> SELECT COUNT(*) FROM elements JOIN tags ON elements.id = tags.element_id
>> WHERE elements.id IN (<list>);
>>
>> where <list> is large (e.g. 1,2,3,...,2000). In my application this is not
>> a contiguous list, so I cannot use BETWEEN.
>>
>> To demonstrate the issue it suffices that both tables just contain a
>> single
>> column which is filled with e.g. the integers from 1 to 4000.
>>
>> CREATE TABLE elements (id INTEGER PRIMARY KEY);
>> CREATE TABLE tags (element_id INTEGER);
>>
>> The running time in 3.7.16.2 increases heavily with the length of <list>,
>> which is not the case in 3.7.11.
>> As far as I know, indices do not improve the situation (my original
>> database has indices).
>> Removing the join solves the problem, but in the original query the join
>> is
>> necessary, because I do not only select COUNT(*).
>>
>> I tested this on several Linux machines.
>>
>> Thanks in advance,
>> Martin
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
>
> --
> D. Richard Hipp
> d...@sqlite.org




-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue with JOIN and large IN operator

2013-05-01 Thread Richard Hipp
On Wed, May 1, 2013 at 8:30 AM, Martin Altmayer <
martin.altma...@googlemail.com> wrote:

> Hi,
>
> I have a query that runs more than 400x slower in 3.7.16.2 than in 3.7.11.
>

This seems to be caused by the use of transitive constraints in version
3.7.16.  Your work-around (until an official fix is available in SQLite) is
to put a "+" sign in front of the "elements.id" identifier in the ON clause:

  SELECT count(*) FROM elements JOIN tags ON +elements.id = tags.element_id
WHERE elements.id IN (<list>);

Thank you for the trouble report.



> Instead of posting the original query, I post a simplified version which
> still experiences the problem with a factor of over 100x:
>
> SELECT COUNT(*) FROM elements JOIN tags ON elements.id = tags.element_id
> WHERE elements.id IN (<list>);
>
> where <list> is large (e.g. 1,2,3,...,2000). In my application this is not
> a contiguous list, so I cannot use BETWEEN.
>
> To demonstrate the issue it suffices that both tables just contain a single
> column which is filled with e.g. the integers from 1 to 4000.
>
> CREATE TABLE elements (id INTEGER PRIMARY KEY);
> CREATE TABLE tags (element_id INTEGER);
>
> The running time in 3.7.16.2 increases heavily with the length of <list>,
> which is not the case in 3.7.11.
> As far as I know, indices do not improve the situation (my original
> database has indices).
> Removing the join solves the problem, but in the original query the join is
> necessary, because I do not only select COUNT(*).
>
> I tested this on several Linux machines.
>
> Thanks in advance,
> Martin
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Performance issue with JOIN and large IN operator

2013-05-01 Thread Martin Altmayer
Hi,

I have a query that runs more than 400x slower in 3.7.16.2 than in 3.7.11.
Instead of posting the original query, I post a simplified version which
still experiences the problem with a factor of over 100x:

SELECT COUNT(*) FROM elements JOIN tags ON elements.id = tags.element_id
WHERE elements.id IN (<list>);

where <list> is large (e.g. 1,2,3,...,2000). In my application this is not
a contiguous list, so I cannot use BETWEEN.

To demonstrate the issue it suffices that both tables just contain a single
column which is filled with e.g. the integers from 1 to 4000.

CREATE TABLE elements (id INTEGER PRIMARY KEY);
CREATE TABLE tags (element_id INTEGER);
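
On a current SQLite the repro tables can be filled with a recursive CTE
(3.8.3 or later; the versions under test here predate that, so a small script
is needed instead):

WITH RECURSIVE n(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM n WHERE i < 4000)
INSERT INTO elements(id) SELECT i FROM n;
WITH RECURSIVE n(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM n WHERE i < 4000)
INSERT INTO tags(element_id) SELECT i FROM n;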

The running time in 3.7.16.2 increases heavily with the length of <list>,
which is not the case in 3.7.11.
As far as I know, indices do not improve the situation (my original
database has indices).
Removing the join solves the problem, but in the original query the join is
necessary, because I do not only select COUNT(*).

I tested this on several Linux machines.

Thanks in advance,
Martin
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue on view

2010-06-02 Thread Israel Lins Albuquerque
A correction! 

/***/ 
DROP TRIGGER IF EXISTS movement_trg02; 
CREATE TRIGGER movement_trg02 AFTER INSERT ON movement FOR EACH ROW 
BEGIN 
REPLACE INTO movement 
SELECT t1.id 
, t1.account_id 
, t1.payment 
, t1.amount 
, (COALESCE((SELECT balance 
FROM movement s1 
WHERE s1.account_id = t1.account_id 
AND (s1.payment < t1.payment 
OR (s1.payment = t1.payment 
AND s1.id < t1.id)) 
GROUP BY s1.account_id 
HAVING s1.payment = MAX(s1.payment) 
AND s1.id = MAX(s1.id)), 0) 
+ t1.amount) AS balance 
FROM movement t1 
WHERE id = new.id; 

UPDATE movement 
SET balance = balance + new.amount 
WHERE account_id = new.account_id 
AND (payment > new.payment 
OR (payment = new.payment 
AND id > new.id)); 
END; 

-- account_id or payment changed 
DROP TRIGGER IF EXISTS movement_trg03; 
CREATE TRIGGER movement_trg03 AFTER UPDATE ON movement FOR EACH ROW WHEN 
(((old.account_id != new.account_id) OR (old.payment != new.payment)) AND 
(new.amount != old.amount)) 
BEGIN 
-- same code as delete trigger 
UPDATE movement 
SET balance = balance - old.amount 
WHERE account_id = old.account_id 
AND (payment > old.payment 
OR (payment = old.payment 
AND id > old.id)); 

-- same code as insert 
REPLACE INTO movement 
SELECT t1.id 
, t1.account_id 
, t1.payment 
, t1.amount 
, (COALESCE((SELECT balance 
FROM movement s1 
WHERE s1.account_id = t1.account_id 
AND (s1.payment < t1.payment 
OR (s1.payment = t1.payment 
AND s1.id < t1.id)) 
GROUP BY s1.account_id 
HAVING s1.payment = MAX(s1.payment) 
AND s1.id = MAX(s1.id)), 0) 
+ t1.amount) AS balance 
FROM movement t1 
WHERE id = new.id; 

UPDATE movement 
SET balance = balance + new.amount 
WHERE account_id = new.account_id 
AND (payment > new.payment 
OR (payment = new.payment 
AND id > new.id)); 
END; 
/***/ 


-- 

Regards/Atenciosamente, 

Israel Lins Albuquerque 
Developer/Desenvolvimento 
Polibrás Brasil Software Ltda. 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue on view

2010-06-02 Thread Israel Lins Albuquerque
Have a look at this below! It should help you. I didn't test all the cases! 

/***/ 
CREATE TABLE account ( 
id INTEGER PRIMARY KEY AUTOINCREMENT, 
name TEXT, 
initial_balance REAL 
); 

CREATE TABLE movement ( 
id INTEGER PRIMARY KEY AUTOINCREMENT, 
account_id INTEGER REFERENCES account(id), 
payment UNIXEPOCH, 
amount REAL, 
balance REAL 
); 
CREATE INDEX movent_idx01 ON movement (account_id, payment, id); 

DROP TRIGGER IF EXISTS movement_trg01; 
CREATE TRIGGER movement_trg01 AFTER DELETE ON movement FOR EACH ROW 
BEGIN 
UPDATE movement 
SET balance = balance - old.amount 
WHERE account_id = old.account_id 
AND (payment > old.payment 
OR (payment = old.payment 
AND id > old.id)); 
END; 

DROP TRIGGER IF EXISTS movement_trg02; 
CREATE TRIGGER movement_trg02 AFTER INSERT ON movement FOR EACH ROW 
BEGIN 
REPLACE INTO movement 
SELECT t1.id 
, t1.account_id 
, t1.payment 
, t1.amount 
, (COALESCE((SELECT balance 
FROM movement s1 
WHERE s1.account_id = t1.account_id 
AND (s1.payment < t1.payment 
OR (s1.payment = t1.payment 
AND s1.id < t1.id)) 
GROUP BY s1.account_id 
HAVING s1.payment = MAX(s1.payment) 
AND s1.id = MAX(s1.id)), 0) 
+ t1.amount) AS balance 
FROM movement t1 
WHERE id = new.id; 

UPDATE movement 
SET balance = balance + new.amount 
WHERE account_id = new.account_id 
AND payment >= new.payment 
AND id > new.id; 
END; 

-- account and payment not changed 
DROP TRIGGER IF EXISTS movement_trg03; 
CREATE TRIGGER movement_trg03 AFTER UPDATE ON movement FOR EACH ROW WHEN 
((old.account_id == new.account_id) AND (old.payment == new.payment) AND 
(new.amount != old.amount)) 
BEGIN 
UPDATE movement 
SET balance = balance + (new.amount - old.amount) 
WHERE account_id = old.account_id 
AND (payment > old.payment 
OR (payment = old.payment 
AND id >= old.id)); 
END; 

-- account_id or payment changed 
DROP TRIGGER IF EXISTS movement_trg03; 
CREATE TRIGGER movement_trg03 AFTER UPDATE ON movement FOR EACH ROW WHEN 
(((old.account_id != new.account_id) OR (old.payment != new.payment)) AND 
(new.amount != old.amount)) 
BEGIN 
-- same code as delete trigger 
UPDATE movement 
SET balance = balance - old.amount 
WHERE account_id = old.account_id 
AND (payment > old.payment 
OR (payment = old.payment 
AND id > old.id)); 

-- same code as insert 
REPLACE INTO movement 
SELECT t1.id 
, t1.account_id 
, t1.payment 
, t1.amount 
, (COALESCE((SELECT balance 
FROM movement s1 
WHERE s1.account_id = t1.account_id 
AND (s1.payment < t1.payment 
OR (s1.payment = t1.payment 
AND s1.id < t1.id)) 
GROUP BY s1.account_id 
HAVING s1.payment = MAX(s1.payment) 
AND s1.id = MAX(s1.id)), 0) 
+ t1.amount) AS balance 
FROM movement t1 
WHERE id = new.id; 

UPDATE movement 
SET balance = balance + new.amount 
WHERE account_id = new.account_id 
AND payment >= new.payment 
AND id > new.id; 
END; 

INSERT INTO account (name, initial_balance) VALUES ('account 1', 0); 
INSERT INTO account (name, initial_balance) VALUES ('account 2', 100.0); 
INSERT INTO account (name, initial_balance) VALUES ('account 3', -100.0); 

INSERT INTO movement (account_id, payment, amount) VALUES (1, 1275503470, 
123.45); 
INSERT INTO movement (account_id, payment, amount) VALUES (1, 1275503475, 
-24.10); 
INSERT INTO movement (account_id, payment, amount) VALUES (1, 1275503475, 
50.00); 

UPDATE movement 
SET amount = 150.45 
WHERE payment = 1275503470; 

-- to know what are the current balance of a account 
SELECT a.initial_balance + m.balance 
FROM account a 
JOIN movement m ON (a.id = m.account_id) 
GROUP BY m.account_id 
HAVING payment = MAX(payment); 

/***/ 



- Original message - 
From: "Israel Lins Albuquerque" <israel...@polibrasnet.com.br> 
To: steph...@mankowski.fr, "General Discussion of SQLite Database" 
<sqlite-users@sqlite.org> 
Sent: Tuesday, 1 June 2010 17:12:02 
Subject: Re: [sqlite] Performance issue on view 

Create a new table to do this and add a trigger on op to make the sum. 

- Original message - 
From: "Stéphane MANKOWSKI" <steph...@mankowski.fr> 
To: sqlite-users@sqlite.org 
Sent: Tuesday, 1 June 2010 16:57:16 
Subject: [sqlite] Performance issue on view 

Hi, 

In this database file (http://skrooge.org/files/test.wrk), I created a 
table 
named "op" containing banking 
transactions. 
A transaction has: 
A unique id 
An account 
A date 
An amount 

I created a view named "v_op" with one more computed attribute named 
"balance". 
This attribute is the sum of all previous transactions (including current one) 
for the same account. 

My problem is that v_op is very slow. This is not usable. 

What can I do to improve performance? 

PS: I don't want to compute the "balance" attribute by code and save it in op 
balance due to the fact that I am using an undo/redo mechanism.

Re: [sqlite] Performance issue on view

2010-06-01 Thread Israel Lins Albuquerque
Create a new table to do this and add a trigger on op to make the sum. 

- Original message - 
From: "Stéphane MANKOWSKI" <steph...@mankowski.fr> 
To: sqlite-users@sqlite.org 
Sent: Tuesday, 1 June 2010 16:57:16 
Subject: [sqlite] Performance issue on view 

Hi, 

In this database file (http://skrooge.org/files/test.wrk), I created a 
table 
named "op" containing banking 
transactions. 
A transaction has: 
A unique id 
An account 
A date 
An amount 

I created a view named "v_op" with one more computed attribute named 
"balance". 
This attribute is the sum of all previous transactions (including current one) 
for the same account. 

My problem is that v_op is very slow. This is not usable. 

What can I do to improve performance? 

PS: I don't want to compute "balance" attribute by code and save it in op 
balance due to the fact that I am using an undo/redo mechanism. 

Regards, 
Stephane 
___ 
sqlite-users mailing list 
sqlite-users@sqlite.org 
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users 


-- 

Regards, 

Israel Lins Albuquerque 
Development 
Polibrás Brasil Software Ltda. 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue on view

2010-06-01 Thread Pavel Ivanov
> PS: I don't want to compute "balance" attribute by code and save it in op
> balance due to the fact that I am using an undo/redo mechanism.

From my experience this is the only way to go - calculate the balance
in your application then store it in database along with transaction
as "balance after this transaction". Yes, you would have to update
many rows during any undo/redo operation but otherwise you will wait
too long to calculate balances on the fly. You can implement some
interim solution which will calculate balance say for the end of each
month. So to calculate current balance you'll have to take record for
the end of previous month and add all transactions for the current
month...
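
A sketch of that interim idea (table and column names here are hypothetical;
the real schema is in the linked test.wrk file):

CREATE TABLE month_end_balance (
  account INTEGER,
  month   TEXT,   -- 'YYYY-MM'
  balance REAL,
  PRIMARY KEY (account, month)
);

-- current balance of account 1 = last checkpoint + transactions since then
SELECT COALESCE((SELECT balance FROM month_end_balance
                 WHERE account = 1 ORDER BY month DESC LIMIT 1), 0)
     + COALESCE((SELECT SUM(amount) FROM op
                 WHERE account = 1
                   AND strftime('%Y-%m', date) >
                       COALESCE((SELECT MAX(month) FROM month_end_balance
                                 WHERE account = 1), '')), 0) AS balance;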


Pavel

2010/6/1 Stéphane MANKOWSKI :
> Hi,
>
> In this database file (http://skrooge.org/files/test.wrk), I created a 
> table
> named "op" containing banking
> transactions.
> A transaction has:
>        A unique id
>        An account
>        A date
>        An amount
>
> I created a view named "v_op" with one more computed attribute named
> "balance".
> This attribute is the sum of all previous transactions (including current one)
> for the same account.
>
> My problem is that v_op is very slow. This is not usable.
>
> What can I do to improve performance?
>
> PS: I don't want to compute "balance" attribute by code and save it in op
> balance due to the fact that I am using an undo/redo mechanism.
>
> Regards,
> Stephane
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Performance issue on view

2010-06-01 Thread Stéphane MANKOWSKI
Hi,

In this database file (http://skrooge.org/files/test.wrk), I created a 
table 
named "op" containing banking 
transactions.
A transaction has:
A unique id
An account
A date
An amount

I created a view named "v_op" with one more computed attribute named 
"balance".
This attribute is the sum of all previous transactions (including current one) 
for the same account.

My problem is that v_op is very slow. This is not usable.

What can I do to improve performance? 

PS: I don't want to compute "balance" attribute by code and save it in op 
balance due to the fact that I am using an undo/redo mechanism.

Regards,
Stephane
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue on records retrieval :: 100,000 records

2009-07-16 Thread Simon Slavin

On 16 Jul 2009, at 6:22pm, MADHAVAN VINOD wrote:

> Suppose I don't have an index: is this the desired behavior of Sqlite,
> to take this much time to fetch just 10 records, or am I missing
> something here?

Your first post states quite clearly '5) No INDEX created.'.  Without  
any index on your table, how else could the computer do the SELECT  
command?

I think you don't know what indexes are for, and you haven't thought  
through the task you are giving the computer.  It may be that at this  
stage you should stop programming and read some books on how database  
systems work, to understand how best to use indexes.  If there are no  
useful indexes on your table, the computer has to read every row of  
the table to find out which rows satisfy your SELECT command.  It then  
has to work out which rows satisfy your 'WHERE' clause, and then sort  
the rows it doesn't reject into the right order.  Naturally this takes  
a lot of time.

The whole idea of indexes is to allow your computer to quickly find  
which rows it needs without having to read every single row.  So  
ideally you define an index which lets it select the right records in  
the right order without having to read any data from the table.  Your  
problem is this: you write in your original post

'So my WHERE clause is like  "CurrTime <= ExpireTime AND CurrTime >=
NextProcessingTime AND a=1 AND b < c"'

I am going to assume that all of 'CurrTime', 'ExpireTime',  
'NextProcessingTime', 'a', 'b', 'c' are fields in your table.

Your next problem is that you use, for example, 'b < c' in your SELECT  
command so you are asking for a test to done for every row in the  
table to decide if the row should be considered for the SELECT  
command: each time you do a SELECT command it has to read every row in  
the table and check to see whether 'CurrTime <= ExpireTime', 'CurrTime  
 >= NextProcessingTime', and 'b < c'.  Since you say you have 100,000 records,  
that's 100,000 read instructions and 300,000 pieces of mathematics it  
has to do every time you do your SELECT command.  The idea of indexes  
is to have as much of possible of that done when the record was saved  
in the first place, so it doesn't have to be done every time you use a  
SELECT command.

If 'CurrTime' is a column of your table you could calculate all of  
your WHERE clause when you write each record and write the result into  
another column of the table, then index that column.  If it isn't,  
then you should certainly provide an index on ExpireTime or  
NextProcessingTime or 'a', whichever one allows the SELECT to reject  
the most records most quickly.  And with a good understanding of the  
data in your table you could make up a compound index that makes the  
SELECT work even faster.
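
A sketch of the precomputed-column idea (the table, column and index names
here are hypothetical; 'a', 'b', 'c' and NextProcessingTime come from the
quoted WHERE clause):

ALTER TABLE msgs ADD COLUMN eligible INTEGER;
-- maintained at write time: 1 when the row passes the fixed tests, else 0
UPDATE msgs SET eligible = (a = 1 AND b < c);
CREATE INDEX idx_eligible_next ON msgs (eligible, NextProcessingTime);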

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue on records retrieval :: 100,000 records

2009-07-16 Thread Pavel Ivanov
> Suppose I don't have an index: is this the desired behavior of Sqlite,
> to take this much time to fetch just 10 records, or am I missing
> something here?

You're missing that SQLite has to fetch all records satisfying your
condition into memory storage, sort all these records in memory and
then pick 10 first out of them. So in this particular case it could be
expected. And I think index on column b should speed it up.

Pavel

On Thu, Jul 16, 2009 at 1:22 PM, MADHAVAN
VINOD<vinod.madha...@alcatel-lucent.com> wrote:
>
> Hello Michal,
>
> Thanks for the reply.  Please see my comments inline.
>
>>if you always have condition a=1 (or something similar which uses = for
>>comparison) you should have an index which starts with this field.
>
> The possible values for this field are 1/2.  And almost all the records
> have the value 1.  Hence I do not see any improvement in the query even
> when I created index on this column (a).
>
>
> Moreover, I tried simplifying the WHERE clause to have just "a=1 AND b <
> c AND d=0" and ORDER BY clause to have just "b ASC".
>
> The execution of this query itself took me around 3 seconds.
>
> Suppose I don't have an index: is this the desired behavior of Sqlite,
> to take this much time to fetch just 10 records, or am I missing
> something here?
>
>
> Best Regards,
> Vinod N.M.
>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Michal Seliga
> Sent: Thursday, July 16, 2009 8:37 PM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Performance issue on records retrieval :: 100,000
> records
>
>
>
> MADHAVAN VINOD wrote:
>>
>> 5) No INDEX created.
>>
>> The retrieval logic is to retrieve the oldest 10 records along
>> with some additional constraints (say a, b and c are columns and the
>> constraints are like a=1 AND b < c).
>>
>>
>>
>> So my WHERE clause is like  "CurrTime <= ExpireTime AND CurrTime >=
>> NextProcessingTime AND a=1 AND b < c"
>>
>> And my ORDER BY clause is " CurrTime - NextProcessingTime DESC, b ASC"
>>
>>
>>
>>
>
>
> you need an index for this, otherwise the lookup goes through the whole table.
> The question is what index would help you the most.
>
> now i am not sure if i understood you correctly: are ExpireTime and
> NextProcessingTime database fields? if yes, then you should additionally
> have one of ExpireTime or NextProcessingTime at the end of the index
> columns; you should choose the one which helps you more (the one which
> lets the database engine limit the row count the most)
>
> so for situation you wrote i would recommend to have one of indices:
> create index ... on ...(a,ExpireTime)
> or
>
> create index ... on ...(a,NextProcessingTime )
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue on records retrieval :: 100,000 records

2009-07-16 Thread MADHAVAN VINOD

Hello Michal,

Thanks for the reply.  Please see my comments inline.

>>if you always have condition a=1 (or something similar which uses = for
>>comparison) you should have an index which starts with this field.

The possible values for this field are 1/2.  And almost all the records
have the value 1.  Hence I do not see any improvement in the query even
when I created index on this column (a).


Moreover, I tried simplifying the WHERE clause to have just "a=1 AND b <
c AND d=0" and ORDER BY clause to have just "b ASC".

The execution of this query itself took me around 3 seconds.  

Suppose I don't have an index: is this the desired behavior of Sqlite,
to take this much time to fetch just 10 records, or am I missing
something here?


Best Regards,
Vinod N.M.

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Michal Seliga
Sent: Thursday, July 16, 2009 8:37 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Performance issue on records retrieval :: 100,000
records



MADHAVAN VINOD wrote:
> 
> 5) No INDEX created.
> 
> The retrieval logic is to retrieve the oldest 10 records along
> with some additional constraints (say a, b and c are columns and the
> constraints are like a=1 AND b < c).  
> 
>  
> 
> So my WHERE clause is like  "CurrTime <= ExpireTime AND CurrTime >=
> NextProcessingTime AND a=1 AND b < c"
> 
> And my ORDER BY clause is " CurrTime - NextProcessingTime DESC, b ASC"
> 
>  
> 
>  


you need an index for this, otherwise the lookup goes through the whole table.
The question is what index would help you the most.

now i am not sure if i understood you correctly: are ExpireTime and
NextProcessingTime database fields? if yes, then you should additionally
have one of ExpireTime or NextProcessingTime at the end of the index
columns; you should choose the one which helps you more (the one which
lets the database engine limit the row count the most)

so for situation you wrote i would recommend to have one of indices:
create index ... on ...(a,ExpireTime)
or

create index ... on ...(a,NextProcessingTime )
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue on records retrieval :: 100,000 records

2009-07-16 Thread Michal Seliga


MADHAVAN VINOD wrote:
> 
> 5) No INDEX created.
> 
> The retrieval logic is to retrieve the oldest 10 records along
> with some additional constraints (say a, b and c are columns and the
> constraints are like a=1 AND b < c).  
> 
>  
> 
> So my WHERE clause is like  "CurrTime <= ExpireTime AND CurrTime >=
> NextProcessingTime AND a=1 AND b < c"
> 
> And my ORDER BY clause is " CurrTime - NextProcessingTime DESC, b ASC"
> 
>  
> 
>  


you need an index for this, otherwise the lookup goes through the whole table.
The question is what index would help you the most.

if you always have condition a=1 (or something similar which uses = for
comparison) you should have an index which starts with this field.

now i am not sure if i understood you correctly: are ExpireTime and
NextProcessingTime database fields? if yes, then you should additionally have
one of ExpireTime or NextProcessingTime at the end of the index columns; you
should choose the one which helps you more (the one which lets the database
engine limit the row count the most)

so for situation you wrote i would recommend to have one of indices:
create index ... on ...(a,ExpireTime)
or

create index ... on ...(a,NextProcessingTime )
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Performance issue on records retrieval :: 100,000 records

2009-07-16 Thread MADHAVAN VINOD
Hi All,

 

Description of my setup:

My database contains 

1) One table 

2) 20 fields (including a date field that stores the insertion time)

3) 100,000 records

4) database size is 21MB.  

5) No INDEX created.

6) Sqlite version 3.5.9.

 

The retrieval logic is to retrieve the oldest 10 records along
with some additional constraints (say a, b and c are columns and the
constraints are like a=1 AND b < c).  

 

So my WHERE clause is like  "CurrTime <= ExpireTime AND CurrTime >=
NextProcessingTime AND a=1 AND b < c"

And my ORDER BY clause is " CurrTime - NextProcessingTime DESC, b ASC"

LIMIT 10.
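
Put together, the query under discussion would look roughly like this (the
table name is a placeholder; the clauses are quoted above):

SELECT * FROM msg_table
WHERE CurrTime <= ExpireTime AND CurrTime >= NextProcessingTime
  AND a = 1 AND b < c
ORDER BY CurrTime - NextProcessingTime DESC, b ASC
LIMIT 10;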

 

The problem is that it is taking around 6-7 seconds to retrieve 10
records from the table containing 100,000 records.  

 

One more problem is that the UPDATE query fails with the error
[Database is locked] during this retrieval.  Of course, I guess any other
query would also fail during this period. 

 

So, please let me know if there is any tuning that can make this retrieval
faster. 

 

I was expecting Sqlite to give good performance for at least
10 million records.  If that is realistic, please help me achieve it;
if not, please let me know what is the best we can achieve using Sqlite.

 

Thanks in advance for your help.

 

Thanks and Regards,

Vinod N.M.

 

 

 

 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue in using 'LIKE' with Indexes.

2009-07-08 Thread John Machin
On 8/07/2009 7:11 PM, aalap shah wrote:
> Hi,
> 
> I am a new user to sqlite3, I have a program that searches through a
> database. I have a table with 1 column as varchar and I want to
> perform a search on it.
> I have created an index over that column. And I use a select query
> with "column_name LIKE 'a%' ".
> So my first question is will this query use the index created or not?

If that is the only index on the table, it should be used. If there is 
another index on some other column in the table and that other column is 
mentioned in the query, then SQLite may decide to use that other index 
instead.

You can find out what indexes are being used by using "explain query 
plan select ..." instead of "select ..."; instead of returning results 
it will return a row of info for each table involved in the query.

Note the above answer is conditional on the expression having a trailing 
  '%'. Anything other than a "startswith" condition can make no use of 
the  index.

Reading material: http://www.sqlite.org/optoverview.html

> 
> And if it does, then according to my understanding a select query like
> above will directly locate records starting with 'a' and results will
> be returned.

Not "directly" in the sense that a hash index would in a non-LIKE case. 
All SQLite indexes use a BTree structure. It will locate all rows such 
that 'a' <= your_column < 'b', typically by descending the tree to 
locate the smallest value that is >= 'a' then proceding in key sequence 
until it finds a value that doesn't start with 'a'.
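
That range scan is the same work an explicit range query would cause; in
other words, the LIKE 'a%' lookup behaves roughly like this sketch (the table
name is hypothetical):

SELECT * FROM t WHERE column_name >= 'a' AND column_name < 'b';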

> And if I change my query to have "column_name LIKE 'ab%'
> " will take more time then previous because sqlite3 will have to
> perform strcmp of some sort to find results.

No, it will use the same procedure as the LIKE 'a%' query.

> But from the results that I have observed, it seems that the 2nd query takes
> less time than the first one.

Ummm, that could be because logic guarantees that (# rows retrieved by 
LIKE 'ab%') <= (# rows retrieved by LIKE 'a%') ... more character 
comparisons, but their cost is trivial compared to the cost per row 
retrieved.
> 
> And if it doesn't then, how do I make use of index with LIKE queries.

See section 4.0 of the link I quoted above.

HTH,
John
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance issue in using 'LIKE' with Indexes.

2009-07-08 Thread Dan

On Jul 8, 2009, at 4:11 PM, aalap shah wrote:

> Hi,
>
> I am a new user to sqlite3, I have a program that searches through a
> database. I have a table with 1 column as varchar and I want to
> perform a search on it.
> I have created an index over that column. And I use a select query
> with "column_name LIKE 'a%' ".
> So my first question is will this query use the index created or not?

Possibly. Details are here:

   http://www.sqlite.org/optoverview.html#like_opt


Dan.



[sqlite] Performance issue in using 'LIKE' with Indexes.

2009-07-08 Thread aalap shah
Hi,

I am a new user to sqlite3, I have a program that searches through a
database. I have a table with 1 column as varchar and I want to
perform a search on it.
I have created an index over that column. And I use a select query
with "column_name LIKE 'a%' ".
So my first question is will this query use the index created or not?

And if it does, then according to my understanding a select query like the
above will directly locate records starting with 'a' and the results will
be returned. And if I change my query to have "column_name LIKE 'ab%'", it
will take more time than the previous one because sqlite3 will have to
perform a strcmp of some sort to find results.
But from the results that I have observed, it seems that the 2nd query
takes less time than the first one.

And if it doesn't, then how do I make use of an index with LIKE queries?

Thanks a lot in Advance.
Aalap..

-- 
"Real men don't use backups, they post their stuff on a public ftp
server and let the rest of the world make copies."  - Linus Torvalds


Re: [sqlite] Performance issue

2007-12-10 Thread Dennis Cote

Stergios Zissakis wrote:


My question is: shouldn't sqlite's engine figure out what I am trying 
to do and sort the tables on the fly in an effort to optimize the query?
When using no indexes, a .explain reveals 3 nested loops which take a 
long time to return results.


Any help/ideas will be much appreciated.




You should read the optimizer docs at 
http://www.sqlite.org/optoverview.html for an explanation of why this 
happens.


HTH
Dennis Cote





[sqlite] Performance issue

2007-12-10 Thread Stergios Zissakis

Hello to everyone,
This is my first post to the list.
I've got the following 3 tables:

CREATE TABLE A
(
 int1 INTEGER,
 txt1 TEXT,
 int2 INTEGER,
 txt2 TEXT,
 PRIMARY KEY
 (
   txt1
 )
);

CREATE TABLE B
(
 txt1 TEXT,
 int1 INTEGER
);

CREATE TABLE C
(
 txt1 TEXT,
 int1 INTEGER
);

Each table contains 1000 rows. The following query takes about 7 minutes 
to return results without using any index apart from table A's 
primary key:
select count(*) from A INNER JOIN B on (A.txt1 = B.txt1) INNER JOIN C on 
(B.txt1 = C.txt1);

If I index columns A.txt1 and B.txt1, the time gets reduced to milliseconds.
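(For anyone reproducing this, a sketch of the kind of indexes involved;
the names are illustrative. A.txt1 already has an implicit index from its
PRIMARY KEY, so the ones that matter for the nested-loop join are on the
inner tables:)

CREATE INDEX idx_b_txt1 ON B(txt1);
CREATE INDEX idx_c_txt1 ON C(txt1);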

My question is: shouldn't sqlite's engine figure out what I am trying to 
do and sort the tables on the fly in an effort to optimize the query?
When using no indexes, a .explain reveals 3 nested loops which take a 
long time to return results.


Any help/ideas will be much appreciated.
Thanks for your time.
Kind Regards,

Stergios Zissakis (aka Sterge)




Re: [sqlite] performance issue

2007-07-18 Thread RaghavendraK 70574

Turn off sync using a pragma.
Build SQLite with cache and page sizes appropriate to your system.
Use sqlite3_analyzer to find out the page size.
Always use BEGIN and COMMIT/ROLLBACK for inserts and updates.
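A minimal sketch of those settings (the table and the values are
illustrative, not from the original thread):

PRAGMA synchronous = OFF;   -- skip fsync on commit: faster writes, but
                            -- risks database corruption on power loss
PRAGMA page_size = 4096;    -- takes effect only for a new database
                            -- (or after VACUUM)
PRAGMA cache_size = 10000;  -- pager cache size, in pages; tune to the device

CREATE TABLE IF NOT EXISTS log(id INTEGER PRIMARY KEY, msg TEXT);

BEGIN;                      -- group writes into explicit transactions
INSERT INTO log(msg) VALUES ('first');
INSERT INTO log(msg) VALUES ('second');
COMMIT;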

Joe & others had provided us with good direction.

regards
ragha

- Original Message -
From: [EMAIL PROTECTED]
Date: Wednesday, July 18, 2007 1:36 pm
Subject: [sqlite] performance issue

> Hi
> 
> I am using SQLite on MVL OS for ARM processor based embedded platform.
> I am using SQLite version 3.3.13. We use SQLite APIs for DB operation.
> I am facing the following issue.
> 
> While testing I observed that INSERT and UPDATE commands take more
> time than SELECT queries.
> For example, one SELECT query takes 1 to 2 milliseconds, whereas one
> INSERT or UPDATE takes 40 to 100 milliseconds.
> We are seeing very high latency for write queries.
> We tried some performance enhancement flags and PRAGMA settings.
> 
> Are there any performance enhancement settings in SQLite? Or any known
> issue?
> 
> Thanks & Regards
> Suresh




[sqlite] performance issue

2007-07-17 Thread suresh . bhat
Hi

I am using SQLite on MVL OS for ARM processor based embedded platform.
I am using SQLite version 3.3.13. We use SQLite APIs for DB operation.
I am facing the following issue.

While testing I observed that INSERT and UPDATE commands take more time 
than SELECT queries.
For example, one SELECT query takes 1 to 2 milliseconds, whereas one INSERT 
or UPDATE takes 40 to 100 milliseconds.
We are seeing very high latency for write queries.
We tried some performance enhancement flags and PRAGMA settings.

Are there any performance enhancement settings in SQLite? Or any known 
issue?

Thanks & Regards
Suresh



RE: [sqlite] Performance Issue with SQLite Inserts

2004-06-24 Thread Alex Wang
But I really like my testing results.
I did 15 M inserts in maybe 10 transactions and I am really happy with the
performance.
I did not use any long field as an index.
-Original Message-
From: D. Richard Hipp [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 24, 2004 3:13 PM
Cc: [EMAIL PROTECTED]
Subject: Re: [sqlite] Performance Issue with SQLite Inserts

Soham Mehta wrote:
> Thank you for all of the ideas.  Here are answers to the various
> questions:
> 
> 1) Currently, I am committing after every 10,000 inserts.  Initially, I
> was running all of the inserts of a set (25,000+ records) in one
> transaction.  I found that I was able to improve performance slightly by
> committing after every 10,000 records.  Either way, the performance is
> still too poor.
> 
> 2) I am going directly to the dll (not using ODBC).
> 
> 3) I am already using PRAGMA default_synchronous=OFF
> 

*  Are you writing to a local disk or to a network filesystem?

*  Please show us your schema and an example INSERT, complete
with data.


-- 
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565




