Re: [sqlite] Speeding up a query

2016-11-29 Thread Simon Slavin

On 29 Nov 2016, at 9:41pm, Kevin Baggett  wrote:

> Here's the query:
> SELECT a.observation_day, a.observation_hour, a.observation_time, 
> a.text_file_name, a.start_byte, a.message_length, a.wmo_header, a.wmo_prefix, 
> max(a.rmkcorr_flag),b.wmo_ID,b.latitude,b.longitude from main.file_list a, 
> main.station_list b WHERE a.wmo_ID=b.wmo_ID AND (a.wmo_prefix IN ('SA','SP')) 
> GROUP BY a.wmo_ID, a.observation_time ORDER by a.observation_time;
> 
> I put the following index on file_list:
> create index combo_index on 
> file_list(wmo_prefix,wmo_ID,observation_time,rmkcorr_flag);
> on station_list:
> create index wmo_station_index on station_list (wmo_ID);

Add these …

CREATE INDEX combo_index2 ON 
file_list(wmo_ID,latitude,wmo_prefix,observation_time);
CREATE INDEX wmo_station_index2 ON station_list (wmo_ID, longitude);

Run ANALYZE.  See if anything improves.
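As a quick, hedged sketch of trying this (Python's sqlite3 module, with a minimal stand-in schema -- the real tables have more columns; note that combo_index2 as written names latitude, which per the original post is a station_list column, so it is left out here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Minimal stand-in schema; the real tables have more columns.
    CREATE TABLE file_list (wmo_ID, wmo_prefix, observation_time, rmkcorr_flag);
    CREATE TABLE station_list (wmo_ID, latitude, longitude);
    -- Suggested indexes (latitude omitted: it is not a file_list column)
    CREATE INDEX combo_index2 ON file_list(wmo_ID, wmo_prefix, observation_time);
    CREATE INDEX wmo_station_index2 ON station_list(wmo_ID, longitude);
    ANALYZE;
""")
names = {row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")}
print(sorted(names))
```

After that, EXPLAIN QUERY PLAN on the real query shows whether the new indexes are picked up.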

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread Keith Medcalf

The standard computer answer applies:  The fastest way to perform I/O is not to 
do it (and, contrary to recent press, this discovery was not made by Microsoft 
Research a couple of weeks ago but was "discovered" in the 1950s when, on 
average, a good secretary could find a file in the filing cabinets faster than 
the computer could find the same record in several miles of magnetic tape).

You achieve this by using a disk cache -- preferably a block cache.  That 
technology has gone by the wayside and is periodically "re-discovered" every 
few years; it dates back to (once again) the 60s and 70s.  Modern crap uses 
filesystem-based caching, which is ill-conceived and usually more-or-less 
totally brain-dead.

In any case, your cache size should be tuned so that you average a 90% hit rate 
or better (which, with a properly designed block cache, is not that hard to 
achieve without a very large cache).  Or, in these days of humongous amounts of 
medium-speed (dynamic) RAM, the cache should use all space not otherwise being 
used for code and data working set (if you bought memory and it is "free" as in 
unused, you flushed your money down the toilet).
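In SQLite terms, per-connection cache sizing is one knob for this. A hedged sketch (the 256 MiB figure is purely illustrative, not a recommendation): a negative PRAGMA cache_size value is a limit in KiB rather than a page count.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A negative cache_size is a limit in KiB; a positive one counts pages.
# 256 MiB here is illustrative -- tune until the hit rate is high enough.
con.execute("PRAGMA cache_size = -262144")
size = con.execute("PRAGMA cache_size").fetchone()[0]
print(size)
```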

You do not want to use shared cache.  Shared cache is designed for use in 
really itty-bitty boxes where memory is measured in bytes (phones, watches, 
TVs, hand-held TV remote controls, etc.).  If you are using something that 
qualifies as a "computer" then you do not want a shared cache.

Of course, bear in mind process memory limits (you may not be able to use more 
than 256 MB or 512 MB of total cache per process on a 32-bit computer that only 
allocates 2 GB of virtual address space per process), and also the fact that 
just because you "said" to use 4 TB of RAM as cache does not mean that 4 TB 
will be used.  A 1 GB file will use a maximum of 1 or perhaps 2 GB of cache 
(depending on your operations) even if you tell it to use 4 TB of cache.

So, the optimal answer is often "as much as possible without impacting the 
multiprogramming ratio", especially on Operating Systems of brain-dead design 
which from the get-go favour bad uses of memory over good ones.

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Mark Hamburg
> Sent: Tuesday, 29 November, 2016 11:39
> To: SQLite mailing list
> Subject: Re: [sqlite] Read-only access which does not block writers
> 
> One other question about WAL mode and simultaneous readers and writers:
> How are people setting their page caches? My read is that shared cache is
> probably not what's wanted. I was setting my reader caches to be bigger
> than my writer cache under the assumption that writers write and then move
> on whereas readers would benefit from having more data cached, but I'm now
> thinking that the disk cache should be getting me the latter effect and
> increasing the size of the write cache should allow the writer to run
> longer without having to flush from memory to disk. Is there any standard
> advice in this regard or is this the sort of question where the answer is
> "experiment"?
> 
> Mark
> 





[sqlite] Speeding up a query

2016-11-29 Thread Kevin Baggett

Hi,
I am trying to speed up a query on my SQLite database using SQLite 
version 3.7.17 2013-05-20 00:56:22.

The daily database has 2 tables: file_list and station_list.
Yesterday's database had a file_list table of over 1.7 million records. 
station_list is pretty much constant at 21549 records.
For my query, I am looking for records whose wmo_prefix field is 
"SA" or "SP".

So, that reduces the records from file_list to 363,710.
I have a field "rmkcorr_flag" that has a value of 0 to 3 based on the 
underlying data that the record is referring to.
What I want returned is the record with the maximum of the rmkcorr_flag 
for the associated wmo_ID and observation_time (e.g. 2 for KMSN at 213347Z)


Here's the query:
SELECT a.observation_day, a.observation_hour, a.observation_time, 
a.text_file_name, a.start_byte, a.message_length, a.wmo_header, 
a.wmo_prefix, max(a.rmkcorr_flag),b.wmo_ID,b.latitude,b.longitude from 
main.file_list a, main.station_list b WHERE a.wmo_ID=b.wmo_ID AND 
(a.wmo_prefix IN ('SA','SP')) GROUP BY a.wmo_ID, a.observation_time 
ORDER by a.observation_time;


I put the following index on file_list:
create index combo_index on 
file_list(wmo_prefix,wmo_ID,observation_time,rmkcorr_flag);

on station_list:
create index wmo_station_index on station_list (wmo_ID);

I ran ANALYZE and EXPLAIN QUERY PLAN.

ANALYZE: table sqlite_stat1 shows file_list|combo_index|1708131 18172 43 5 4
station_list|wmo_station_index|21549 2
EXPLAIN QUERY PLAN:
0|0|0|SEARCH TABLE file_list AS a USING INDEX combo_index (wmo_prefix=?) 
(~36344 rows)

0|0|0|EXECUTE LIST SUBQUERY 1
0|1|1|SEARCH TABLE station_list AS b USING INDEX wmo_station_index 
(wmo_ID=?) (~2 rows)

0|0|0|USE TEMP B-TREE FOR GROUP BY
0|0|0|USE TEMP B-TREE FOR ORDER BY

So, when I first run the query in the database with .timer on, it hangs 
and hangs and hangs, and then says:

CPU Time: user 3.920404 sys 1.800726
I didn't measure the hanging time, but it was definitely over a minute, 
if not two. (I see later versions of SQLite have an elapsed time.)

Subsequent runs of the query match up with the times above.
Seems like there is some kind of loading into cache the first time?

Anyway, I'm not a heavy SQLite/database user, so please be gentle :-)
I just want to see if there is anything that stands out to anyone that I 
can do to speed up my query, especially the first time through.
I can/should update the SQLite version, but I don't think that's the 
problem.


Thanks for any help!
Kevin



Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread James K. Lowden
On Thu, 24 Nov 2016 22:59:32 +0100
Florian Weimer  wrote:

> Does LMDB perform lock-free optimistic reads and 
> retroactively verifies that the entire read operation was consistent? 

In LMDB there are readers and reader-writers.  A reader never writes; a
reader-writer may read, but that read counts as a "write" for purposes
of isolation.  

Each actor -- reader or reader-writer -- is isolated from all others.
There are no write-after-read errors or other SQL isolation anomalies
because at most one transaction capable of writing exists at any time.
Any call to begin a second writable transaction blocks until the first
one completes.

LMDB uses MVCC to provide each reader with a database snapshot.  If a
writer modifies a row that a reader sees, that modification
happens "elsewhere" to a version that the writer sees and the reader
does not.  After the writer commits and no transactions remain
that refer to the superseded version, it ceases to exist.  New readers
thenceforward see the modified version.  
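For readers more familiar with SQLite than LMDB, the same snapshot behaviour can be sketched with two connections in WAL mode (a hedged analogue, not LMDB itself; the table and values are made up):

```python
import sqlite3, tempfile, os

path = os.path.join(tempfile.mkdtemp(), "snap.db")
w = sqlite3.connect(path, isolation_level=None)   # autocommit writer
w.execute("PRAGMA journal_mode=WAL")
w.execute("CREATE TABLE t(x)")
w.execute("INSERT INTO t VALUES (1)")

r = sqlite3.connect(path, isolation_level=None)
r.execute("BEGIN")                                  # reader snapshot starts
before = r.execute("SELECT x FROM t").fetchone()[0]

w.execute("UPDATE t SET x = 2")                     # writer commits concurrently

still = r.execute("SELECT x FROM t").fetchone()[0]  # old version still visible
r.execute("COMMIT")
after = r.execute("SELECT x FROM t").fetchone()[0]  # fresh read sees the update
print(before, still, after)
```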

--jkl


Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread Simon Slavin

On 29 Nov 2016, at 6:10pm, Mark Hamburg  wrote:

> In the single WAL scenario, what I probably really want is a way for the 
> checkpoint operation on the write connection to do its work then wait for 
> exclusive access — standard reader/writer lock pattern — to do the WAL reset. 
> This would presumably limit the time that the readers were blocked since I 
> would expect the WAL reset to be reasonably quick if the checkpoint copying 
> work were already complete.

Remember that SQLite has no server or background processes.  All your processes 
are messing with the database file in real time while the SQLite API calls are 
executing.

> Furthermore, the write operation would only be blocked for the length of the 
> longest outstanding read, so as long as I favor big transactions for writes and 
> short queries for reads, the writer shouldn't be blocked for too long either. 
> Are there checkpoint settings that achieve this or do I need to build that 
> logic into my code?

You can use the SQL command "BEGIN IMMEDIATE".  That’s quite standard and is no 
problem.  You can also use "sqlite3_db_mutex()".  This is non-standard and you 
need to understand exactly what you’re doing.
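A minimal sketch of the BEGIN IMMEDIATE behaviour, assuming two connections to the same database file (the file name is made up): it takes the write lock up front, so a second writer fails fast rather than discovering the conflict later.

```python
import sqlite3, tempfile, os

path = os.path.join(tempfile.mkdtemp(), "imm.db")
a = sqlite3.connect(path, isolation_level=None)
a.execute("CREATE TABLE t(x)")
a.execute("BEGIN IMMEDIATE")        # reserve the write lock now

b = sqlite3.connect(path, isolation_level=None)
b.execute("PRAGMA busy_timeout = 0")
try:
    b.execute("BEGIN IMMEDIATE")    # second writer: refused immediately
    outcome = "acquired"
except sqlite3.OperationalError:
    outcome = "locked"
a.execute("COMMIT")
print(outcome)
```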

But given how good SQLite is at this stuff itself (really really good) you’re 
unlikely to be able to improve on SQLite’s default behaviour.  You can spend 
100 hours programming and end up with something no better than if you’d done 
nothing.  So I strongly recommend you leave stuff alone and let SQLite do its 
thing.  But if you really want to mess with it, that’s how.

Simon.


Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread Jens Alfke

> On Nov 29, 2016, at 9:09 AM, Simon Slavin  wrote:
> 
> You cannot design a system which (A) provides up-to-date data to readers (B) 
> allows writers to get rid of their data immediately without ever locking up 
> and (C) guarantees that earlier changes to the data are 'saved' before later 
> changes, thus preserving uncorrupted data in the case of power-cuts, etc..

In the general case of multiple writers I would agree, but I think Mark said 
he’s only got one writer. (Or at least the problem he described is independent 
of the number of writers.) And the problem he’s describing is not one of 
correctness but of performance, i.e. optimizing file size by checkpointing the 
WAL.

> You might like to read more about Brewer’s Theorem:
> 

The CAP theorem applies to distributed systems; I don’t see how it’s relevant 
here?

—Jens


Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread Scott Hess
On Tue, Nov 29, 2016 at 10:10 AM, Mark Hamburg  wrote:
> On Nov 29, 2016, at 9:09 AM, Simon Slavin  wrote:
>>> On 29 Nov 2016, at 4:18pm, Mark Hamburg  wrote:
>>>
>>> Does this make sense? Does it seem useful? (It seems useful to me when I 
>>> see multi-megabyte WAL files.)
>>
>> Sorry, but I cannot spare the time right now to analyze the system you laid 
>> out.  It usually takes half an hour to diagram out the read and write 
>> procedures and point out where multiple simultaneous ones don’t fit together.
>>
>> I can tell you that entire books are written about the difficulties of 
>> simultaneous access to a database, and that I’ve read too many of them.  And 
>> that it has been proven many times that there’s no solution.  You cannot 
>> design a system which (A) provides up-to-date data to readers (B) allows 
>> writers to get rid of their data immediately without ever locking up and (C) 
>> guarantees that earlier changes to the data are 'saved' before later 
>> changes, thus preserving uncorrupted data in the case of power-cuts, etc..
>>
>> It is possible to implement a version if you drop one of the requirements.  
>> For example, you can have many simultaneous writers as long as you don’t 
>> need any readers.  Or you can have many readers as long as you have only one 
>> writer and you don’t need readers to be completely up-to-date.
>>
>> You might like to read more about Brewer’s Theorem:
>>
>> 
>>
>> In the meantime, I’m glad that WAL mode seems to be useful for you, if you 
>> can cope with big journal files until all connections are closed, it’s a 
>> good solution.
>
> What I probably haven't accounted for is what it would take to do an 
> atomic/safe swap of the WAL files in my double WAL scenario. I need to give 
> that some more thought.

Don't do swaps.  The issue is that WAL needs everyone to agree to
reset to the front of the WAL file.  The _A and _B files could have a
serial number to differentiate which comes first.  Then when the first
file has no readers blocking, that file can be checkpointed to the
main database.  Then that file becomes available for future use.

-scott


Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread Mark Hamburg
On Nov 29, 2016, at 9:09 AM, Simon Slavin  wrote:
> 
> 
>> On 29 Nov 2016, at 4:18pm, Mark Hamburg  wrote:
>> 
>> Does this make sense? Does it seem useful? (It seems useful to me when I see 
>> multi-megabyte WAL files.)
> 
> Sorry, but I cannot spare the time right now to analyze the system you laid 
> out.  It usually takes half an hour to diagram out the read and write 
> procedures and point out where multiple simultaneous ones don’t fit together.
> 
> I can tell you that entire books are written about the difficulties of 
> simultaneous access to a database, and that I’ve read too many of them.  And 
> that it has been proven many times that there’s no solution.  You cannot 
> design a system which (A) provides up-to-date data to readers (B) allows 
> writers to get rid of their data immediately without ever locking up and (C) 
> guarantees that earlier changes to the data are 'saved' before later changes, 
> thus preserving uncorrupted data in the case of power-cuts, etc..
> 
> It is possible to implement a version if you drop one of the requirements.  
> For example, you can have many simultaneous writers as long as you don’t need 
> any readers.  Or you can have many readers as long as you have only one 
> writer and you don’t need readers to be completely up-to-date.
> 
> You might like to read more about Brewer’s Theorem:
> 
> 
> 
> In the meantime, I’m glad that WAL mode seems to be useful for you, if you 
> can cope with big journal files until all connections are closed, it’s a good 
> solution.

What I probably haven't accounted for is what it would take to do an 
atomic/safe swap of the WAL files in my double WAL scenario. I need to give 
that some more thought.

In the single WAL scenario, what I probably really want is a way for the 
checkpoint operation on the write connection to do its work then wait for 
exclusive access — standard reader/writer lock pattern — to do the WAL reset. 
This would presumably limit the time that the readers were blocked since I 
would expect the WAL reset to be reasonably quick if the checkpoint copying 
work were already complete. Furthermore, the write operation would only be 
blocked for the length of the longest outstanding read, so as long as I favor big 
transactions for writes and short queries for reads, the writer shouldn't be 
blocked for too long either. Are there checkpoint settings that achieve this or 
do I need to build that logic into my code?

Mark


Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread Simon Slavin

On 29 Nov 2016, at 4:18pm, Mark Hamburg  wrote:

> Does this make sense? Does it seem useful? (It seems useful to me when I see 
> multi-megabyte WAL files.)

Sorry, but I cannot spare the time right now to analyze the system you laid 
out.  It usually takes half an hour to diagram out the read and write 
procedures and point out where multiple simultaneous ones don’t fit together.

I can tell you that entire books are written about the difficulties of 
simultaneous access to a database, and that I’ve read too many of them.  And 
that it has been proven many times that there’s no solution.  You cannot design 
a system which (A) provides up-to-date data to readers (B) allows writers to 
get rid of their data immediately without ever locking up and (C) guarantees 
that earlier changes to the data are 'saved' before later changes, thus 
preserving uncorrupted data in the case of power-cuts, etc..

It is possible to implement a version if you drop one of the requirements.  For 
example, you can have many simultaneous writers as long as you don’t need any 
readers.  Or you can have many readers as long as you have only one writer and 
you don’t need readers to be completely up-to-date.

You might like to read more about Brewer’s Theorem:



In the meantime, I’m glad that WAL mode seems to be useful for you, if you can 
cope with big journal files until all connections are closed, it’s a good 
solution.

Simon.


Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread Mark Hamburg
One other question about WAL mode and simultaneous readers and writers: How are 
people setting their page caches? My read is that shared cache is probably not 
what's wanted. I was setting my reader caches to be bigger than my writer cache 
under the assumption that writers write and then move on whereas readers would 
benefit from having more data cached, but I'm now thinking that the disk cache 
should be getting me the latter effect and increasing the size of the write 
cache should allow the writer to run longer without having to flush from memory 
to disk. Is there any standard advice in this regard, or is this the sort of 
question where the answer is "experiment"?

Mark



Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread Mark Hamburg
Once I figured out how to reliably get the reader and writer connections open 
on a database — key point, when creating the database let the writer get 
through all of the creation work before opening any readers — I've been getting 
great concurrency behavior for simultaneous reads and writes in WAL mode.

What's less great is that if you have enough read activity, the checkpoint 
logic may never be able to actually reset the WAL and the WAL can get very 
large. Basically, if there is a read in process that uses the WAL, it can't get 
reset.
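This can be observed directly: PRAGMA wal_checkpoint returns a (busy, log, checkpointed) row, and a TRUNCATE checkpoint reports busy while a reader still holds a snapshot in the WAL. A hedged sketch (file name and data made up):

```python
import sqlite3, tempfile, os

path = os.path.join(tempfile.mkdtemp(), "wal.db")
w = sqlite3.connect(path, isolation_level=None)
w.execute("PRAGMA journal_mode=WAL")
w.execute("CREATE TABLE t(x)")
w.execute("INSERT INTO t VALUES (1)")

r = sqlite3.connect(path, isolation_level=None)
r.execute("BEGIN")
r.execute("SELECT x FROM t").fetchone()   # reader snapshot pins the WAL

w.execute("PRAGMA busy_timeout = 0")
busy, log, ckpt = w.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
blocked = busy            # 1: the reader prevented the WAL reset

r.execute("COMMIT")       # reader finishes; the reset can proceed
busy2, _, _ = w.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
print(blocked, busy2)
```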

This would obviously be a change to the WAL implementation, but I've been 
thinking that it ought to be possible to use two WALs as successive 
appendages to the main database and to swap their roles as the earlier WAL 
gets written into the main database. In other words, something like the 
following:

Logical view: DB ++ WAL_A ++ WAL_B

where ++ is essentially the WAL overlay logic. Writes always go into the second 
WAL. Checkpoints copy from the first WAL into the database. When a checkpoint 
is done, if there are no reads with holds on WAL_A, we can swap the roles of 
the WALs, reset WAL_A which is now the second WAL and hence the target for 
writes and start copying WAL_B which is now the first WAL into the database.

The trick comes in what we do when we start a read. If the first WAL has not 
been fully checkpointed, then we need to grab both WALs. (We could ignore the 
second WAL if it is empty, but that's immaterial to this discussion.) If, 
however, the first WAL has been fully checkpointed, then a read need not grab 
it which then leaves us free to do the WAL swap.

Such a scheme won't help with long reads blocking WAL reset, but it seems like 
it would eliminate issues with having a steady stream of small read operations 
blocking WAL reset.

Does this make sense? Does it seem useful? (It seems useful to me when I see 
multi-megabyte WAL files.)

Mark



Re: [sqlite] trimming with tab and other special characters

2016-11-29 Thread Max Vlasov
On Tue, Nov 29, 2016 at 4:19 PM, Igor Tandetnik  wrote:

> That's the exact opposite of your interpretation. For backslash escapes,
> you need to rely on "facilities of your programming language". If you
> cannot, and must use raw SQL queries, there are still ways to represent
> arbitrary characters, but backslash escapes is not one of them.

Yes, my fault.


Re: [sqlite] trimming with tab and other special characters

2016-11-29 Thread Igor Tandetnik

On 11/29/2016 4:00 AM, Max Vlasov wrote:

I wonder why the OP and other authors of the discussion

https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg49355.html
were so sure about backslash escaping support, even Igor Tandetnik :)


I said, and I quote:

"""
If you do it in your program, just put those characters in the query
string or bound parameter string, using facilities of your programming
language. E.g. in C that would be something like "trim(vEmail, ' \t\n')".

If you do it manually from, say, command line interface, you can do this:

update mytable set mycolumn=trim(mycolumn, cast(X'20090A' as text));
"""

That's the exact opposite of your interpretation. For backslash escapes, 
you need to rely on "facilities of your programming language". If you 
cannot, and must use raw SQL queries, there are still ways to represent 
arbitrary characters, but backslash escapes is not one of them.
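A sketch of the cast(X'20090A' as text) approach run through Python's sqlite3 module (the input string is made up); the blob literal supplies space, tab, and newline as the character set for trim() with no escaping needed:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# X'20090A' = the bytes for space, tab, newline; cast to text it becomes
# the set of characters trim() strips from both ends.
val = con.execute(
    "SELECT trim(?, CAST(X'20090A' AS TEXT))", ("\t  hi there\n",)
).fetchone()[0]
print(repr(val))
```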

--
Igor Tandetnik



Re: [sqlite] trimming with tab and other special characters

2016-11-29 Thread R Smith



On 2016/11/29 11:00 AM, Max Vlasov wrote:

Reasonable enough,

I wonder why the OP and other authors of the discussion

https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg49355.html
were so sure about backslash escaping support, even Igor Tandetnik :)


None of them were sure about backslash support or claimed it. I think 
you misread.


If you read that more carefully, you will see that Igor was explaining 
how to do it in the "C" code before passing it to the SQLite API - that's 
very different from claiming the actual "SQL" interpreter will understand 
backslash escaping.





Re: [sqlite] trimming with tab and other special characters

2016-11-29 Thread Max Vlasov
Reasonable enough,

I wonder why the OP and other authors of the discussion

https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg49355.html
were so sure about backslash escaping support, even Igor Tandetnik :)

On Tue, Nov 29, 2016 at 11:39 AM, Clemens Ladisch 
wrote:

> Max Vlasov wrote:
> > trim(col, char(9))
> >   works, while
> > trim(col,'\t')
> >   does not.
>
>  SELECT trim('ttthello\tt\\\', '\t');
>  hello
>
> Works as designed.
>
> SQL does not use backslash escaping.
> Use char(9) or an actual tab character ('   ').
>
>
> Regards,
> Clemens
>


Re: [sqlite] trimming with tab and other special characters

2016-11-29 Thread Clemens Ladisch
Max Vlasov wrote:
> trim(col, char(9))
>   works, while
> trim(col,'\t')
>   does not.

 SELECT trim('ttthello\tt\\\', '\t');
 hello

Works as designed.

SQL does not use backslash escaping.
Use char(9) or an actual tab character ('   ').
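A quick sketch of the difference (values made up): char(9) passes a real tab character to trim(), while the SQL literal '\t' is just the two characters backslash and t:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# char(9) is a real tab, so it strips the tabs around "hi".
tab9 = con.execute("SELECT trim(?, char(9))", ("\thi\t",)).fetchone()[0]
# '\t' in SQL is literally backslash + t, so the tabs are left alone.
lit = con.execute(r"SELECT trim(?, '\t')", ("\thi\t",)).fetchone()[0]
print(repr(tab9), repr(lit))
```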


Regards,
Clemens


[sqlite] trimming with tab and other special characters

2016-11-29 Thread Max Vlasov
Hi,

searching the mailing list suggests possible escaping support for the second
parameter of the trim function, but in my case (SQLite 3.15.1)

trim(col, char(9))
  works, while
trim(col,'\t')
  does not.

Can someone clarify on that?

Thanks

Max