Re: [sqlite] Things you shouldn't assume when you store names

2019-11-10 Thread Kevin O'Gorman
And "full legal name"   How about my dad, whose full name was Dr. John
Michael Patrick Dennis Emmet O'Gorman, PhD.  How many rules does that
break?  I've fought many companies over that apostrophe in my life.
Governments tend to throw it away, but it's on my old passport and birth
certificate.

---
Dictionary.com's word of the year: *misinformation*
Merriam-Webster word of the year: *justice*


On Sun, Nov 10, 2019 at 4:01 AM Richard Damon 
wrote:

> On 11/10/19 1:21 AM, Gary R. Schmidt wrote:
> > On 10/11/2019 13:44, Doug wrote:
> >> Au Contraire, Jens! In many local contexts you can normalize people's
> >> names. I was born in Kansas, USA. My parents filled out a birth
> >> certificate for me. It had a place on the form for first name, middle
> >> name, last name, and a suffix like II or III.
> >>
> >> That birth certificate form determined that everyone born in Kansas
> >> (at that time), had a first, middle, and last name. There was no
> >> discussion of the matter. That's the way it was. The form led the
> >> way; people never thought about whether it was effective or not. Each
> >> newly-born child was given a first, middle, and last name.
> >>
> >> Effectiveness was irrelevant for that system. There was no option, no
> >> alternative. It simply was.
> >>
> >> All systems are like that at each moment in time. They are what they
> >> are at any moment in time, and they force the users to behave the way
> >> the system wants them to behave. If you want to change the system and
> >> momentum is on your side, then immediately you have a new system - at
> >> that moment in time. It is composed of the old system and the momentum.
> >>
> >> Back to names: just like the birth certificate, a system which
> >> assigns a name to you, actually coerces you to have that name,
> >> because within that system, you exist as that name. The "names"
> >> article is totally wrong when it says that each assumption is wrong.
> >> Each of those assumptions is correct, and I can find at least one
> >> system which makes each one correct. Within each system, the
> >> assumption works, and is valid.
> >>
> >> My two cents...
> > Is not worth the paper it is written on!
> >
> > So what happens when someone from a family who only uses first- and
> > last-names moves to Kansas?
> >
> > Do they have to make up a middle-name so that the idiots can fill out
> > the forms?
> >
> > Well, in the case of the US Navy back in the late 1980's, when a
> > friend of mine from here in Australia, who only has a first and
> > last-name married a USN pilot and moved to the USA, she was told that,
> > "Yes, you have a middle name."  No amount of arguing, or producing of
> > official documents, (well, it's the USA, most people there don't know
> > what a passport is), could prevail.  In the end she conceded defeat
> > and became  Doe , for the duration.
> >
> > Names are impossible: unless you use a free-form, infinite-length
> > field, you won't be safe, and even then, someone will turn up whose
> > name is 'n' recurring to an infinite number of characters or something!
> >
> > Cheers,
> > GaryB-)
> Actually, 'The Artist whose name formerly was Prince' (which wasn't his
> name; his legal name was an unpronounceable pictograph) breaks every
> computer system I know.
>
> --
> Richard Damon
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Back on-line. Was: Mailing list shutting down...

2018-06-14 Thread Kevin O'Gorman
On Thu, Jun 14, 2018 at 8:58 AM, Richard Hipp  wrote:

> ...



>
> So there you have it:  If you want to harass someone by sending them
> thousands of subscription confirmations, there is now a website to
> assist you.  Do we need any further evidence that the heart of man is
> deceitful above all things, and desperately wicked?
>
>
There is no doubt that SOME folks have deceitful hearts and are wicked.
Please don't paint all members of the species with just one brush.

-- 
Dictionary.com's word of the year: *complicit*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Multi threaded readers on memory sqlite cannot scale

2018-05-13 Thread Kevin O'Gorman
On Sun, May 13, 2018 at 9:01 AM, Dennis Clarke <dcla...@blastwave.org>
wrote:

> On 05/13/2018 11:57 AM, Kevin O'Gorman wrote:
>
>> The arguments here are simplified
>>
>
>
> Will you stop top posting please?
>
> I am trying to follow along here about some x86 boxen stuff but
> you are top posting madly. Also is that a single socket machine
> with a single big memory bank or is it NUMA and multiple sockets
> or is it just a single motherboard unit?
>
>
> Dennis
>
>
It is a single motherboard with two Xeon sockets, and 16 memory sockets.
I think those sockets are in two banks, which may be relevant.

Anyhow, those details weren't my point.  The point was that computing
hardware is a galaxy of possibilities, some with more inherent parallelism
than others.  I'm not personally having throughput troubles.

-- 
Dictionary.com's word of the year: *complicit*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Multi threaded readers on memory sqlite cannot scale

2018-05-13 Thread Kevin O'Gorman
The arguments here are simplified, and assume some things that may or may
not be true.  The server I keep in my garage has  16 real cores, 32
threads.  More importantly, it uses DDR4 memory which I think means there
are 4 channels to memory which can be used in parallel -- perhaps not on
exactly the same address but the memory is spread among 16 DIMMs. (It's a
System76 "Silverback" with 256 GB RAM).  Lots of opportunities for
parallelism.  Moreover, depending on the actual work to be done, there may
be a considerable amount of "inherently parallelizable" work.  You don't
know until you try it -- or better yet, measure it.

Sure, there will be limits to how far this can go, but modern machines are
designed to take advantage of opportunities for parallelism.  You just have
to get rid of unnecessary locking.  So I repeat, why is a read-only
database being serialized at all?
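
For what it's worth, here is roughly the kind of measurement I have in mind
-- a sketch only, not my real workload.  It assumes my own 917.db file and
its pos table (ppos/pnum columns), opens one read-only connection per thread
so nothing goes through a shared cache, and just counts queries per second:

import sqlite3, threading, time

N_THREADS = 6
DURATION = 10.0                     # seconds each reader runs
counts = [0] * N_THREADS

def reader(i):
    # mode=ro keeps SQLite from even trying to write; a private connection
    # per thread avoids the shared-cache serialization entirely.
    conn = sqlite3.connect("file:917.db?mode=ro", uri=True)
    stop = time.perf_counter() + DURATION
    n = 0
    while time.perf_counter() < stop:
        conn.execute("SELECT pnum FROM pos WHERE ppos = ?",
                     ("0" * 64,)).fetchone()
        n += 1
    counts[i] = n

threads = [threading.Thread(target=reader, args=(i,)) for i in range(N_THREADS)]
for t in threads: t.start()
for t in threads: t.join()
print(sum(counts) / DURATION, "queries/sec total across", N_THREADS, "threads")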

On Sun, May 13, 2018 at 8:08 AM, Keith Medcalf  wrote:

>
> >2. Running the same example on sqlite *file *(multi threaded mode;
> >WAL journal) scales almost linearly;  so 6 threads provide nearly 6xN
> >throughput. Single threaded throughput is a bit slower (around 15-
> >20%) than single threaded in-memory access (expected).
>
> So, there is some "part" of the process that is "inherently
> parallelizable" and you have managed to have a one "thread" to some work
> during the time some "other thread" is consumed doing something you cannot
> see.  Congratulations.  This will increase by diminishing returns.
> Eventually adding more parallelism will make things slower.
>
> >So, memory sqlite is not really usable with multiple threads
> >(readers).  While one might expect that multiple readers of
> >*memory *content could scale even better than with file content.
>
> I would expect that a single connection to a single in memory database is
> 100% efficient and cannot be further optimized, and therefore I would not
> try.
>
> Why would I (or anyone of sound mind) want to insert "inefficiencies" so
> that one can then spend inordinate amounts of time to never quite eliminate
> them, and only go forever slower in the process?
>
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Dictionary.com's word of the year: *complicit*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Multi threaded readers on memory sqlite cannot scale

2018-05-13 Thread Kevin O'Gorman
It's not clear to me why reads must be serialized at all.  Maybe this could
be re-thought?  Maybe there should be a way to tell SQLite that a certain
DB or table is to be read-only and unserialized?
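
The closest knobs I know of today are opening the file read-only and PRAGMA
query_only -- a sketch, untested, using my 917.db as the example; neither
one removes the serialization on a single shared-cache in-memory connection:

import sqlite3

conn = sqlite3.connect("file:917.db?mode=ro", uri=True)   # read-only open
conn.execute("PRAGMA query_only = ON")                    # refuse writes on this connection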

On Sun, May 13, 2018 at 7:15 AM, Keith Medcalf  wrote:

>
> Say Hi to Gene!
>
> https://en.wikipedia.org/wiki/Amdahl%27s_law
>
> So I believe what you are saying is something like this:  If I take a
> child and have it count as fast as it can then it can count to X in an
> hour.  However, if I take the same child but have it count as fast as it can
> in five-minute stretches, the sum of the X's is less than it was at one
> go.  If I get the child to do this at random intervals consuming juice
> boxes in between, the sum of the X's is even lower, the higher the number
> of interruptions becomes.
>
> In the second case the task consists of counting to ten and then drinking
> a juice box.  If you get one child, then it takes time X.  Interestingly,
> if you get two children, the tasks (empty juice boxes) stack up twice as
> fast.  There is some overlap between the operations.  As you add more and
> more children it goes faster and faster, but not quite.  Eventally all the
> children are drinking the juice box as the same time and adding more
> children does not make things go faster.
>
>
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
> >-Original Message-
> >From: sqlite-users [mailto:sqlite-users-
> >boun...@mailinglists.sqlite.org] On Behalf Of Techno Magos
> >Sent: Sunday, 13 May, 2018 04:51
> >To: sqlite-users@mailinglists.sqlite.org
> >Subject: [sqlite] Multi threaded readers on memory sqlite cannot
> >scale
> >
> >Hello
> >
> >I do not have clear examples to post  on this but would like to
> >report
> >findings around multi threaded read access (single process) in a
> >large
> >system that uses sqlite.
> >
> >This may be a known issue/restriction of memory sqlite behaviour, but
> >wanted to check with the list first:
> >
> >1. Running 2, 3, ... 6 multi threaded readers of a single *memory
> >*sqlite
> >database (via shared cache mode) on an 8 core cpu shows no throughput
> >gain
> >at all compared to single threaded throughput. In fact, it shows a
> >throughput drop: i.e. if a single thread can do N simple queries/sec,
> >2
> >threads .. up to 6 threads do a little less (10% drop) in total. This
> >suggests that access to memory sqlite can only be serialized?
> >
> >2. Running the same example on sqlite *file *(multi threaded mode;
> >WAL
> >journal) scales almost linearly;  so 6 threads provide nearly 6xN
> >throughput. Single threaded throughput is a bit slower (around 15-
> >20%)
> >than single threaded in-memory access (expected).
> >
> >So, memory sqlite is not really usable with multiple threads
> >(readers).
> >While one might expect  that multiple readers of *memory *content
> >could
> >scale even better than with file content.
> >
> >Can this restriction be lifted?
> >Is there some special mode possible to achieve scaling up throughput
> >with
> >multiple threads for memory sqlite content?
> >
> >
> >Thanks
> >___
> >sqlite-users mailing list
> >sqlite-users@mailinglists.sqlite.org
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Dictionary.com's word of the year: *complicit*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Broken DB?

2018-04-27 Thread Kevin O'Gorman
Oops.  Wrong list.  Should go to a Django group.  I noticed as soon as I
sent this.  Please ignore.

On Fri, Apr 27, 2018 at 2:58 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
wrote:

> I've got a working site, but I made a copy of the database in order to do
> some development work.
> I've hit a snag that looks like a problem in the data.
>
> Ive written a management command to show the problem:
>
> from django.core.management.base import BaseCommand, CommandError
>
> # Stuff for the library
> from oil.models import Packet, Signature, Log, Voter
>
> class Command(BaseCommand):
>     help = 'Shows a quick count of validations'
>     BaseCommand.requires_migrations_checks = True
>
>     def handle(self, *args, **options):
>         voters = Log.objects.all()
>         self.stdout.write(repr(voters[0]))
>
> I'm suspecting a problem has crept into my Log table, because it works
> fine if I change Log on the
> second line of handle() to any of the other tables.  If it runs as shown
> here however, I get
>
> kevin@camelot-x:/build/comprosloco$ manage oiltest
> Traceback (most recent call last):
>   File "./manage", line 22, in 
> execute_from_command_line(sys.argv)
>   File "/build/django/django/core/management/__init__.py", line 364, in
> execute_from_command_line
> utility.execute()
>   File "/build/django/django/core/management/__init__.py", line 356, in
> execute
> self.fetch_command(subcommand).run_from_argv(self.argv)
>   File "/build/django/django/core/management/base.py", line 283, in
> run_from_argv
> self.execute(*args, **cmd_options)
>   File "/build/django/django/core/management/base.py", line 330, in
> execute
> output = self.handle(*args, **options)
>   File "/raid3/build/comprosloco/oil/management/commands/oiltest.py",
> line 15, in handle
> self.stdout.write(repr(voters[0]))
>   File "/build/django/django/db/models/base.py", line 590, in __repr__
> u = six.text_type(self)
>   File "/raid3/build/comprosloco/oil/models.py", line 172, in __str__
> self.accepted
> TypeError: sequence item 0: expected str instance, datetime.datetime found
> kevin@camelot-x:/build/comprosloco$
>
> And I have no idea how to debug it further.  The schema of Log is
> sqlite> .schema oil_log
> CREATE TABLE "oil_log" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
> "packet" integer NOT NULL, "signature" integer NOT NULL, "action"
> varchar(20) NOT NULL, "criteria" varchar(150) NOT NULL, "candidates"
> varchar(100) NOT NULL, "accepted" varchar(10) NOT NULL, "user_id" integer
> NOT NULL REFERENCES "auth_user" ("id"), "timestamp" datetime NOT NULL);
> CREATE INDEX "oil_log_packet_ecd59bc4" ON "oil_log" ("packet");
> CREATE INDEX "oil_log_user_id_7f26e501" ON "oil_log" ("user_id");
> sqlite>
>
>
> Help???
>
>
> --
> Dictionary.com's word of the year: *complicit*
>



-- 
Dictionary.com's word of the year: *complicit*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Broken DB?

2018-04-27 Thread Kevin O'Gorman
I've got a working site, but I made a copy of the database in order to do
some development work.
I've hit a snag that looks like a problem in the data.

I've written a management command to show the problem:

from django.core.management.base import BaseCommand, CommandError

# Stuff for the library
from oil.models import Packet, Signature, Log, Voter

class Command(BaseCommand):
    help = 'Shows a quick count of validations'
    BaseCommand.requires_migrations_checks = True

    def handle(self, *args, **options):
        voters = Log.objects.all()
        self.stdout.write(repr(voters[0]))

I'm suspecting a problem has crept into my Log table, because it works fine
if I change Log on the
second line of handle() to any of the other tables.  If it runs as shown
here however, I get

kevin@camelot-x:/build/comprosloco$ manage oiltest
Traceback (most recent call last):
  File "./manage", line 22, in 
execute_from_command_line(sys.argv)
  File "/build/django/django/core/management/__init__.py", line 364, in
execute_from_command_line
utility.execute()
  File "/build/django/django/core/management/__init__.py", line 356, in
execute
self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/build/django/django/core/management/base.py", line 283, in
run_from_argv
self.execute(*args, **cmd_options)
  File "/build/django/django/core/management/base.py", line 330, in execute
output = self.handle(*args, **options)
  File "/raid3/build/comprosloco/oil/management/commands/oiltest.py", line
15, in handle
self.stdout.write(repr(voters[0]))
  File "/build/django/django/db/models/base.py", line 590, in __repr__
u = six.text_type(self)
  File "/raid3/build/comprosloco/oil/models.py", line 172, in __str__
self.accepted
TypeError: sequence item 0: expected str instance, datetime.datetime found
kevin@camelot-x:/build/comprosloco$

And I have no idea how to debug it further.  The schema of Log is
sqlite> .schema oil_log
CREATE TABLE "oil_log" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"packet" integer NOT NULL, "signature" integer NOT NULL, "action"
varchar(20) NOT NULL, "criteria" varchar(150) NOT NULL, "candidates"
varchar(100) NOT NULL, "accepted" varchar(10) NOT NULL, "user_id" integer
NOT NULL REFERENCES "auth_user" ("id"), "timestamp" datetime NOT NULL);
CREATE INDEX "oil_log_packet_ecd59bc4" ON "oil_log" ("packet");
CREATE INDEX "oil_log_user_id_7f26e501" ON "oil_log" ("user_id");
sqlite>


Help???
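
One way I can think of to narrow it down (a sketch, untested, and the
database path is a guess) is to bypass Django and ask SQLite directly what
is actually stored in the columns that __str__ is choking on:

import sqlite3

conn = sqlite3.connect("db.sqlite3")   # use the copied database's real path
# typeof() reports how SQLite stored each value; if datetimes or other
# oddities leaked into "accepted", the distribution should show it.
for row in conn.execute(
        "SELECT typeof(accepted), COUNT(*) FROM oil_log GROUP BY 1"):
    print(row)
for row in conn.execute(
        "SELECT id, accepted, timestamp FROM oil_log LIMIT 5"):
    print(row)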


-- 
Dictionary.com's word of the year: *complicit*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simple read-only program very slow

2017-12-04 Thread Kevin O'Gorman
On Sun, Dec 3, 2017 at 8:49 AM, Keith Medcalf <kmedc...@dessus.com> wrote:

>
> On Sunday, 3 December, 2017 08:24, Richard Rousselot <
> richard.rousse...@gmail.com> wrote:
>
> >Had similar issue a few years ago; we were using the SQLite3.exe.  We
> >recompiled the exe as 64 bit which allowed it to use more than 4 GB
> >of ram, loaded the machine with as much memory as we could.  Voila,
> >super fast processing.
>
> >Can the Python libraries be made 64 bit some how?
>
> Yes.  You must be using a 64-bit version of Python; the procedure to
> replace the sqlite3.dll / sqlite3.so it uses is the same as for the 32-bit
> version, and the procedure to compile and use a 64-bit version of the apsw
> extension is likewise unchanged.
>
> Neither Windows nor Linux can thunk a dynamic load module such that the
> one used is a different model than the running process (it was proprietary
> IBM technology that no one else seems smart enough to duplicate), so you
> have to update Python to the 64-bit model as well.
>
> On Windows 10 16299.98 I have both a 32-bit (Python 2.7.14) and 64-bit
> (Python 3.6.4) installed and build 32-bit DLLs for the former and 64-bit
> for the latter from the same source (just selecting -m32 or -m64 as
> appropriate).  I use the MinGW64/GCC compiler because (a) it can compile in
> either model depending on the switch you use without requiring any code
> changes, (b) supports long long and long double in 32-bit; and, (c) does
> not require the use of the Microsoft C Runtime "moving target" libraries --
> it can compile to the subsystem runtime (MSVCRT) that has been stable
> since, oh, the first OS/2 New Technology (which later became Windows NT)
> way back when.  Oh, and MinGW/GCC does "true" position independent code and
> when you do a static link of a module to either an executable or dynamic
> load library, it is truly static with no strange external dependencies.
>
> Since SQLite3 is heavily I/O bound (or at least syscall/kernel call bound
> for mutexes, etc) in practically everything it does, the 64-bit version is
> much faster (about 20%) than the 32-bit version, when running on a 64-bit
> OS, since the OS does not have to thunk the call stack when
> accessing/returning from  the kernel.
>
> >On Thu, Nov 30, 2017 at 7:01 PM Keith Medcalf <kmedc...@dessus.com>
> >wrote:
> >
> >>
> >> Is there an index on pos where ppos is the left-most field (or the
> >only
> >> field) in the index?
> >> What is the column affinity of ppos?  Of the field you are passing
> >as a
> >> parameter?
> >> Is ppos unique?
> >>
> >> If you CREATE [UNIQUE] INDEX searchindex ON pos (ppos, pnum);
> >>
> >> then your query can be satisfied only using the searchindex
> >covering index.
> >>
> >> If there is not an index on ppos, then you will be wasting time
> >recreating
> >> the index for each query.
> >>
> >> You will probably need to increase the cache size beyond the paltry
> >> default in order for the entire btree structures to be cached in
> >RAM -- you
> >> probably want to make it as big as you can.
> >>
> >> ---
> >> The fact that there's a Highway to Hell but only a Stairway to
> >Heaven says
> >> a lot about anticipated traffic volume.
> >>
> >>
> >> >-Original Message-
> >> >From: sqlite-users [mailto:sqlite-users-
> >> >boun...@mailinglists.sqlite.org] On Behalf Of Kevin O'Gorman
> >> >Sent: Saturday, 25 November, 2017 20:14
> >> >To: sqlite-users
> >> >Subject: [sqlite] Simple read-only program very slow
> >> >
> >> >I'm pretty new at SQLite, so this may seem obvious to you.  Be
> >kind.
> >> >I'm using Python on Ubuntu Linux 16.04 LTS, and the sqlite that is
> >> >built
> >> >into Python.  The database
> >> >is using WAL.
> >> >
> >> >I've got a database of some 100 million records, and a file of
> >just
> >> >over
> >> >300 thousand that I want represented in it.  I wanted to check how
> >> >much
> >> >difference it was going to make, so I wrote a super
> >> >simple program to read the file and count how many records are
> >> >already
> >> >there.  I got impatient waiting for it so I killed the process and
> >> >added an
> >> >output of one dot (".") per 1000 records.  It went very fast for
> >what
> >> >I
> >> >estimate was around 200 dots and hit a wall.  I

Re: [sqlite] Simple read-only program very slow

2017-11-30 Thread Kevin O'Gorman
On Sun, Nov 26, 2017 at 12:02 AM, Clemens Ladisch <clem...@ladisch.de>
wrote:

> Kevin O'Gorman wrote:
> > I wrote a super simple program to read the file and count how many
> > records are already there.  I got impatient waiting for it so I killed
> > the process and added an output of one dot (".") per 1000 records.  It
> > went very fast for what I estimate was around 200 dots and hit a wall.
> > It made progress, but very very slowly.  [...]
> > The field being used for the lookup has an index.
>
> I'd guess that most records are found, and that the file and the table
> happen to be somewhat sorted.  The search becomes slow when the amount
> of data that needs to be read exceeds the available memory.
>
> > Why does commit() make a difference?
>
> Hmmm ... interesting.
>
> > for row in conn.execute("""
> > SELECT pnum
> > FROM pos
> > WHERE ppos=?
> > """,(pos,)):
> > pnum = row[0]
> > break
> > if pnum is None:
> > missing += 1
> > else:
> > present += 1
>
> Even with the index on ppos, the DB still has to look up the table row
> to read the pnum value.
>
> You do not care about the actual pnum value, so you could replace it
> with a constant value ("SELECT 1 FROM ...").  Or just use EXISTS to
> show what you actually want to do:
>
> cursor = conn.execute("""
>     SELECT EXISTS (
>         SELECT *
>         FROM pos
>         WHERE ppos = ?)
>     """, (pos,))
> exists = cursor.fetchone()[0]
> if exists:
>     present += 1
> else:
>     missing += 1


That's real interesting and I'll keep it in mind for the future, but note
that my actual code DOES care about the contents of pmain, so it has to
look in the database anyway.  I'm still left wondering why it ran so slow
and why a commit() helped a read-only program running alone on the machine.

I'm gonna try this on my Xeon with 256 GB of RAM to check out the idea it
was running out of space.  It is true that the database is 50 GB so of
course it's bigger than the RAM on the usual desktop.
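
I also plan to try giving the connection a much larger cache so the btree
pages stay in RAM.  Roughly this -- a sketch with guessed numbers and a
placeholder database name:

import sqlite3

conn = sqlite3.connect("the.db")                # placeholder name
conn.execute("PRAGMA cache_size = -8000000")    # about 8 GB of page cache (KiB when negative)
conn.execute("PRAGMA mmap_size = 60000000000")  # try to map the whole file; may be capped
                                                # by the build's SQLITE_MAX_MMAP_SIZE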


-- 
Dictionary.com's word of the year: *complicit*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Simple read-only program very slow

2017-11-30 Thread Kevin O'Gorman
On Sun, Nov 26, 2017 at 1:39 AM, Simon Slavin <slav...@bigfraud.org> wrote:

>
>
> On 26 Nov 2017, at 3:13am, Kevin O'Gorman <kevinogorm...@gmail.com> wrote:
> >
> > I've got a database of some 100 million records, and a file of just over
> > 300 thousand that I want represented in it.  I wanted to check how much
> > difference it was going to make, so I wrote a super
> > simple program to read the file and count how many records are
> already
> > there.
>
> You can use COUNT(*) to find out how many rows there are in a table.  SQL
> is optimized to handle this faster than reading individual row data.
>
> SELECT COUNT(*) FROM pos
>
> Simon.
>

I know, but that does not help much when I'm trying to match the database
against a file, as I am here.
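
One alternative I have been mulling over (a sketch, untested): stage the
file's positions in a temp table and let a single query do the matching,
instead of doing 300 thousand separate lookups from Python:

import sqlite3, sys

conn = sqlite3.connect("917.db")
conn.execute("CREATE TEMP TABLE filepos (ppos CHAR(64) PRIMARY KEY)")
conn.executemany("INSERT OR IGNORE INTO filepos VALUES (?)",
                 ((line.split()[0],) for line in sys.stdin))
present = conn.execute(
    "SELECT COUNT(*) FROM filepos WHERE ppos IN (SELECT ppos FROM pos)"
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM filepos").fetchone()[0]
print(present, "present,", total - present, "missing (duplicates collapsed)")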

-- 
Dictionary.com's word of the year: *complicit*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Simple read-only program very slow

2017-11-25 Thread Kevin O'Gorman
I'm pretty new at SQLite, so this may seem obvious to you.  Be kind.
I'm using Python on Ubuntu Linux 16.04 LTS, and the sqlite that is built
into Python.  The database
is using WAL.

I've got a database of some 100 million records, and a file of just over
300 thousand that I want represented in it.  I wanted to check how much
difference it was going to make, so I wrote a super
simple program to read the file and count how many records are already
there.  I got impatient waiting for it so I killed the process and added an
output of one dot (".") per 1000 records.  It went very fast for what I
estimate was around 200 dots and hit a wall.  It made progress, but very
very slowly.

So I killed it again and added a commit() call every time it output a dot.
It didn't hit a wall, just some rough road (that is, it slowed down at
about the same spot but not nearly so drastically).

The code makes no changes to the database at all.  Why does commit() make a
difference?  What else should I learn from this?

The field being used for the lookup has an index.

++ kevin

Code follows:
#!/usr/bin/env python3
"""Count the number of records that represent rows in the database 'pos' table.
The database is not modified.

 Last Modified: Sat Nov 25 18:56:49 PST 2017
"""

import os.path  # https://docs.python.org/3.5/library/os.path.html
import sys      # https://docs.python.org/3.5/library/sys.html
import argparse # https://docs.python.org/3.5/library/argparse.html
import sqlite3  # https://docs.python.org/3.5/library/sqlite3.html
import re       # https://docs.python.org/3.5/library/re.html

# from /usr/local/lib/python3.5/dist-packages
import qcreate
from qerror import *
import myparser

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="""A program to read
        positions and count how many are in the database""",)
    parser.add_argument("--dbname", default=None,
        help="name of the database to work on (overrides qubic.ini file)")
    parser.add_argument("file", nargs='?', type=argparse.FileType('r'),
        default=sys.stdin,
        help="file containing the qsearch results (default stdin)")
    args = parser.parse_args()
    infile = args.file

    if args.dbname is None:
        here = os.path.split(os.path.realpath('.'))[1]
        for confdir in ".", os.path.join(os.environ["HOME"], ".config"), "/etc/xdg/":
            f = os.path.join(confdir, "qubic.ini")
            if os.path.exists(f):
                args.dbname = myparser.parse(f, here, "dbname")
                if args.dbname is not None:
                    break
        if args.dbname is None:
            print(" *** ERROR: no database name provided and none found in qubic.ini files")
            sys.exit(1)

    present = missing = lines = 0
    with sqlite3.connect(args.dbname) as conn:
        for line in infile:
            fields = line.split()
            pos = fields[0]
            if len(pos) != 64:  # Important test to catch grep without --no-filename
                raise InputError(" ERROR: input line has wrong-sized position: " + line)

            pnum = None
            for row in conn.execute("""
                    SELECT pnum
                    FROM pos
                    WHERE ppos=?
                    """, (pos,)):
                pnum = row[0]
                break
            if pnum is None:
                missing += 1
            else:
                present += 1
            lines += 1
            if lines % 1000 == 0:
                print(".", flush=True, end="")
                conn.commit()
    print("there were", present, "records on file and", missing, " were missing")
    print("out of a total of", lines, "records.")



-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Seg fault with core dump. How to explore?

2017-10-01 Thread Kevin O'Gorman
On Sat, Sep 30, 2017 at 11:41 PM, Clemens Ladisch <clem...@ladisch.de>
wrote:

> Kevin O'Gorman wrote:
> > my latest trial run ended with a segmentation fault
>
> Really a segmentation fault?  What is the error message?
>

It said what such things always say: "segmentation fault (core dumped)" and the
name of the program.

>
> > This particular program is merging two databases.  The result has reached
> > 25 GB, roughly 1/3 of what I expect of the final result (over 100M rows).
> > The filesystem is a RAID with 2+ TB free.
>
> Does the /var/tmp filesystem fill up?
>

No.  And /var/tmp is not used as I've redirected tmp onto my RAID

>
> > Here's my prime suspect: I'm using WAL, and the journal is 543 MB.
>
> In WAL mode, the log stores the new versions of all changed pages.
> In rollback journal mode, the journal stores the old version of all
> changed pages.  So when you're creating a new DB (where the old version
> is empty), journal rollback mode is likely to be more efficient.
>
I'm not creating a new database.  I'm merging one into the other.


>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>

-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Seg fault with core dump. How to explore?

2017-09-30 Thread Kevin O'Gorman
What I'm testing is my code.  I want to be sure the code is going to work.
A crash is a primary indication that it won't.  That's information, not
just an annoyance.

On Sat, Sep 30, 2017 at 5:37 PM, Joseph L. Casale <jcas...@activenetwerx.com
> wrote:

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On
> Behalf Of Kevin O'Gorman
> Sent: Saturday, September 30, 2017 3:55 PM
> To: sqlite-users <sqlite-users@mailinglists.sqlite.org>
> Subject: [sqlite] Seg fault with core dump. How to explore?
>
> > Here's my prime suspect: I'm using WAL, and the journal is 543 MB.
>
> Do you really need any reliability at all for a test? Who cares if the
> power goes out or the program crashes? If this is a test, you will simply
> restart it and the data is irrelevant so why impede any potential
> performance for data integrity?
>
> Try setting the journal_mode off...
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Seg fault with core dump. How to explore?

2017-09-30 Thread Kevin O'Gorman
I'm using the standard shim, because I've been using it forever and first
heard of APSW just a few days ago.  I'm guessing it should be pretty easy to
switch because I'm not doing anything weird.  All my columns are INTEGER or
CHAR, there are not even any foreign keys, although one of the two main
tables does contain primary keys (integer autoincrement primary key) of the
other.

I'm a little leery of switching on account of one crash, as it may well be
an over-reaction.

On Sat, Sep 30, 2017 at 4:30 PM, Simon Slavin <slav...@bigfraud.org> wrote:

>
>
> On 30 Sep 2017, at 10:54pm, Kevin O'Gorman <kevinogorm...@gmail.com>
> wrote:
>
> > Here's my prime suspect: I'm using WAL, and the journal is 543 MB.  I
> > hadn't given it much thought, but could this be more than the software
> > really wants to deal with?
>
> Not SQLite.  Possibly something else you’re using.  I used to work daily
> with a 43 Gigabyte SQLite database.  And most of that space was used by one
> tall thin table.  SQLite has known limits and is not thoroughly tested near
> those limits (because nobody can afford to buy enough hardware to do it) ,
> but those limits are a lot more than half a Gigabyte.
>
> <https://sqlite.org/limits.html>
>
> A crash sometimes happens because the programmer continues to call sqlite_
> routines after one of them has already reported a problem.  Are you
> checking the values returned by all sqlite_() calls to see that it is
> SQLITE_OK ?  You may have to learn how your Python shim works to know: it
> may interpret other results as "catch" triggers or some equivalent.
>
> Are you using the standard Python shim or APSW ?  The standard Python shim
> does complicated magic to make SQLite behave the way Python wants it to
> behave.  This complication can make it difficult to track down faults.  You
> might instead want to try APSW:
>
> <https://rogerbinns.github.io/apsw/>
>
> This is an extremely thin shim that does almost nothing itself.  That
> makes it easy to track down all errors to a Python problem or a SQLite
> problem.  I’m not saying we can’t help with the standard Python import,
> just that it’s a little more complicated.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Seg fault with core dump. How to explore?

2017-09-30 Thread Kevin O'Gorman
I'm testing new code, and my latest trial run ended with a segmentation
fault after about 5 hours.
I'm running Python 3.5 and its standard sqlite3 module on Xubuntu 16.04.3
LTS.  The code is short -- about 300 lines.

This particular program is merging two databases.  The result has reached
25 GB, roughly 1/3 of what I expect of the final result (over 100M rows).
The filesystem is a RAID with 2+ TB free.  The machine is a Core i7 with 32
GB RAM and 0 swap has been used since the last reboot.  Nothing else much
is running on this machine except some idle terminal and browser windows.

Here's my prime suspect: I'm using WAL, and the journal is 543 MB.  I
hadn't given it much thought, but could this be more than the software
really wants to deal with?  I'm going to try doing occasional commits
(every 100K inserts/updates, perhaps; see the sketch below), but I'd like
some help:
1. If I'm on the right track, tell me so I can stop worrying and proceed
with development.
2. If I'm on the wrong track, help me figure out how to debug the problem.
I can probably find out what particular part of the merge it had reached,
but it's going to take quite a while.  I'm pretty good with GDB  but I have
no idea how to explore a running Python program.
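
For reference, this is roughly what I mean by "occasional commits" -- a
sketch of the idea, not my real merge code, with a placeholder database name
and a placeholder for the merge loop:

import sqlite3

BATCH = 100000
conn = sqlite3.connect("merged.db")          # placeholder name
pending = 0
for row in rows_to_merge():                  # placeholder for the real merge loop
    conn.execute("INSERT OR REPLACE INTO pos VALUES (?,?,?,?,?,?,?,?)", row)
    pending += 1
    if pending >= BATCH:
        conn.commit()                        # checkpoints can only happen after a commit
        pending = 0
conn.commit()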

The project is a hobby, so there's nothing proprietary, and I can post any
information that would help.

++ kevin

-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Linux top command and sqlite

2017-02-21 Thread Kevin O'Gorman
I'll try the synchronous=off the next time I run it.  As it happens, this
one
finished in 57 minutes, which is not bad, considering.

But, I wonder, could I not tell if it's doing lots of commits by looking at
the size of the ?.db-journal file?  It gets pretty large.
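
For my own notes, wiring both of the suggestions below in from Python would
look roughly like this -- a sketch, untested, with a placeholder database name:

import sqlite3

conn = sqlite3.connect("the.db")           # placeholder name
conn.isolation_level = None                # manage transactions by hand
conn.execute("PRAGMA journal_mode=WAL")    # suggestion (2)
conn.execute("PRAGMA synchronous=NORMAL")  #   ...with fewer fsyncs per commit
conn.execute("BEGIN")                      # suggestion (1): group many changes
# ... many UPDATE/INSERT statements ...
conn.execute("COMMIT")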


On Tue, Feb 21, 2017 at 10:38 AM, Richard Hipp <d...@sqlite.org> wrote:

> On 2/21/17, Kevin O'Gorman <kevinogorm...@gmail.com> wrote:
> > I'm not at all sure this is the right place to ask, but as it only comes
> up
> > when I'm running one of
> > my sqlite jobs, I thought I'd start here.  I'm running Python 3.5 scripts
> > in Linux 16.04.1 using the sqlite3 package.  Machine is Core i5, 32GB
> RAM.
> >
> > Some of my stuff takes a while to run, and I like to keep tabs on it.
> > Right now, I'm running one of those, and the Linux top command shows
> > extremely small CPU usage, and a status ("S" column) of "D" which the man
> > page defines as "uninterruptable sleep".  Huh?
>
> My guess:  It is busying doing an fsync() after a transaction commit.
> To find out, temporarily set "PRAGMA synchronous=off" in your script
> and see if that makes the pauses go away.
>
> A better long-term solution would be:
>
> (1) Group multiple changes into a single transaction using BEGIN...COMMIT.
> (2) Set PRAGMA journal_mode=WAL with PRAGMA synchronous=NORMAL.
>
>
> >
> > My code does not use an intentional sleep at all.  It's traversing parts
> of
> > a graph, and not interacting with anything else.  As far as I know,
> anyway,
> > and I wrote the whole thing.
> >
> > Now it's true that it's updating the consequences of changes to graph
> nodes
> > that took about 3 hours just to install, so taking a while is not a
> > surprise by itself.  I just wonder what that status means and how it
> could
> > arise.
> >
> > --
> > word of the year: *kakistocracy*
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
>
>
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Linux top command and sqlite

2017-02-21 Thread Kevin O'Gorman
I'm not at all sure this is the right place to ask, but as it only comes up
when I'm running one of
my sqlite jobs, I thought I'd start here.  I'm running Python 3.5 scripts
in Linux 16.04.1 using the sqlite3 package.  Machine is Core i5, 32GB RAM.

Some of my stuff takes a while to run, and I like to keep tabs on it.
Right now, I'm running one of those, and the Linux top command shows
extremely small CPU usage, and a status ("S" column) of "D" which the man
page defines as "uninterruptable sleep".  Huh?

My code does not use an intentional sleep at all.  It's traversing parts of
a graph, and not interacting with anything else.  As far as I know, anyway,
and I wrote the whole thing.

Now it's true that it's updating the consequences of changes to graph nodes
that took about 3 hours just to install, so taking a while is not a
surprise by itself.  I just wonder what that status means and how it could
arise.

-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] "DISTINCT" makes a query take 37 times as long

2017-02-02 Thread Kevin O'Gorman
> You additionally have more CPUs (1 running
> the query, 1 or more sorting the results) working in parallel.
>
> Try EXPLAIN QUERY PLAN to see what the query planner is doing.
>
> -Ursprüngliche Nachricht-
> Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> Im Auftrag von Kevin O'Gorman
> Gesendet: Donnerstag, 02. Februar 2017 03:28
> An: sqlite-users <sqlite-users@mailinglists.sqlite.org>
> Betreff: [sqlite] "DISTINCT" makes a query take 37 times as long
>
> I have a database of positions and moves in a strategic game, and I'm
> searching for unsolved positions that have been connected to an immediate
> ancestor.  I'm using Python 3.5.2, and the code looks like
>
> #!/usr/bin/env python3
> """Output positions that are reachable but unsolved at census 18 or greater
> See page 76 of Qubic log
>
> Last Modified: Tue Jan 31 12:13:07 PST 2017
> """
>
> import sqlite3  # https://docs.python.org/3.5/library/sqlite3.html
>
> with sqlite3.connect("917.db") as conn:
>     for row in conn.execute("""
>             SELECT DISTINCT ppos
>             FROM move JOIN pos ON mto = pnum
>             WHERE pcensus = 18 and pmin < pmax
>             """):
>         print(row[0])
>
> As written here, this query runs for 1193 minutes (just short of 20
> hours).  If I remove the "DISTINCT" and instead pipe the result into the
> sort program that comes with Linux "sort --unique" the query and sort takes
> only 31 minutes.  The results are the same, and consist of 4.2 million rows.
>
> This seems extreme.
>
> --
> word of the year: *kakistocracy*
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
> ___
>  Gunter Hick
> Software Engineer
> Scientific Games International GmbH
> FN 157284 a, HG Wien
> Klitschgasse 2-4, A-1130 Vienna, Austria
> Tel: +43 1 80100 0
> E-Mail: h...@scigames.at
>
> This communication (including any attachments) is intended for the use of
> the intended recipient(s) only and may contain information that is
> confidential, privileged or legally protected. Any unauthorized use or
> dissemination of this communication is strictly prohibited. If you have
> received this communication in error, please immediately notify the sender
> by return e-mail message and delete all copies of the original
> communication. Thank you for your cooperation.
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] "DISTINCT" makes a query take 37 times as long

2017-02-01 Thread Kevin O'Gorman
On Wed, Feb 1, 2017 at 6:35 PM, Richard Hipp <d...@sqlite.org> wrote:

> On 2/1/17, Kevin O'Gorman <kevinogorm...@gmail.com> wrote:
> > I have a database of positions and moves in a strategic game, and I'm
> > searching for unsolved positions that have been connected to an immediate
> > ancestor.  I'm using Python 3.5.2, and the code looks like
>
> Please provide us with the following additional information:
>
> (1) In python, run the query: "SELECT sqlite_version(),
> sqlite_source_id();"
>
> (2) In a recent sqlite3 command-line shell (the latest release, not
> whatever 5-year-old release happens to be installed on your system)
> bring up your database and run the command:
>
>  .fullschema --indent
>
> And send in the output.
>
> (3) Download the bundle of command-line tools for your OS, then run
> the command "sqlite3_analyzer" on your database, and send in the
> output.
>
> Thanks.
>
>
>
I am unable to comply with items 2 and 3.  I can download the linux x86
versions, which I expected would run on my x86-64 system, but they don't.
Instead, even when I point right at them, they report "No such file or
directory".  I take this to mean that there is some file they do not find,
like a library, and they report the error code in their return status.

However, my "recent" software reports:
 SQLite version 3.11.0 2016-02-15 17:29:24
Enter ".help" for usage hints.
sqlite> .fullschema --indent
Usage: .fullschema
sqlite> .fullschema
CREATE TABLE base64 (
    b64char CHAR NOT NULL PRIMARY KEY,
    b64val  INTEGER);
CREATE TABLE pos (
    pnum INTEGER PRIMARY KEY AUTOINCREMENT,
    ppos CHAR(64) NOT NULL,
    pcensus INTEGER NOT NULL,
    pscore INTEGER,
    pstate CHAR DEFAULT "N" NOT NULL,
    pmin INTEGER DEFAULT -99 NOT NULL,
    pmax INTEGER DEFAULT 99 NOT NULL,
    pmain CHAR(64));
CREATE UNIQUE INDEX pipos ON pos (ppos);
CREATE TABLE move (
    mfrom INTEGER NOT NULL,
    mto   INTEGER NOT NULL,
    mtype CHAR NOT NULL,
    mcell INTEGER NOT NULL,
    mvalue INTEGER,
    ma INTEGER DEFAULT -99,
    mb INTEGER DEFAULT 99,
    PRIMARY KEY (mfrom, mto, mcell));
CREATE UNIQUE INDEX mrev ON move (mto, mfrom, mcell);
CREATE TABLE expanded (
    census INTEGER NOT NULL,
    number INTEGER NOT NULL,
    pos CHAR(64),
    PRIMARY KEY (census, number));
ANALYZE sqlite_master;
ANALYZE sqlite_master;
INSERT INTO sqlite_stat1 VALUES('move','mrev','48329866 2 2 1');
INSERT INTO sqlite_stat1 VALUES('move','sqlite_autoindex_move_1','48329866 38 2 1');
INSERT INTO sqlite_stat1 VALUES('pos','pipos','74409802 1');
INSERT INTO sqlite_stat1 VALUES('base64','sqlite_autoindex_base64_1','64 1');
ANALYZE sqlite_master;
sqlite>

The analyzer is not included in my distribution or its repositories, as
far as I can tell.  This is Xubuntu, which is a flavor of Ubuntu, which is
derived from Debian.

I'm not sure I want to build your entire software suite.  Perhaps you'd
care to download my database, which I freshly tar-ed and gzip-ed to
http://kosmanor.com/917/917.db.tgz
The database is 21 GB; the tar is 3.1 GB.

> >
> > #!/usr/bin/env python3
> > """Output positions that are reachable but unsolved at census 18 or greater
> > See page 76 of Qubic log
> >
> > Last Modified: Tue Jan 31 12:13:07 PST 2017
> > """
> >
> > import sqlite3  # https://docs.python.org/3.5/library/sqlite3.html
> >
> > with sqlite3.connect("917.db") as conn:
> >     for row in conn.execute("""
> >             SELECT DISTINCT ppos
> >             FROM move JOIN pos ON mto = pnum
> >             WHERE pcensus = 18 and pmin < pmax
> >             """):
> >         print(row[0])
> >
> > As written here, this query runs for 1193 minutes (just short of 20
> > hours).  If I remove the "DISTINCT" and instead pipe the result into the
> > sort program that comes with Linux "sort --unique" the query and sort
> takes
> > only 31 minutes.  The results are the same, and consist of 4.2 million
> rows.
> >
> > This seems extreme.
> >
> > --
> > word of the year: *kakistocracy*
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
>
>
> --
> D. Richard Hipp
> d...@sqlite.org
>



-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] "DISTINCT" makes a query take 37 times as long

2017-02-01 Thread Kevin O'Gorman
I have a database of positions and moves in a strategic game, and I'm
searching for unsolved positions that have been connected to an immediate
ancestor.  I'm using Python 3.5.2, and the code looks like

#!/usr/bin/env python3
"""Output positions that are reachable but unsolved at census 18 or greater
See page 76 of Qubic log

Last Modified: Tue Jan 31 12:13:07 PST 2017
"""

import sqlite3  # https://docs.python.org/3.5/library/sqlite3.html

with sqlite3.connect("917.db") as conn:
for row in conn.execute("""
SELECT DISTINCT ppos
FROM move JOIN pos ON mto = pnum
WHERE pcensus = 18 and pmin < pmax
"""):
print(row[0])

As written here, this query runs for 1193 minutes (just short of 20
hours).  If I remove the "DISTINCT" and instead pipe the result into the
sort program that comes with Linux "sort --unique" the query and sort takes
only 31 minutes.  The results are the same, and consist of 4.2 million rows.

This seems extreme.
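
If it helps, this is how I plan to capture the plan SQLite picks for the
query -- a sketch I have not run yet:

import sqlite3

with sqlite3.connect("917.db") as conn:
    for row in conn.execute("""
            EXPLAIN QUERY PLAN
            SELECT DISTINCT ppos
            FROM move JOIN pos ON mto = pnum
            WHERE pcensus = 18 and pmin < pmax
            """):
        print(row)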

-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] possible integrity problem

2017-01-15 Thread Kevin O'Gorman
On Sat, Jan 14, 2017 at 5:04 PM, Simon Slavin <slav...@bigfraud.org> wrote:

>
> On 15 Jan 2017, at 1:01am, Kevin O'Gorman <kevinogorm...@gmail.com> wrote:
>
> > Update: the integrity check said "ok" after about 1/2 hour.
> > the record count now takes about 4 seconds -- maybe I remembered wrong
> and
> > it always took this long, but I wasn't stopping it until it had hung for
> > several minutes.
>
> What you describe actually sounds more like a hardware problem.  You had a
> 'sticky' disk, affecting at least some of the sectors in which that
> database is stored, which has now sorted itself out.  But sometime in the
> future it may become sticky again.  If you have some sort of disk checking
> software you might like to try it.
>
> Given your 5 indexes, 30 minutes to check an 11GB file is completely
> reasonable.  Don’t worry about that.
>
> Good luck with it.
>

It turns out you're right about the time for a check.  However, I don't buy
the "sticky disk" idea.  I could copy the files just fine.  I could create
a new file of the same size just fine.  But sqlite3 was stuck.  How does
that happen?  I don't know, and my imagination is stuck.


-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] possible integrity problem

2017-01-14 Thread Kevin O'Gorman
Update: the integrity check said "ok" after about 1/2 hour.
the record count now takes about 4 seconds -- maybe I remembered wrong and
it always took this long, but I wasn't stopping it until it had hung for
several minutes.
Color me baffled.

On Sat, Jan 14, 2017 at 4:49 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
wrote:

> I've got a database that has acted strangely from time to time.  Or
> actually a series of them, since I erase and build from scratch sometimes,
> as I'm just starting this project.
>
> Anyway, the latest is that the DB is about 11 GB.  It's pretty simple,
> just 2 main tables and maybe 5 indexes, no foreign keys, triggers, or much
> of anything else.  Suddenly just about anything I do seems to hang.
>
> In particular SELECT COUNT(*) FROM pos, which used to take under a
> second.  And I haven't make any changes to the DB since then.  This is true
> even if I access a write-protected copy I made some time ago.
>
> That includes PRAGMA integrity_check, which I started about 20 minutes
> ago.  It's the first time I've tried it so I don't know how long it should
> take, but copying the whole database takes under 3 minutes.
>
> I can interrupt the process with control-C, but cannot make progress.
> About the only thing that seems to be working is in sqlite3 I can ask for
> the schema.
>
> All of this is sqlite on Xubuntu Linux in python 3.5.2, and with sqlite3
> command-line.  If I reboot and try again, things still hang.  There are no
> journals hanging around.
>
> My first question: how long should I expect PRAGMA integrity_check to take?
>
> --
> word of the year: *kakistocracy*
>



-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] possible integrity problem

2017-01-14 Thread Kevin O'Gorman
I've got a database that has acted strangely from time to time.  Or
actually a series of them, since I erase and build from scratch sometimes,
as I'm just starting this project.

Anyway, the latest is that the DB is about 11 GB.  It's pretty simple, just
2 main tables and maybe 5 indexes, no foreign keys, triggers, or much of
anything else.  Suddenly just about anything I do seems to hang.

In particular SELECT COUNT(*) FROM pos, which used to take under a second.
And I haven't made any changes to the DB since then.  This is true even if
I access a write-protected copy I made some time ago.

That includes PRAGMA integrity_check, which I started about 20 minutes
ago.  It's the first time I've tried it so I don't know how long it should
take, but copying the whole database takes under 3 minutes.

I can interrupt the process with control-C, but cannot make progress.
About the only thing that seems to be working is in sqlite3 I can ask for
the schema.

All of this is sqlite on Xubuntu Linux in python 3.5.2, and with sqlite3
command-line.  If I reboot and try again, things still hang.  There are no
journals hanging around.

My first question: how long should I expect PRAGMA integrity_check to take?

-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INSERT is corrupting a database

2017-01-13 Thread Kevin O'Gorman
On Fri, Jan 13, 2017 at 3:34 AM, Clemens Ladisch <clem...@ladisch.de> wrote:

> Kevin O'Gorman wrote:
> > On Tue, Jan 10, 2017 at 11:29 PM, Clemens Ladisch <clem...@ladisch.de>
> wrote:
> >> Kevin O'Gorman wrote:
> >>> If I go on to the second table, it appears to finish normally, but
> when I
> >>> try to look at the database with sqlite3, a command-line tool for
> >>> interacting with SQLite, it says the database is corrupt.
> >>
> >> What version?
> >
> > It's whatever is in Python 3.5.2.'s builtin sqlite package.
>
> The sqlite3 command-line shell does not ship with Python.
>
> >> It's possible that there is a bug in your code.  Which you have not
> shown.
> >
> > My opinion is that no user bug whatever should cause DB integrity
> problems without
> > raising an exception.
>
> <http://www.sqlite.org/howtocorrupt.html>
> But it's unlikely that you'd manage to do any of this in Python.
>
> Anyway, my own test program works.
>

That's twisted backwards.

My database builder is built with pure Python, using the SQLite package
that comes with it.  Then sqlite3 just refuses to open the result.  That's
just what's in the Xubuntu 16.04 repos, i.e. version 3.11.0-1ubuntu1; I
didn't report that because I don't suspect sqlite3 of being a cause.
Indeed, it was changes to the Python code that seem to have stopped
provoking the error -- nothing about sqlite3 has changed.  Anyway, I did
not save the problem code, so I can no longer pursue this.

-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INSERT is corrupting a database

2017-01-12 Thread Kevin O'Gorman
On Tue, Jan 10, 2017 at 7:52 PM, Simon Slavin <slav...@bigfraud.org> wrote:

>
> On 11 Jan 2017, at 3:28am, Kevin O'Gorman <kevinogorm...@gmail.com> wrote:
>
> > I have a modest amount of data that I'm loading into an SQLite database
> for
> > the first time.  For the moment it contains just two tables and a few
> > indices, nothing else.  The first table loads okay, and if I stop the
> > process at that point, all is well and I can look at the database.
> >
> > If I go on to the second table, it appears to finish normally, but when I
> > try to look at the database with sqlite3, a command-line tool for
> > interacting with SQLite, it says the database is corrupt.
>
> Make absolutely sure you’re starting with a new database file each time,
> not continuing to write to an already-corrupt file.
>
I'm sure.  The program tests for the existence of the main table before
starting, and throws an exception if it's there, then creates that table as
its first action.
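
Roughly what that existence test looks like -- a sketch of the idea, not the
exact code, with a placeholder database name:

import sqlite3

conn = sqlite3.connect("new.db")     # placeholder name
already = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type='table' AND name='pos'"
).fetchone()[0]
if already:
    raise RuntimeError("table 'pos' already exists -- refusing to reload")
# ...otherwise the first real action is the CREATE TABLE for 'pos'...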


> At stages during your Python program, including after you’ve finished
> loading the first table, use the following command to check to see whether
> the database is correct:
>
It's no longer possible.  In fixing other things, the program has changed,
and it no longer corrupts the database.  Thanks for this next thing,
though.



> PRAGMA integrity_check
>

Thanks for that.  I was not aware of this tool.  I'll keep it handy.
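
For future reference, running it from Python is just one statement -- a
sketch, with a placeholder database name:

import sqlite3

conn = sqlite3.connect("the.db")    # placeholder name
print(conn.execute("PRAGMA integrity_check").fetchall())   # [('ok',)] when healthy
# PRAGMA quick_check does most of the same work but skips some of the
# slower index checks, so it is usually faster on big files.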


> Use the same command in the command-line tool.
>
> Simon.
>

-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INSERT is corrupting a database

2017-01-12 Thread Kevin O'Gorman
On Tue, Jan 10, 2017 at 11:29 PM, Clemens Ladisch <clem...@ladisch.de>
wrote:

> Kevin O'Gorman wrote:
> > If I go on to the second table, it appears to finish normally, but when I
> > try to look at the database with sqlite3, a command-line tool for
> > interacting with SQLite, it says the database is corrupt.
>
> What version?
>

It's whatever is in Python 3.5.2's builtin sqlite3 package.


> > If however, I split the program into two programs, one for each table,
> and
> > run them one after another, all is well.  Same code, each with parts of
> it
> > if-else-ed out.
>
> It's possible that there is a bug in your code.  Which you have not shown.
>
>
My opinion is that no user bug whatever should cause DB integrity problems
without
raising an exception.




-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] INSERT is corrupting a database

2017-01-10 Thread Kevin O'Gorman
This is a problem I don't quite know how to report in a way that will be
useful.

I'm using Python 3.5 and its builtin sqlite package.

I have a modest amount of data that I'm loading into an SQLite database for
the first time.  For the moment it contains just two tables and a few
indices, nothing else.  The first table loads okay, and if I stop the
process at that point, all is well and I can look at the database.

If I go on to the second table, it appears to finish normally, but when I
try to look at the database with sqlite3, a command-line tool for
interacting with SQLite, it says the database is corrupt.

If however, I split the program into two programs, one for each table, and
run them one after another, all is well.  Same code, each with parts of it
if-else-ed out.

I don't know what to blame, or what to try.

-- 
word of the year: *kakistocracy*
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] I keep getting seg faults building my database using python sqlite3

2016-12-09 Thread Kevin O'Gorman
It wasn't memory.  Or at least nothing obvious.  I ran Memtest86+ 5.01 in
SMP mode for 3 full passes (about 10 hours) with no errors.

On reflection, it didn't seem likely to be a kernel freeze anyway.  It
wasn't that the light on the keyboard was unresponsive, or not just that,
but that all the lights were off including the mouse laser.  The sort of
general "on" light in the tower was on, and the power supply fan was going,
but that's all.  I now wish I had looked at the motherboard readouts.
Anyway, that suggested a power distribution or power supply problem to me.
I just don't know how to bifurcate between the power supply and something
on the mobo USB interface.  And it hasn't happened again.

On Wed, Dec 7, 2016 at 5:18 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
wrote:

> Good feedback.  I haven't done a memory check on that machine in a
> while
>
> Next on my list.
>
> On Sun, Dec 4, 2016 at 11:25 AM, Keith Medcalf <kmedc...@dessus.com>
> wrote:
>
>>
>> If pressing the CAPS LOCK or NUM LOCK keys on the keyboard does not
>> toggle the light on the keyboard then you have lost the all interrupt
>> processing since those keypresses have to be processed by the kernel mode
>> keyboard driver toggling the internal state of the keyboard driver, and
>> then the kernel driver sends output to the keyboard to change the status
>> LEDs.  Typically (all Operating Systems) this means you have suffered a
>> complete kernel crash (or halt) and the system is not running.
>>
>> Since the system must be running in order to output indicator status, all
>> indicators will stay "stuck" in their last known position (hold output).
>>
>> Only a power-cycle (or hardware reset -- assuming the RESET is a hardware
>> reset and not just a request to reset which will be ignored) triggering a
>> reboot will restart the system.
>>
>> The most frequent cause is a Parity Check.  Or in these latter days of
>> not having ECC or even Parity checked memory, just an undetected memory
>> fault which will cause random AHTBL.
>>
>> > -Original Message-
>> > From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org
>> ]
>> > On Behalf Of Kevin O'Gorman
>> > Sent: Sunday, 4 December, 2016 09:21
>> > To: SQLite mailing list
>> > Subject: Re: [sqlite] I keep getting seg faults building my database
>> using
>> > python sqlite3
>> >
>> > Well, the i7 system failed again, but this time it was quite different.
>> > And peculiar.
>> > The system wasn't doing anything, but it should have been.  So I tried
>> > something. It didn't matter what, because I could not get the mouse or
>> > keyboard to work -- it was like they weren't plugged in.  Really like
>> it,
>> > because the caps lock light wasn't on, nor was the laser light visible
>> in
>> > the mouse.  Even when I changed mouse, keyboard and USB slot.  I
>> couldn't
>> > get in with SSH from elsewhere either.  But the computer's "I'm running"
>> > light was on.
>> > So I'm suspecting a partial power failure.  I don't know enough about
>> > mobos
>> > and USB to diagnose whether the problem was on the mobo or the power
>> > supply.
>> >
>> > Creepty.  I had to do a hard reset  to get thing going again, and it's
>> > been
>> > running fine for a day now.
>> >
>> > On Mon, Nov 21, 2016 at 9:51 AM, Kevin O'Gorman <
>> kevinogorm...@gmail.com>
>> > wrote:
>> >
>> > > On Mon, Nov 21, 2016 at 9:41 AM, Roger Binns <rog...@rogerbinns.com>
>> > > wrote:
>> > >
>> > >> On 19/11/16 08:08, Kevin O'Gorman wrote:
>> > >> > System with problems: Running Xubuntu Linux 16.04.1, Python 3.5.2.
>> > >> [...]
>> > >> > System without this problem: Running Ubuntu Linux 14.04.5, Python
>> > 3.4.3.
>> > >>
>> > >> You are good on Python versions then.  My remaining recommendation is
>> > to
>> > >> make the process that does SQLite be a child process (ie no making
>> its
>> > >> own children).  That will eliminate an entire class of potential
>> > >> problems, although it appears unlikely you are experiencing any of
>> > them.
>> > >>
>> > >> The final option is to run the process under valgrind.  That will
>> > >> definitively show the cause.  Do note however that you may want to
>> > >> change some of the default options since you have nice 

Re: [sqlite] I keep getting seg faults building my database using python sqlite3

2016-12-07 Thread Kevin O'Gorman
Good feedback.  I haven't done a memory check on that machine in a
while

Next on my list.

On Sun, Dec 4, 2016 at 11:25 AM, Keith Medcalf <kmedc...@dessus.com> wrote:

>
> If pressing the CAPS LOCK or NUM LOCK keys on the keyboard does not toggle
> the light on the keyboard then you have lost the all interrupt processing
> since those keypresses have to be processed by the kernel mode keyboard
> driver toggling the internal state of the keyboard driver, and then the
> kernel driver sends output to the keyboard to change the status LEDs.
> Typically (all Operating Systems) this means you have suffered a complete
> kernel crash (or halt) and the system is not running.
>
> Since the system must be running in order to output indicator status, all
> indicators will stay "stuck" in their last known position (hold output).
>
> Only a power-cycle (or hardware reset -- assuming the RESET is a hardware
> reset and not just a request to reset which will be ignored) triggering a
> reboot will restart the system.
>
> The most frequent cause is a Parity Check.  Or in these latter days of not
> having ECC or even Parity checked memory, just an undetected memory fault
> which will cause random AHTBL.
>
> > -Original Message-
> > From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> > On Behalf Of Kevin O'Gorman
> > Sent: Sunday, 4 December, 2016 09:21
> > To: SQLite mailing list
> > Subject: Re: [sqlite] I keep getting seg faults building my database
> using
> > python sqlite3
> >
> > Well, the i7 system failed again, but this time it was quite different.
> > And peculiar.
> > The system wasn't doing anything, but it should have been.  So I tried
> > something. It didn't matter what, because I could not get the mouse or
> > keyboard to work -- it was like they weren't plugged in.  Really like it,
> > because the caps lock light wasn't on, nor was the laser light visible in
> > the mouse.  Even when I changed mouse, keyboard and USB slot.  I couldn't
> > get in with SSH from elsewhere either.  But the computer's "I'm running"
> > light was on.
> > So I'm suspecting a partial power failure.  I don't know enough about
> > mobos
> > and USB to diagnose whether the problem was on the mobo or the power
> > supply.
> >
> > Creepty.  I had to do a hard reset  to get thing going again, and it's
> > been
> > running fine for a day now.
> >
> > On Mon, Nov 21, 2016 at 9:51 AM, Kevin O'Gorman <kevinogorm...@gmail.com
> >
> > wrote:
> >
> > > On Mon, Nov 21, 2016 at 9:41 AM, Roger Binns <rog...@rogerbinns.com>
> > > wrote:
> > >
> > >> On 19/11/16 08:08, Kevin O'Gorman wrote:
> > >> > System with problems: Running Xubuntu Linux 16.04.1, Python 3.5.2.
> > >> [...]
> > >> > System without this problem: Running Ubuntu Linux 14.04.5, Python
> > 3.4.3.
> > >>
> > >> You are good on Python versions then.  My remaining recommendation is
> > to
> > >> make the process that does SQLite be a child process (ie no making its
> > >> own children).  That will eliminate an entire class of potential
> > >> problems, although it appears unlikely you are experiencing any of
> > them.
> > >>
> > >> The final option is to run the process under valgrind.  That will
> > >> definitively show the cause.  Do note however that you may want to
> > >> change some of the default options since you have nice big systems.
> > For
> > >> example I like to set --freelist-vol and related to very big numbers
> > >> (several gigabytes) which ensures that freed memory is not reused for
> a
> > >> long time.  You could also set the valgrind option so that only one
> > >> thread is allowed - it will catch inadvertent threading you may note
> be
> > >> aware of.
> > >>
> > >> Roger
> > >>
> > >
> > > Thanks for that.  I may do the valgrind thing -- it sounds useful.  But
> > > just to add
> > > to my annoyance about this whole things, I've been having both systems
> > > running
> > > for a couple of days now with no problems or interruptions.  Remember,
> > the
> > > i7 system was failing after 2 hours at most.  I did tweak the code a
> > > little, but
> > > the only thing that seems likely to have stopped the problem is that I
> > put
> > > in
> > > code to do a commit after every 10,000 INSERT statements.  The two
> > systems
> > &

Re: [sqlite] I keep getting seg faults building my database using python sqlite3

2016-12-04 Thread Kevin O'Gorman
Well, the i7 system failed again, but this time it was quite different.
And peculiar.
The system wasn't doing anything, but it should have been.  So I tried
something. It didn't matter what, because I could not get the mouse or
keyboard to work -- it was like they weren't plugged in.  Really like it,
because the caps lock light wasn't on, nor was the laser light visible in
the mouse.  Even when I changed mouse, keyboard and USB slot.  I couldn't
get in with SSH from elsewhere either.  But the computer's "I'm running"
light was on.
So I'm suspecting a partial power failure.  I don't know enough about mobos
and USB to diagnose whether the problem was on the mobo or the power supply.

Creepy.  I had to do a hard reset to get things going again, and it's been
running fine for a day now.

On Mon, Nov 21, 2016 at 9:51 AM, Kevin O'Gorman <kevinogorm...@gmail.com>
wrote:

> On Mon, Nov 21, 2016 at 9:41 AM, Roger Binns <rog...@rogerbinns.com>
> wrote:
>
>> On 19/11/16 08:08, Kevin O'Gorman wrote:
>> > System with problems: Running Xubuntu Linux 16.04.1, Python 3.5.2.
>> [...]
>> > System without this problem: Running Ubuntu Linux 14.04.5, Python 3.4.3.
>>
>> You are good on Python versions then.  My remaining recommendation is to
>> make the process that does SQLite be a child process (ie no making its
>> own children).  That will eliminate an entire class of potential
>> problems, although it appears unlikely you are experiencing any of them.
>>
>> The final option is to run the process under valgrind.  That will
>> definitively show the cause.  Do note however that you may want to
>> change some of the default options since you have nice big systems.  For
>> example I like to set --freelist-vol and related to very big numbers
>> (several gigabytes) which ensures that freed memory is not reused for a
>> long time.  You could also set the valgrind option so that only one
>> thread is allowed - it will catch inadvertent threading you may note be
>> aware of.
>>
>> Roger
>>
>
> Thanks for that.  I may do the valgrind thing -- it sounds useful.  But
> just to add
> to my annoyance about this whole things, I've been having both systems
> running
> for a couple of days now with no problems or interruptions.  Remember, the
> i7 system was failing after 2 hours at most.  I did tweak the code a
> little, but
> the only thing that seems likely to have stopped the problem is that I put
> in
> code to do a commit after every 10,000 INSERT statements.  The two systems
> are running identical Python code on the same inputs.  I had intended this
> to
> verify that one fails and the other does not.  What I got is something
> different,
> but on balance I like it best when my processes do not fail out.  Maybe
> this
> time the code will finish (at this rate it will be at least a week, maybe
> three.
>
> --
> #define QUESTION ((bb) || (!bb)) /* Shakespeare */
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] I keep getting seg faults building my database using python sqlite3

2016-11-21 Thread Kevin O'Gorman
On Mon, Nov 21, 2016 at 9:41 AM, Roger Binns <rog...@rogerbinns.com> wrote:

> On 19/11/16 08:08, Kevin O'Gorman wrote:
> > System with problems: Running Xubuntu Linux 16.04.1, Python 3.5.2.
> [...]
> > System without this problem: Running Ubuntu Linux 14.04.5, Python 3.4.3.
>
> You are good on Python versions then.  My remaining recommendation is to
> make the process that does SQLite be a child process (ie no making its
> own children).  That will eliminate an entire class of potential
> problems, although it appears unlikely you are experiencing any of them.
>
> The final option is to run the process under valgrind.  That will
> definitively show the cause.  Do note however that you may want to
> change some of the default options since you have nice big systems.  For
> example I like to set --freelist-vol and related to very big numbers
> (several gigabytes) which ensures that freed memory is not reused for a
> long time.  You could also set the valgrind option so that only one
> thread is allowed - it will catch inadvertent threading you may note be
> aware of.
>
> Roger
>

Thanks for that.  I may do the valgrind thing -- it sounds useful.  But
just to add to my annoyance about this whole thing, I've been having both
systems running for a couple of days now with no problems or interruptions.
Remember, the i7 system was failing after 2 hours at most.  I did tweak the
code a little, but the only thing that seems likely to have stopped the
problem is that I put in code to do a commit after every 10,000 INSERT
statements.  The two systems are running identical Python code on the same
inputs.  I had intended this to verify that one fails and the other does
not.  What I got is something different, but on balance I like it best when
my processes do not fail out.  Maybe this time the code will finish (at
this rate it will be at least a week, maybe three).
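
The batching itself is nothing fancy -- roughly the following (a sketch;
the table and column names are invented, not my real schema):

    import sqlite3

    def load(conn, rows, batch=10000):
        # Commit every `batch` INSERTs so no single transaction (and its
        # journal) grows without bound.
        cur = conn.cursor()
        for n, (board, score) in enumerate(rows, start=1):
            cur.execute("INSERT INTO pos (board, score) VALUES (?, ?)",
                        (board, score))
            if n % batch == 0:
                conn.commit()
        conn.commit()  # flush the final partial batch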

-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] I keep getting seg faults building my database using python sqlite3

2016-11-19 Thread Kevin O'Gorman
On Fri, Nov 18, 2016 at 10:18 AM, James K. Lowden <jklow...@schemamania.org>
wrote:

> On Fri, 18 Nov 2016 08:55:11 -0800
> "Kevin O'Gorman" <kevinogorm...@gmail.com> wrote:
>
> > All of the python code is a single thread.  The closest I come
> > is a few times where I use subprocess.Popen to create what amounts to
> > a pipeline, and one place where I start a number of copies of a C
> > program in parallel, but each is a separate process with its own
> > input and output files.  These C programs have been in use for a
> > number of months for earlier stages of this project, and I regard
> > them as quite reliable.  None of them uses threads, and they are
> > mostly very simple filters.
>
> As you know, a process started with Popen cannot corrupt the Python
> process's memory.  If you're not doing anything to defeat the GIL, a
> segfault inside the Python interpreter would be considered a bug.
>
> But is it happening in the interpreter, or in SQLite for that matter?
> ISTM that's what you need to know.  To know that, you're going to need
> to run a debug version of the interpreter under gdb.  When it faults, a
> backtrace will tell you where.  That's not definititive proof; memory
> corruption is often detected far from where it was caused.  But if the
> fault is at a consistent place in SQLite code, for example, you can
> use a hardware watchpoint to discover what's writing to it.
>
I'm game to try that, but unsure how to get such a thing.  If I have to
build it, it's gonna take a while to assemble all the pieces, because it's
gotta have all the parts I use, and be arranged so as not to interfere with
normal use of Python 3.  Seems pretty error-prone itself, but as I said,
I'm game.


> I don't know what more to suggest.  I would be surprised if you find a
> fault in Python, in the Python standard library, or in SQLite.  I'm
> sure it won't be in anything on the other side of a popen call.  Are
> there non-standard libraries or Python modules in use that you haven't
> mentioned?
>
> The most likely culprit in my mind is RAM.  You're exercising new memory
> pretty hard, running a bunch of processes at it at full tilt.  Any
> defect in the chips or DMA could explain what you're seeing.  An easy
> test, not necessarily cheap, would be to replace the RAM (or, if
> possible, run with some removed).
>
> I have two war stories related to rotten I/O hardware, where the device
> appeared to work for all intents and purposes, but was actually a
> high-speed bit munger. Those were both over 20 years ago.  It will
> be interesting to hear if that turns out to be your issue.
>
> HTH.
>
> --jkl
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] I keep getting seg faults building my database using python sqlite3

2016-11-19 Thread Kevin O'Gorman
Ran Memtest86+ 5.01, two complete passes, with no errors.



On Sat, Nov 19, 2016 at 8:19 AM, Kevin O'Gorman <kevinogorm...@gmail.com>
wrote:

>
>
> On Sat, Nov 19, 2016 at 8:11 AM, Simon Slavin <slav...@bigfraud.org>
> wrote:
>
>>
>> On 19 Nov 2016, at 4:08pm, Kevin O'Gorman <kevinogorm...@gmail.com>
>> wrote:
>>
>> > I have two different machines running this stuff.  Only one is having
>> the
>> > seg faults, but they are different enough that this does not convince me
>> > to blame hardware.
>>
>> Could you, anyway, run a memory test on the computer which is failing ?
>> I don't how one does that under Xubuntu but perhaps you do, or perhaps
>> there's one built into the BIOS.
>>
>> Simon.
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
> Sure.  Memtest86+ is an option on boot.  I just have to find a time slot
> because it takes a while.  Later today.
>
> --
> #define QUESTION ((bb) || (!bb)) /* Shakespeare */
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] I keep getting seg faults building my database using python sqlite3

2016-11-19 Thread Kevin O'Gorman
On Sat, Nov 19, 2016 at 8:11 AM, Simon Slavin <slav...@bigfraud.org> wrote:

>
> On 19 Nov 2016, at 4:08pm, Kevin O'Gorman <kevinogorm...@gmail.com> wrote:
>
> > I have two different machines running this stuff.  Only one is having the
> > seg faults, but they are different enough that this does not convince me
> > to blame hardware.
>
> Could you, anyway, run a memory test on the computer which is failing ?  I
> don't how one does that under Xubuntu but perhaps you do, or perhaps
> there's one built into the BIOS.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>

Sure.  Memtest86+ is an option on boot.  I just have to find a time slot
because it takes a while.  Later today.

-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] I keep getting seg faults building my database using python sqlite3

2016-11-19 Thread Kevin O'Gorman
On Fri, Nov 18, 2016 at 3:19 PM, James K. Lowden 
wrote:

> On Fri, 18 Nov 2016 10:56:37 -0800
> Roger Binns  wrote:
>
> > Popen calls fork (it seems like you are doing Unix/Mac, not Windows).
> > fork() duplicates the process including all open file descriptors.
> > One or more of those descriptors belong to open SQLite databases and
> > ancillary files.
>
> Good catch, Roger.  It's a liability, but I slightly disagree with your
> characterization.
>
> > - Running any Python code (destructors can be called which then run in
> > the parent and child)
>
> Yes, if those destructors affect shared resources.  The OP did say the
> processes on the other side of popen were C programs.
>
> > - Not having file descriptors closed so the child process trashes them
> > (close_fds Popen argument is False in python 2.x but True in python
> > 3.x).
>
> The child process can't "trash" the parent's descriptors.  When the
> child exits, the OS will close its descriptors, that's all.  But, yes,
> if the child process is making some assumption about open descriptors
> it receives at startup, that could be lead to problems.  Especially if
> it scribbles on the SQLite database.
>
> > Also python 2.x subprocess module is broken in many ways.
>
> My foray into Unicode in Python convinced me once and for all that
> Python 3 is the only way to go.  But would you care to elaborate on the
> problems with 2.x subprocess?
>
>
You can all put aside worries about Python 2.  I only started using Python
a couple of months ago, and went with 3.

I have two different machines running this stuff.  Only one is having the
seg faults, but they are different enough that this does not convince me
to blame hardware.

System with problems: Running Xubuntu Linux 16.04.1, Python 3.5.2.
CPU: core i7, MOBO: Asus Z97 Deluxe, 32GB ram.  4 cores, hyperthreaded
to 8.  Home-built.

System without this problem: Running Ubuntu Linux 14.04.5, Python 3.4.3.
This is a System76 Silverback.  CPU: dual Intel Xeon, 256 GB RAM(!)
16 cores, hyperthreaded to 32.

On both: Filesystem is an 11-TB software RAID (Linux LVM) on top of
three 4-TB hard drives.

--
Python modules are all pretty standard (I think):
stat argparse datetime fileinput glob os re sqlite3 subprocess sys time

children on the other side of Popen are either standard Linux utilities like
split/grep/sort or my own C code, which is mostly very simple with one
exception,
and that is a pretty well-tested implementation of minimax search through
a game tree.  But even in that one, the I/O is very simple: one input line
yields one output line on one of two output streams.

I never set close_fds=False, so the Python3 default protects the database.
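
For the record, the pipelines are along these lines (a sketch with made-up
file names); on Python 3, close_fds defaults to True, so the children never
inherit the database's file descriptor:

    import subprocess

    # Feed an input file through a standard utility and capture the output.
    # close_fds=True (the Python 3 default) keeps SQLite's descriptor private.
    with open("candidates.txt") as src, open("sorted.txt", "w") as dst:
        child = subprocess.Popen(["sort", "-u"], stdin=src, stdout=dst)
        child.wait()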

A question, though: I don't bother to commit in the midst of anything, so
you'll notice my journal gets pretty big.  That's not a problem for my
filesystem, but could it be a problem internally to Python?
-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] I keep getting seg faults building my database using python sqlite3

2016-11-18 Thread Kevin O'Gorman
On Fri, Nov 18, 2016 at 8:38 AM, Roger Binns <rog...@rogerbinns.com> wrote:

> On 17/11/16 19:14, Kevin O'Gorman wrote:
> > SO: I need help bifurcating this problem.  For instance, how can I tell
> if
> > the fault lies in SQLite, or in python? Or even in the hardware, given
> that
> > the time to failure is so variable?
>
> Are you using threads, threading related settings etc in any way?  The
> python sqlite3 module (aka pysqlite) is not threadsafe.  That doesn't
> stop people "working around" it, which can lead to crashes.
>
> Roger
>
>
I am not.  All of the python code is a single thread.  The closest I come
is a few times where I use subprocess.Popen to create what amounts to a
pipeline, and one place where I start a number of copies of a C program in
parallel, but each is a separate process with its own input and output
files.  These C programs have been in use for a number of months for
earlier stages of this project, and I regard them as quite reliable.  None
of them uses threads, and they are mostly very simple filters.

The one that runs in parallel cannot be the culprit, however, because the
code has not reached the point where it would come into play.  That is the
step where the results get "cached" (in flat files) and all of the early
results are in those files.  This thing is dying before reaching unknown
territory where new results are needed.  The reason the results exist is
that they were generated by previous versions of the software that did not
use Python.  I am switching because (a) I want a database instead of flat
files for speed reasons and (b) the C code was getting too hard to maintain.

All of this is a hobby project, and I can share any parts of it that you
care to see.
-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] I keep getting seg faults building my database using python sqlite3

2016-11-18 Thread Kevin O'Gorman
On Fri, Nov 18, 2016 at 3:11 AM, Simon Slavin  wrote:

> Forgot to say ...
>
> Most of these problems result from attempting to reuse memory you've
> already released.  Even if the error is happening inside a SQLite routine,
> it will be because you passed it a pointer to an SQLite connection which
> had already been _close()'d or a statement which had already been
> _finalize()'d.
>
>
Well, the entire process runs under a single connection that is never
closed. Although cursors are opened and closed within some of the Python
functions, the close is always at the end of the function.  And I have no
idea how things get finalized under python, or if that's an issue.  Anyway,
most of the time I use patterns like
  for row in c1.execute(...)
or
  with dbfile.connect() as conn:
and as I understand it, these protect me from most errors of that kind.
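
Spelled out, the two patterns are roughly these (a sketch; the query and
file name are invented, and the `with conn:` form commits on success and
rolls back if an exception escapes the block):

    import sqlite3

    conn = sqlite3.connect("games.db")
    c1 = conn.cursor()

    # Pattern 1: iterate directly over an executed SELECT.
    for row in c1.execute("SELECT board, score FROM pos WHERE score IS NULL"):
        pass  # ... examine row ...

    # Pattern 2: use the connection as a transaction context manager for writes.
    with conn:
        conn.execute("UPDATE pos SET score = 0 WHERE score IS NULL")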

Moreover, it seems to me that these would all lead to failure at exactly
the same point, rather than at three points up to an hour different from
each other.

I was thinking it more likely to be one of:
a) random hardware malfunction
b) some bug in memory handling that was subject to influence by other
activity on the machine
c) some bug in handling pointers to the Python structures (it would have to
be a Python or sqlite3 bug).




-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] I keep getting seg faults building my database using python sqlite3

2016-11-17 Thread Kevin O'Gorman
I ran this thing 3 times with identical inputs -- it is deterministic --
but it failed after 66, 128 and 96 minutes respectively.  Each run started
with no database at all, and gets a single input from which the rest is
calculated.  The calculations are cached (in flat files), so it never got
to the point where it needed to calculate more.  I get core files, but
don't know how to interpret them.

The machine is a new i7 MOBO with 32 GB RAM, 32 GB swap (mostly unused),
running Xubuntu Linux. The last run left me with a 16 GB database and a
600MB journal.

SO: I need help bifurcating this problem.  For instance, how can I tell if
the fault lies in SQLite, or in python? Or even in the hardware, given that
the time to failure is so variable?

Can anyone help me make sense of the core file?

-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there a best practice for breaking up a large update?

2016-10-16 Thread Kevin O'Gorman
There are other writers, but they create new stuff, while this big update
wants to consolidate info about existing stuff.  There are also pure
readers.

Some of the docs on WAL logging had caveats that put me off.  Not knowing
exactly what I was going to be doing with the database, I could not be sure
if the caveats applied to me or not.  Particularly because I had some
trouble understanding the caveats, since some of the terminology is a bit
new to me.  Not all of it -- I actually taught a database course at the
undergrad level once, SQL-based, but it was quite a while ago.  But most of
my database experience was either following SQL recipes or, a long time
ago, writing a full database server from scratch (not relational --
hierarchical) in assembler.

On Sat, Oct 15, 2016 at 6:14 PM, Darren Duncan <dar...@darrenduncan.net>
wrote:

> You didn't say if the other tasks need write access to the database or if
> it is just read-only.  If the others only need read-only, let them access a
> copy of the database while you make your changes in another copy, then just
> swap the databases when done. -- Darren Duncan
>
>
> On 2016-10-15 1:21 PM, Kevin O'Gorman wrote:
>
>> I'm new to this, and working in Python's sqlite3.  So be patient, and
>> don't
>> expect me to know too much.  This is also a personal hobby, so there's
>> nobody else for me to ask.
>>
>> I've got a database of a some tens of millions of positions in a board
>> game.  It may be over a billion before I'm done (working in an 11-TB
>> partition at the moment.) I want to process a major slice of these
>> records,
>> in order to update other records.  I might want to break the work up
>> into chunks to allow other access to the database while this is going on.
>>
>> So I have some questions:
>> (1) If I do all of my updates to a temporary table, does the database
>> still
>> get locked?
>>
>> (2) Is there another way to keep it available?  It happens for this
>> activity that consistency is not at risk.
>>
>> (3) If it turns out that I can avoid locking, it there still a performance
>> reason to break the transaction into chunks, or would I be as well off
>> doing it as a single transaction (assuming I have disk space for the
>> journal).
>>
>> (4) If I break it up into chunks, I can think of several ways to do that
>> and keep track of what's been done and what has not.  Is there a best
>> practice for this?
>>
>> (5) Is there a forum specifically for folks doing Python database
>> programming?  It occurs to me that much of what I'm asking about is not
>> specific to SQLite.  But my code, for now at least, is going to be in
>> Python because it's the easiest way I can see, and my own efforts are the
>> real bottleneck in making progress.
>>
>>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Is there a best practice for breaking up a large update?

2016-10-15 Thread Kevin O'Gorman
I'm new to this, and working in Python's sqlite3.  So be patient, and don't
expect me to know too much.  This is also a personal hobby, so there's
nobody else for me to ask.

I've got a database of some tens of millions of positions in a board
game.  It may be over a billion before I'm done (working in an 11-TB
partition at the moment.) I want to process a major slice of these records,
in order to update other records.  I might want to break the work up
into chunks to allow other access to the database while this is going on.

So I have some questions:
(1) If I do all of my updates to a temporary table, does the database still
get locked?

(2) Is there another way to keep it available?  It happens for this
activity that consistency is not at risk.

(3) If it turns out that I can avoid locking, is there still a performance
reason to break the transaction into chunks, or would I be as well off
doing it as a single transaction (assuming I have disk space for the
journal).

(4) If I break it up into chunks, I can think of several ways to do that
and keep track of what's been done and what has not.  Is there a best
practice for this?

(5) Is there a forum specifically for folks doing Python database
programming?  It occurs to me that much of what I'm asking about is not
specific to SQLite.  But my code, for now at least, is going to be in
Python because it's the easiest way I can see, and my own efforts are the
real bottleneck in making progress.

-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Consistency, rollback and such

2016-09-20 Thread Kevin O'Gorman
Thanks.  That seems clear, and I think I understand it, and the conflicts
that I was referring to result in exceptions thrown at commit() time.  That
makes sense.  I don't think I'll be doing any rollbacks -- the logic is
hard enough as it stands.  I'll just wrap it all in IMMEDIATE
transactions.  They're quick, but conflicts are fairly likely, so it's
probably the right solution.
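
Concretely, each transaction will look something like this (a sketch with
invented table and column names; it assumes the connection was opened with
isolation_level=None so the sqlite3 module doesn't start transactions on
its own, and the ROLLBACK is only cleanup when an exception escapes):

    import sqlite3

    def adjust(conn, board, delta):
        # Grab the write lock before the SELECT so the read-then-update
        # sequence is serialized against other writers.
        conn.execute("BEGIN IMMEDIATE")
        try:
            (score,) = conn.execute(
                "SELECT score FROM pos WHERE board = ?", (board,)).fetchone()
            conn.execute("UPDATE pos SET score = ? WHERE board = ?",
                         (score + delta, board))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise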

On Tue, Sep 20, 2016 at 3:02 PM, Richard Hipp <d...@sqlite.org> wrote:

> On 9/20/16, Kevin O'Gorman <kevinogorm...@gmail.com> wrote:
> > Surely, Mr. Hipp is an authority, but I'm slightly puzzled by this
> answer.
> > And it doesn't answer the part of the question about what happens if I do
> > it wrong, and transactions conflict.  Based on what I now think is true,
> if
> > I don't do something, transactions begin with the first modification.
> They
> > may be prevented from executing UPDATEs simultaneously, but could still
> > result in non-serializable execution of the whole, and inconsistent or
> > erroneous data in the database, because each would be based on the SELECT
> > statements before either had written.  Or do they result in exceptions?
>
> If you do a "BEGIN;" followed by a "SELECT..." then the transaction
> starts before the SELECT is run.  So it is serializable.
>
> But if you do just a "BEGIN", then some other process might jump in
> line ahead of you and make some other changes to the database.  Your
> transaction will not be able to see those changes, due to isolation.
> But they will be in the database file. Then when you got to COMMIT,
> SQLite will see that your transaction is based on an historical and
> out-of-date version of the database and hence will refuse to do the
> commit.  You'll have to ROLLBACK and try again.
>
> When you do "BEGIN IMMEDIATE" that reserves your spot in line and
> ensures that no other transactions will commit in front of you.
>
>
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Consistency, rollback and such

2016-09-20 Thread Kevin O'Gorman
If by "explicitly" you mean starting my own transactions, then I guess I'll
be doing it as you are.  I do not even want to think about savepoints and
figuring out what has to be re-done.  My transactions, while numerous, will
be pretty simple and mostly quite quick, so there's not much advantage to
redoing over just delaying.  Besides, I'm not sure I know how to detect a
conflict that has caused or should cause a rollback.

On Tue, Sep 20, 2016 at 12:21 PM, David Raymond <david.raym...@tomtom.com>
wrote:

> Your understanding of the isolation_level parameter there is correct. It
> only effects how the  transactions are created. If you're going
> to be doing a few selects before you update then  doing the
> "begin someSortOf transaction;" is what you want.
>
> After having some trouble getting savepoints to work correctly I've gotten
> into the habit of always doing isolation_level = None, and doing everything
> explicitly, but as long as you know what's going on then you're good.
>
>
> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Kevin O'Gorman
> Sent: Tuesday, September 20, 2016 12:35 PM
> To: sqlite-users
> Subject: [sqlite] Consistency, rollback and such
>
> I'm also wondering if setting
>   conn = sqlite3.connect("mydb", isolation_level=IMMEDIATE)
> does what I need.  Reading the docs, it would appear this does not start a
> transaction until the UPDATE, and I think I want the transactions to start
> before the first SELECT.  Should I instead do
>   c = conn.cursor()
>   c.execuite("BEGIN TRANSACTION IMMEDIATE")
>
> and is IMMEDIATE the right thing, or do I need EXCLUSIVE.
>
>
> --
> #define QUESTION ((bb) || (!bb)) /* Shakespeare */
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Consistency, rollback and such

2016-09-20 Thread Kevin O'Gorman
Surely, Mr. Hipp is an authority, but I'm slightly puzzled by this answer.
And it doesn't answer the part of the question about what happens if I do
it wrong, and transactions conflict.  Based on what I now think is true, if
I don't do something, transactions begin with the first modification.  They
may be prevented from executing UPDATEs simultaneously, but could still
result in non-serializable execution of the whole, and inconsistent or
erroneous data in the database, because each would be based on the SELECT
statements before either had written.  Or do they result in exceptions?

On Tue, Sep 20, 2016 at 10:09 AM, Richard Hipp <d...@sqlite.org> wrote:

> On 9/20/16, Kevin O'Gorman <kevinogorm...@gmail.com> wrote:
> >   c.execuite("BEGIN TRANSACTION IMMEDIATE")
> >
> > and is IMMEDIATE the right thing, or do I need EXCLUSIVE.
>
> IMMEDIATE is the right thing.  That lets other readers continue and
> new readers to start, but blocks all other writers.
>

Please confirm or refute my understanding that this would let transactions
without EXCLUSIVE begin, but not EXCLUSIVE ones, even if they begin with
reading.


> EXCLUSIVE would block everybody - readers and writers - which is more
> than you need.
>

If my understanding above is correct, then this would indeed be more than
required.  But of course, it would work so long as the SELECTs are wrapped
along with their resultant UPDATEs.


>
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Consistency, rollback and such

2016-09-20 Thread Kevin O'Gorman
I think I understand the basics of SQL and ACID properties, but I'm new to
SQLite and not really experienced in any of these.  So I'm having some
trouble figuring out the detailed consequences of IMMEDIATE, EXCLUSIVE and
DEFERRED and the autocommit mode of python's sqlite3.

I expect my transactions to be fairly short, and to average three or less
per second, so conflict is likely, but not likely to overload whatever
mechanisms are involved.  However, it will be very common for a transaction
to begin with SELECT queries, and to compose an UPDATE based on what it has
and what it finds.  It will be quite possible, even frequent, for multiple
processes to decide to update the same records.

As background, I'd like to know what happens with each of the kinds of
transaction.  Do any of them do rollbacks, and if so what does that look
like (in Python),  Do any of them throw exceptions?

I'm guessing I'm going to want one of IMMEDIATE or EXCLUSIVE, but I'm not
sure which one.  I'm also wondering if setting
  conn = sqlite3.connect("mydb", isolation_level="IMMEDIATE")
does what I need.  Reading the docs, it would appear this does not start a
transaction until the UPDATE, and I think I want the transactions to start
before the first SELECT.  Should I instead do
  c = conn.cursor()
  c.execute("BEGIN IMMEDIATE TRANSACTION")

and is IMMEDIATE the right thing, or do I need EXCLUSIVE.

That's a bunch of questions, so please answer any where you're _sure_ you
know the answer.


-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Clarification on "No Isolation On Same Db Connection"

2016-09-08 Thread Kevin O'Gorman
It seems to me that the simplest, most portable approach for this sort of
thing would be to have the SELECT create a temporary table of the desired
actions, and not apply them until after the select has concluded.  This
would work in any database -- it does not depend on precise semantics of
WAL, for instance.

Of course, it could be that this is inefficient for some reason, and that
might cause you to take a different course, but all the alternatives are
going to be more troublesome to understand and maintain.  It's up to you
whether this is worth it.
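
A sketch of what I mean, with invented table and column names -- the long
SELECT only fills a temp table, and the real UPDATE runs after the scan is
done:

    import sqlite3

    def flag_stale(conn):
        conn.execute("CREATE TEMP TABLE actions (board CHAR(64))")
        # Phase 1: the scan writes nothing but the list of keys to act on.
        conn.execute(
            "INSERT INTO actions SELECT board FROM pos WHERE score IS NULL")
        # Phase 2: apply the changes only after the SELECT has concluded.
        conn.execute(
            "UPDATE pos SET stale = 1"
            " WHERE board IN (SELECT board FROM actions)")
        conn.execute("DROP TABLE actions")
        conn.commit()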

On Wed, Sep 7, 2016 at 7:24 PM, Igor Tandetnik  wrote:

> On 9/7/2016 6:11 PM, Stephan Mueller wrote:
>
>> I understand that a way to ensure "SELECT is unperturbed" semantics is to
>> use separate connections for SELECT and updates.
>>
>
> If you go down that route, make sure you are using WAL journaling mode; it
> won't work otherwise.
>
> This is undesirable since I'd have to (IIUC) do all my updates (possibly
>> millions) in a single transaction.
>>
>
> I don't see how this follows.
>
> I'd prefer to commit after each update
>>
>
> You can't commit on a single connection either, while there's an
> unfinalized SELECT statement traversal going on. So you aren't gaining
> anything by trying to interleave SELECT and updates on the same connection.
>
> That is, if I ever receive a record that ought to have arrived earlier
>> because of ORDER BY, it must be a since-SELECT-began update, and should be
>> ignored.
>>
>
> When data is modified under SELECT's feet, phantom rows are just one
> problem; it's also possible for the statement to skip rows it would have
> otherwise returned, and to return rows containing stale data. Basically,
> undefined behavior is undefined.
> --
> Igor Tandetnik
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Setting temp location with Python

2016-08-11 Thread Kevin O'Gorman
On Wed, Aug 10, 2016 at 6:50 AM, Jonathan Moules <
jonathan-li...@lightpear.com> wrote:

> Hi List,
>I'm using Python's sqlite3 library to access a SQLite db. I'd like to
> set the location for the temporary databases in a platform agnostic fashion
> (*nix or Windows).
>
> This page - https://www.sqlite.org/tempfiles.html - gives a number of
> options, but the only good one I can see for using it via Python would be
> the PRAGMA.
> But the docs for PRAGMA temp_store_directory; (
> https://www.sqlite.org/pragma.html#pragma_temp_store_directory ) say this
> is deprecated.
>
> So what's the recommended way to achieve this?
>
>
Take a look at https://www.sqlite.org/tempfiles.html, specifically item
5.0, which addresses this question.  It varies a bit depending on your
environment, so there's no single answer.  On my Linux system, I fiddle
with the SQLITE_TMPDIR environment variable.
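
In case it helps, this is roughly how I point it at a roomy partition from
Python before the library ever needs scratch space (the path and file name
are just examples):

    import os
    import sqlite3

    # SQLite's unix VFS consults SQLITE_TMPDIR when it creates its temporary
    # files, so set it before any statement that needs scratch space runs.
    os.environ["SQLITE_TMPDIR"] = "/tmp/sqlite-scratch"
    conn = sqlite3.connect("mydb.sqlite")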

You might want to describe your environment so that folks with a similar
environment can respond more helpfully.

-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug in CREATE INDEX

2016-08-08 Thread Kevin O'Gorman
On Sun, Aug 7, 2016 at 11:11 PM, Dan Kennedy <danielk1...@gmail.com> wrote:

> On 08/08/2016 02:03 AM, Dominique Pellé wrote:
>
>> Kevin O'Gorman wrote:
>>
>> CREATE INDEX has two problems:
>>> 1) poor default location of temporary storage.
>>> 2) gets wedged on very large indexes.
>>>
>>> I'm using the sqlite that came with Xubuntu 14.04, I think it's version
>>> 3.8.2.
>>>
>> SQLite-3.8.2 is old (Dec 2013). It's better to download and
>> compile SQLite yourself.
>>
>> There has been several optimizations since that release.
>> In particular, looking at release notes at
>> https://sqlite.org/changes.html the following
>> improvement which may be relevant for your issue:
>>
>> === BEGIN QUOTE https://sqlite.org/changes.html ===
>> 3.8.10:
>>
>> Performance improvements for ORDER BY, VACUUM,
>> CREATE INDEX, PRAGMA integrity_check, and
>> PRAGMA quick_check.
>> === END QUOTE ===
>>
>
> 3.8.7 introduced the multi-threaded sorter too. So with a more recent
> version of SQLite,
>
>   PRAGMA threads = 4
>
> might help this case.
>
>   https://sqlite.org/pragma.html#pragma_threads
>
> Dan.
>

Very cool.  But 4?  I will be running this on machines with 8 and 16
cores.  Does going beyond 4 not help much?
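
For reference, the pragma is per-connection and goes in before the CREATE
INDEX -- something like this sketch (table and index names invented;
whether values above 4 actually help is exactly what I'm asking):

    import sqlite3

    conn = sqlite3.connect("games.db")
    # Allow up to 8 helper threads for the sorter behind CREATE INDEX.
    # The library may use fewer, and builds without worker-thread support
    # ignore the pragma entirely.
    conn.execute("PRAGMA threads = 8")
    conn.execute("CREATE INDEX IF NOT EXISTS pos_board ON pos (board)")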


-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug in CREATE INDEX

2016-08-08 Thread Kevin O'Gorman
On Mon, Aug 8, 2016 at 2:41 AM, Philip Newton <philip.new...@pobox.com>
wrote:

> On 7 August 2016 at 22:37, Kevin O'Gorman <kevinogorm...@gmail.com> wrote:
> > I use the LTS (long-term support) version of Ubuntu, and like not having
> to
> > keep up with all the latest.  My current 14.04 is at end-of-life
>
> LTS are supported for 5 years; your 14.04 is good till April 2019.
>
> Ph.
>

I guess I'm not paying attention -- that's the point, I don't want to have
to -- and just consider it end-of-life when it starts bugging me twice a
day about the next LTS.  I'm planning to install 16.04.1 into the partition
with the even older 12.04 today.

-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug in CREATE INDEX

2016-08-07 Thread Kevin O'Gorman
On Sun, Aug 7, 2016 at 12:03 PM, Dominique Pellé <dominique.pe...@gmail.com>
wrote:

> Kevin O'Gorman wrote:
>
> > CREATE INDEX has two problems:
> > 1) poor default location of temporary storage.
> > 2) gets wedged on very large indexes.
> >
> > I'm using the sqlite that came with Xubuntu 14.04, I think it's version
> > 3.8.2.
>
> SQLite-3.8.2 is old (Dec 2013). It's better to download and
> compile SQLite yourself.
>
> There has been several optimizations since that release.
> In particular, looking at release notes at
> https://sqlite.org/changes.html the following
> improvement which may be relevant for your issue:
>
> === BEGIN QUOTE https://sqlite.org/changes.html ===
> 3.8.10:
>
> Performance improvements for ORDER BY, VACUUM,
> CREATE INDEX, PRAGMA integrity_check, and
> PRAGMA quick_check.
> === END QUOTE ===
>
> Regards
> Dominique
>
>
I use the LTS (long-term support) version of Ubuntu, and like not having to
keep up with all the latest.  My current 14.04 is at end-of-life, and I'll
upgrade to 16.04 soon -- probably this weekend -- and get sqlite 3.8.17
automatically.

I'm a hobbyist going solo with a lot of interests.  This means I have to
accept the concept of "good enough" or pare down my interests to those I
can devote admin time to on a regular basis. Even more so when I haven't
really decided to adopt the package yet.

Works for me, but YMMV.



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug in CREATE INDEX

2016-08-07 Thread Kevin O'Gorman
RETRACTED: Not wedged, just somewhat slower than I expected.  I was
expecting about an hour, and it takes 6.  On reflection, it's okay.  My
ideas about what's going on were wrong too -- anonymous files are used, but
only two of them at a time.

I still think it is using a poor choice of default directory for temporary
files.  Is it /var/tmp instead of the more usual /tmp?  That's the only
other choice that would not be just plain wrong, and maybe it was chosen
thinking that it's likely to be on a bigger partition.  Maybe, sometimes.
But a lot of the time, /var/tmp is on the same partition as /tmp, or just
ignored because the sort utility defaults to /tmp and often that's the
biggest user of big temporary files.  That's the reason that I've placed
/tmp on its own huge partition (3 TB) and left /var/tmp alone.  The big
reason to use /var/tmp is that the files there are not deleted
automatically -- the standard says they're to be preserved -- but anonymous
files aren't ever preserved -- they've disappeared from the file system by
definition.

In all, not a big issue since $TMPDIR is honored, but I don't see that it's
documented.  I just guessed it since the sort utility honors it and I
thought it was possible sort was being used under the covers.  It's not,
but it all worked out okay.

Does anybody know where the actual defaults and controlling environment
variables are documented, by operating system?  Or are they?

On Fri, Aug 5, 2016 at 12:36 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
wrote:

> CREATE INDEX has two problems:
> 1) poor default location of temporary storage.
> 2) gets wedged on very large indexes.
>
> I'm using the sqlite that came with Xubuntu 14.04, I think it's version
> 3.8.2.
>
> I created a table, and used .import to populate it with records, about 1.4
> billion of them.  The resulting database is 232 GB.  All seemed well.
>
> I then went to CREATE INDEX on two fields, one CHAR(64) and the other an
> INT.  This initially died quickly, but I determined that it was because it
> was attempting to use my smallest partition for workspace.  It was in my
> root partition, but not in /tmp which is separate.  I submit that /tmp
> would be the natural choice, as it is what the sort command uses by
> default.  That's problem 1.
>
> Fortunately, it honors the $TMPDIR environment variable, but gets wedged
> after a bit.  I have an idea why.  While it was still actively adding
> space, there were no filenames showing, so I infer that it's using
> "anonymous" files, perhaps by calling tmpfile().  This could be bad news,
> as anonymous files have to be held open, but there are limits on how many
> open files any process is allowed.  Unless your merge process starts with
> at least 1.4 million keys in each buffer, this is going to fail on my
> data.  I suggest using tempnam() and closing the files when you're not
> actively using them.
>
> --
> #define QUESTION ((bb) || (!bb)) /* Shakespeare */
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-07 Thread Kevin O'Gorman
On Sat, Aug 6, 2016 at 2:49 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
wrote:

> On Sat, Aug 6, 2016 at 2:09 AM, Dan Kennedy <danielk1...@gmail.com> wrote:
>
>> On 08/06/2016 09:52 AM, Kevin O'Gorman wrote:
>>
>>> On Fri, Aug 5, 2016 at 2:03 PM, Dan Kennedy <danielk1...@gmail.com>
>>> wrote:
>>>
>>> On 08/06/2016 03:28 AM, Kevin O'Gorman wrote:
>>>>
>>>> On Fri, Aug 5, 2016 at 1:08 PM, David Raymond <david.raym...@tomtom.com
>>>>> >
>>>>> wrote:
>>>>>
>>>>> ..
>>>>>
>>>>> Apart from the default location of the files, it reads like your next
>>>>> main
>>>>>
>>>>>> concern is how many temp files get opened up. My bet is that it'll be
>>>>>> a
>>>>>> very small number, just potentially huge in file size while it's doing
>>>>>> its
>>>>>> thing. But again, try that pragma and take a look.
>>>>>>
>>>>>> My best bet is the contrary:  it starts with small files and makes
>>>>>>
>>>>> increasingly larger ones, like the sort utility does.  The problem is
>>>>> that
>>>>> there are too many of them at the beginning for it to work with
>>>>> anonymous
>>>>> files (which sort does not use).  This at least offers a possible
>>>>> explanation of its getting wedged on large indexes: an unexpected and
>>>>> untested error, handled poorly.
>>>>>
>>>>> You could verify this by checking the number of open handles in
>>> "/proc/PID/fd" after your process is wedged.
>>>>
>>>> Excellent idea.  I did not know about that possibility.  And sure
>>>> enough,
>>>>
>>> I'm wrong.  It's using anonymous files, all right, but only one or two
>>> at a
>>> time.  I assume they're big.  I'm in the process of bracketing where size
>>> begins to matter.  So far, 1/10 of the data loads and indexes just fine,
>>> albeit somewhat more slowly that the smaller samples predicted.  The
>>> database load took 6.5 minutes, the troublesome index 10 minutes.  At
>>> smaller sizes, indexing is faster than the database load.
>>>
>>> I'm trying 1/3 now (500 million lines)
>>>
>>
>> What does [top] tell you once the process becomes wedged? What percentage
>> is the CPU running at? Or is it completely bogged down waiting for IO?
>>
>> Dan
>
>
> I'm waiting for a good time to get the answer to this.  It takes a good
> long while to get to wedged, so I'll probably do it overnight tonight.
>

RETRACTION: it doesn't get wedged after all, it just takes about 4 times
longer than I expected.  On small inputs, building the indexes is faster
than building the data table.  When I test on the whole data file, it takes
4 times longer to build each index than it took to make the table.  I guess
that's what it looks like when building the table is O(n) (linear), and
building the index is O(n * log(n)), and log(n) is getting bigger.

Sorry for the kerfuffle.  This is my first time working with datasets this
big.  Since it takes almost 6 hours to build each index (there are 2 at
this point), and I didn't know to expect that, I drew the wrong
conclusion.  My bad.

I'm just glad Mr. Hipp got into this thread and advised building the
indexes after the data is loaded.  It would have been _lots_ slower
updating the indexes as each row was inserted, and in fact my first attempt
-- the one that gave me the initial idea that this had wedged -- took that
approach and I decided it was wedged after a few _days_.  I suspect it
wasn't either, but it might as well have been.

So far, this has been a feasibility exercise.  Building a 500 GB database
overnight is definitely feasible.  Now I can go on to see if it helps me
solve my problem.


-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-06 Thread Kevin O'Gorman
On Sat, Aug 6, 2016 at 2:09 AM, Dan Kennedy <danielk1...@gmail.com> wrote:

> On 08/06/2016 09:52 AM, Kevin O'Gorman wrote:
>
>> On Fri, Aug 5, 2016 at 2:03 PM, Dan Kennedy <danielk1...@gmail.com>
>> wrote:
>>
>> On 08/06/2016 03:28 AM, Kevin O'Gorman wrote:
>>>
>>> On Fri, Aug 5, 2016 at 1:08 PM, David Raymond <david.raym...@tomtom.com>
>>>> wrote:
>>>>
>>>> ..
>>>>
>>>> Apart from the default location of the files, it reads like your next
>>>> main
>>>>
>>>>> concern is how many temp files get opened up. My bet is that it'll be a
>>>>> very small number, just potentially huge in file size while it's doing
>>>>> its
>>>>> thing. But again, try that pragma and take a look.
>>>>>
>>>>> My best bet is the contrary:  it starts with small files and makes
>>>>>
>>>> increasingly larger ones, like the sort utility does.  The problem is
>>>> that
>>>> there are too many of them at the beginning for it to work with
>>>> anonymous
>>>> files (which sort does not use).  This at least offers a possible
>>>> explanation of its getting wedged on large indexes: an unexpected and
>>>> untested error, handled poorly.
>>>>
>>>> You could verify this by checking the number of open handles in
>>> "/proc//fd" after your process is wedged.
>>>
>>> Excellent idea.  I did not know about that possibility.  And sure enough,
>>>
>> I'm wrong.  It's using anonymous files, all right, but only one or two at
>> a
>> time.  I assume they're big.  I'm in the process of bracketing where size
>> begins to matter.  So far, 1/10 of the data loads and indexes just fine,
>> albeit somewhat more slowly than the smaller samples predicted.  The
>> database load took 6.5 minutes, the troublesome index 10 minutes.  At
>> smaller sizes, indexing is faster than the database load.
>>
>> I'm trying 1/3 now (500 million lines)
>>
>
> What does [top] tell you once the process becomes wedged? What percentage
> is the CPU running at? Or is it completely bogged down waiting for IO?
>
> Dan


I'm waiting for a good time to get the answer to this.  It takes a good
long while to get wedged, so I'll probably do it overnight tonight.


-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-05 Thread Kevin O'Gorman
On Fri, Aug 5, 2016 at 2:03 PM, Dan Kennedy <danielk1...@gmail.com> wrote:

> On 08/06/2016 03:28 AM, Kevin O'Gorman wrote:
>
>> On Fri, Aug 5, 2016 at 1:08 PM, David Raymond <david.raym...@tomtom.com>
>> wrote:
>>
>> ..
>>
>> Apart from the default location of the files, it reads like your next main
>>> concern is how many temp files get opened up. My bet is that it'll be a
>>> very small number, just potentially huge in file size while it's doing
>>> its
>>> thing. But again, try that pragma and take a look.
>>>
>>> My best bet is the contrary:  it starts with small files and makes
>> increasingly larger ones, like the sort utility does.  The problem is that
>> there are too many of them at the beginning for it to work with anonymous
>> files (which sort does not use).  This at least offers a possible
>> explanation of its getting wedged on large indexes: an unexpected and
>> untested error, handled poorly.
>>
>
> You could verify this by checking the number of open handles in
> "/proc//fd" after your process is wedged.
>
> Excellent idea.  I did not know about that possibility.  And sure enough,
I'm wrong.  It's using anonymous files, all right, but only one or two at a
time.  I assume they're big.  I'm in the process of bracketing where size
begins to matter.  So far, 1/10 of the data loads and indexes just fine,
albeit somewhat more slowly than the smaller samples predicted.  The
database load took 6.5 minutes, the troublesome index 10 minutes.  At
smaller sizes, indexing is faster than the database load.

I'm trying 1/3 now (500 million lines)
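
For reference, the handle check itself is a one-liner (assuming a single
sqlite3 process; adjust the pid lookup to taste):

  ls /proc/$(pgrep -x sqlite3)/fd | wc -l    # count of open descriptors
  ls -l /proc/$(pgrep -x sqlite3)/fd         # anonymous temp files show as "(deleted)"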

-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-05 Thread Kevin O'Gorman
On Fri, Aug 5, 2016 at 3:03 PM, Darren Duncan 
wrote:

> On 2016-08-04 7:27 AM, Jim Callahan wrote:
>
>> Steps
>> Agree with Darren Duncan and Dr. Hipp you may want to have at least 3
>> separate steps
>> (each step should be a separate transaction):
>>
>> 1. Simple load
>> 2. Create additional column
>> 3. Create index
>>
>> Have you pre-defined the table you are loading data into? (step 0 CREATE
>> TABLE)
>>
>> If "Step 1 Simple Load" does not complete; then may want to load a fixed
>> number of rows into separate tables (per Darren Duncan)  and then combine
>> using an APPEND
>> or a UNION query (doing so before steps 2 and 3).
>>
>
> To be clear, my proposal of only loading a subset of rows was just
> intended for debugging the performance issues the OP was having, and to try to
> figure out what is causing the problem without having to wait too long for
> larger sets to complete.  I wasn't proposing splitting the load into
> separate databases and unioning later, rather the attempt database would be
> thrown away after each trial. -- Darren Duncan
>

Thanks, I understood that and it was a good set of suggestions.  Using
separate transactions led to focusing on the CREATE INDEX part of the
process.  That's where it gets wedged.  Working up to the full database, by
first processing smaller extracts, convinced me that size matters.  Smaller
(even millions of rows) extracts do not wedge.

-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-05 Thread Kevin O'Gorman
On Fri, Aug 5, 2016 at 1:08 PM, David Raymond <david.raym...@tomtom.com>
wrote:

> There's a deprecated pragma, PRAGMA temp_store_directory =
> 'directory-name'; that appears to still work ok for now at least.
> http://www.sqlite.org/pragma.html#pragma_temp_store_directory
>
> I've used it for the same reasons you've both mentioned about space,
> though for me it was vacuuming a huge db, and the full-size db copy that
> makes, which ate up my entire C drive. But with that pragma you can specify
> exactly which folder you want to use for your temp files. I'd suggest trying
> that, then monitor the folder you give it to see what shows up.
>

Since it's deprecated, I'll stick with TMPDIR because it's pretty standard
on Linux (it also works with the sort utility for instance).
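
For the record, either spelling works here; the directory below is just
an example, not a recommendation:

  TMPDIR=/mnt/raid/tmp sqlite3 qubic.db

or, from inside the shell:

  PRAGMA temp_store_directory = '/mnt/raid/tmp';   -- deprecated, but still works for now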


> Apart from the default location of the files, it reads like your next main
> concern is how many temp files get opened up. My bet is that it'll be a
> very small number, just potentially huge in file size while it's doing its
> thing. But again, try that pragma and take a look.
>

My best bet is the contrary:  it starts with small files and makes
increasingly larger ones, like the sort utility does.  The problem is that
there are too many of them at the beginning for it to work with anonymous
files (which sort does not use).  This at least offers a possible
explanation of its getting wedged on large indexes: an unexpected and
untested error, handled poorly.


>
> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Kevin O'Gorman
> Sent: Friday, August 05, 2016 3:41 PM
> To: SQLite mailing list
> Subject: Re: [sqlite] newbie has waited days for a DB build to complete.
> what's up with this.
>
> On Fri, Aug 5, 2016 at 12:30 PM, Igor Korot <ikoro...@gmail.com> wrote:
>
> > Hi, Kevin,
> >
> > On Fri, Aug 5, 2016 at 3:18 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
> > wrote:
> > > Okay, I followed some of the advice y'all gave and got some results.
> > >
> > > 1. The original problem was compromised by malformed input.  However,
> it
> > > appears that did not cause the wedging of the process.  See (3) below.
> >
> > Where will the data come from?
> > From the user? Internet?
> >
> > What I'm getting at is - you need to look for some malformed data in
> > the future as well.
> >
>
> I generate it.  I goofed, and I'll try not to goof in the future.
>
>
> >
> > >
> > > 2. I separated the steps, and started small.  Time increased slightly
> > > sub-linearly with dataset size, so I jumped to doing the whole thing.
> > With
> > > proper input, the data was loaded in 68 minutes.
> > >
> > > 3. The CREATE INDEX steps failed quickly (2 minutes), reporting
> "database
> > > or disk is full" which seemed odd since most of my partitions have much
> > > more free space than the entire database.  It turns out that whatever
> > does
> > > the creation was using space on my root partition (this is Linux, so
> that
> > > means "/").  That's the only partition in my setup without a huge
> amount
> > of
> > > free space.  One would expect temporary stuff to go to /tmp (which has
> 3TB
> > > free), but it doesn't go there.  It would go there if the system's
> native
> > > "sort" program were used.  Fortunately, it turns out that the TMPDIR
> > > environment variable is honored, but while I could see space was being
> > > used, there were no files visible.  I take that to mean that the
> > tmpfile()
> > > function (or equivalent) was used.  This could be a bad idea for large
> > > indexes because anonymous files have to be kept open, and there's a
> limit
> > > on the number of files that can be open at a time, around 1,000.  Sure
> > > enough, the index creation appears to be wedged like the original run,
> > and
> > > after a few hours I killed it manually.  This is a deal-killer.
> >
> > The failure you saw - is it on the table with the complete data set?
> > Or did you get it during the experimenting?
> >
> > Only on the complete data set.
>
>
> > >
> > > So the questions are: Where do bug reports go?  I seem to be running
> > 3.8.2;
> > > is this fixed in any later version?
> >
> > You can try the "3.14" pre-released one right now. ;-)
> >
>
> Meh.  I submitted a bug report to this list.  I'll see what happens.
>
>
> > Thank you.
> >
> > >
> > >
> > > On Thu, Aug 4, 2016 at 9:27 AM, 

Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-05 Thread Kevin O'Gorman
On Fri, Aug 5, 2016 at 12:30 PM, Igor Korot <ikoro...@gmail.com> wrote:

> Hi, Kevin,
>
> On Fri, Aug 5, 2016 at 3:18 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
> wrote:
> > Okay, I followed some of the advice y'all gave and got some results.
> >
> > 1. The original problem was compromised by malformed input.  However, it
> > appears that did not cause the wedging of the process.  See (3) below.
>
> Where will the data come from?
> From the user? Internet?
>
> What I'm getting at is - you need to look for some malformed data in
> the future as well.
>

I generate it.  I goofed, and I'll try not to goof in the future.


>
> >
> > 2. I separated the steps, and started small.  Time increased slightly
> > sub-linearly with dataset size, so I jumped to doing the whole thing.
> With
> > proper input, the data was loaded in 68 minutes.
> >
> > 3. The CREATE INDEX steps failed quickly (2 minutes), reporting "database
> > or disk is full" which seemed odd since most of my partitions have much
> > more free space than the entire database.  It turns out that whatever
> does
> > the creation was using space on my root partition (this is Linux, so that
> > means "/").  That's the only partition in my setup without a huge amount
> of
> > free space.  One would expect temporary stuff to go to /tmp (which has 3TB
> > free), but it doesn't go there.  It would go there if the system's native
> > "sort" program were used.  Fortunately, it turns out that the TMPDIR
> > environment variable is honored, but while I could see space was being
> > used, there were no files visible.  I take that to mean that the
> tmpfile()
> > function (or equivalent) was used.  This could be a bad idea for large
> > indexes because anonymous files have to be kept open, and there's a limit
> > on the number of files that can be open at a time, around 1,000.  Sure
> > enough, the index creation appears to be wedged like the original run,
> and
> > after a few hours I killed it manually.  This is a deal-killer.
>
> The failure you saw - is it on the table with the complete data set?
> Or did you get it during the experimenting?
>
> Only on the complete data set.


> >
> > So the questions are: Where do bug reports go?  I seem to be running
> 3.8.2;
> > is this fixed in any later version?
>
> You can try the "3.14" pre-released one right now. ;-)
>

Meh.  I submitted a bug report to this list.  I'll see what happens.


> Thank you.
>
> >
> >
> > On Thu, Aug 4, 2016 at 9:27 AM, Kevin O'Gorman <kevinogorm...@gmail.com>
> > wrote:
> >
> >> The metric for feasibility is coding ease, not runtime.  I'm the
> >> bottleneck, not the machine, at least at this point.
> >>
> >> As for adding rows, it will be about like this time: a billion or so at
> a
> >> time.  But there's no need to save the old data.  Each round can be
> >> separate except for a persistent "solutions" table of much more modest
> >> size.  I've been doing this for a while now, and the solutions file has
> >> only 10 million or so lines, each representing a game position for which
> >> optimum moves are known.  Getting this file to include the starting
> >> position is the point of the exercise.
> >>
> >> If I ever get to anything like "production" in this project, I expect it
> >> to run for maybe three years...  That's after I tweak it for speed.
> >>
> >> Background: in production, this will be running on a dual-Xeon with 16
> >> cores (32 hyperthreads) and 1/4 TiB RAM.  It has sequential file update
> >> through Linux flock() calls at the moment.  The code is bash gluing
> >> together a collection of UNIX utilities and some custom C code.  The C
> is
> >> kept as simple as possible, to minimize errors.
> >>
> >> As you may surmise, this "hobby" is important to me.
> >>
> >>
> >> On Thu, Aug 4, 2016 at 9:09 AM, R Smith <rsm...@rsweb.co.za> wrote:
> >>
> >>>
> >>>
> >>> On 2016/08/04 5:56 PM, Kevin O'Gorman wrote:
> >>>
> >>>> On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne <
> ddevie...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>
> >>>> It's even less dense than that.  Each character has only 3 possible
> >>>> values,
> >>>> and thus it's pretty easy to compress down to 2 bits each, for a 16
> byte
> >>>> blob.

[sqlite] Bug in CREATE INDEX

2016-08-05 Thread Kevin O'Gorman
CREATE INDEX has two problems:
1) poor default location of temporary storage.
2) gets wedged on very large indexes.

I'm using the sqlite that came with Xubuntu 14.04, I think it's version
3.8.2.

I created a table, and used .import to populate it with records, about 1.4
billion of them.  The resulting database is 232 GB.  All seemed well.

I then went to CREATE INDEX on two fields, one CHAR(64) and the other an
INT.  This initially died quickly, but I determined that it was because it
was attempting to use my smallest partition for workspace.  It was in my
root partition, but not in /tmp which is separate.  I submit that /tmp
would be the natural choice, as it is what the sort command uses by
default.  That's problem 1.

Fortunately, it honors the $TMPDIR environment variable, but gets wedged
after a bit.  I have an idea why.  While it was still actively adding
space, there were no filenames showing, so I infer that it's using
"anonymous" files, perhaps by calling tmpfile().  This could be bad news,
as anonymous files have to be held open, but there are limits on how many
open files any process is allowed.  Unless your merge process starts with
at least 1.4 million keys in each buffer, this is going to fail on my
data.  I suggest using tempnam() and closing the files when you're not
actively using them.
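
To illustrate the pattern I mean (this is not SQLite's code, just a
sketch of the difference; mkstemp() is the safer modern relative of
tempnam()):

  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(void) {
      /* tmpfile(): anonymous, so the descriptor must stay open for as
       * long as the data is needed -- one descriptor per spill file. */
      FILE *anon = tmpfile();
      if (anon) fclose(anon);

      /* A named temp file can be closed between uses and reopened when
       * it is actually being merged, keeping few descriptors open. */
      char path[] = "/tmp/spill-XXXXXX";     /* directory is illustrative */
      int fd = mkstemp(path);
      if (fd < 0) return 1;
      (void)write(fd, "sorted run\n", 11);
      close(fd);                             /* descriptor released here */

      fd = open(path, O_RDONLY);             /* reopened only when needed */
      if (fd >= 0) close(fd);
      unlink(path);                          /* clean up the named file */
      return 0;
  }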

-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-05 Thread Kevin O'Gorman
Okay, I followed some of the advice y'all gave and got some results.

1. The original problem was compromised by malformed input.  However, it
appears that did not cause the wedging of the process.  See (3) below.

2. I separated the steps, and started small.  Time increased slightly
sub-linearly with dataset size, so I jumped to doing the whole thing.  With
proper input, the data was loaded in 68 minutes.

3. The CREATE INDEX steps failed quickly (2 minutes), reporting "database
or disk is full" which seemed odd since most of my partitions have much
more free space than the entire database.  It turns out that whatever does
the creation was using space on my root partition (this is Linux, so that
means "/").  That's the only partition in my setup without a huge amount of
free space.  One would expect temporary stuff to go to /tmp (which has 3TB
free), but it doesn't go there.  It would go there if the system's native
"sort" program were used.  Fortunately, it turns out that the TMPDIR
environment variable is honored, but while I could see space was being
used, there were no files visible.  I take that to mean that the tmpfile()
function (or equivalent) was used.  This could be a bad idea for large
indexes because anonymous files have to be kept open, and there's a limit
on the number of files that can be open at a time, around 1,000.  Sure
enough, the index creation appears to be wedged like the original run, and
after a few hours I killed it manually.  This is a deal-killer.
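
For what it's worth, the per-process limit is easy to check, and it can
be raised up to the hard limit from the shell that starts sqlite3 --
though whether that would have been enough here is another question:

  ulimit -n         # soft limit on open files, often 1024 on Linux
  ulimit -Hn        # hard limit
  ulimit -n 65536   # raise the soft limit for this shell, if allowed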

So the questions are: Where do bug reports go?  I seem to be running 3.8.2;
is this fixed in any later version?


On Thu, Aug 4, 2016 at 9:27 AM, Kevin O'Gorman <kevinogorm...@gmail.com>
wrote:

> The metric for feasibility is coding ease, not runtime.  I'm the
> bottleneck, not the machine, at least at this point.
>
> As for adding rows, it will be about like this time: a billion or so at a
> time.  But there's no need to save the old data.  Each round can be
> separate except for a persistent "solutions" table of much more modest
> size.  I've been doing this for a while now, and the solutions file has
> only 10 million or so lines, each representing a game position for which
> optimum moves are known.  Getting this file to include the starting
> position is the point of the exercise.
>
> If I ever get to anything like "production" in this project, I expect it
> to run for maybe three years...  That's after I tweak it for speed.
>
> Background: in production, this will be running on a dual-Xeon with 16
> cores (32 hyperthreads) and 1/4 TiB RAM.  It has sequential file update
> through Linux flock() calls at the moment.  The code is bash gluing
> together a collection of UNIX utilities and some custom C code.  The C is
> kept as simple as possible, to minimize errors.
>
> As you may surmise, this "hobby" is important to me.
>
>
> On Thu, Aug 4, 2016 at 9:09 AM, R Smith <rsm...@rsweb.co.za> wrote:
>
>>
>>
>> On 2016/08/04 5:56 PM, Kevin O'Gorman wrote:
>>
>>> On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne <ddevie...@gmail.com>
>>> wrote:
>>>
>>>
>>> It's even less dense than that.  Each character has only 3 possible
>>> values,
>>> and thus it's pretty easy to compress down to 2 bits each, for a 16 byte
>>> blob.
>>> It's just hard to do that without a bunch of SQLite code I'd have to
>>> learn
>>> how to write.  The current effort amounts to a feasibility study, and I
>>> want
>>> to keep it as simple as possible.
>>>
>>
>> A feasibility study using equipment that is hamstrung by weights it
>> won't have in the real situation is not an accurate study.
>>
>> It's like studying fuel consumption on a different kind of road surface,
>> but for the test purposes, the cars had to tow caravans containing their
>> testing equipment - the study will not look feasible at all.
>>
>> It might of course be that the feasibility you are studying is completely
>> unrelated to the data handling - in which case the point is moot.
>>
>> Let us know how it goes :)
>> Ryan
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
>
> --
> #define QUESTION ((bb) || (!bb)) /* Shakespeare */
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Kevin O'Gorman
The metric for feasibility is coding ease, not runtime.  I'm the
bottleneck, not the machine, at least at this point.

As for adding rows, it will be about like this time: a billion or so at a
time.  But there's no need to save the old data.  Each round can be
separate except for a persistent "solutions" table of much more modest
size.  I've been doing this for a while now, and the solutions file has
only 10 million or so lines, each representing a game position for which
optimum moves are known.  Getting this file to include the starting
position is the point of the exercise.

If I ever get to anything like "production" in this project, I expect it to
run for maybe three years...  That's after I tweak it for speed.

Background: in production, this will be running on a dual-Xeon with 16
cores (32 hyperthreads) and 1/4 TiB RAM.  It has sequential file update
through Linux flock() calls at the moment.  The code is bash gluing
together a collection of UNIX utilities and some custom C code.  The C is
kept as simple as possible, to minimize errors.

As you may surmise, this "hobby" is important to me.


On Thu, Aug 4, 2016 at 9:09 AM, R Smith <rsm...@rsweb.co.za> wrote:

>
>
> On 2016/08/04 5:56 PM, Kevin O'Gorman wrote:
>
>> On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne <ddevie...@gmail.com>
>> wrote:
>>
>>
>> It's even less dense than that.  Each character has only 3 possible
>> values,
>> and thus it's pretty easy to compress down to 2 bits each, for a 16 byte
>> blob.
>> It's just hard to do that without a bunch of SQLite code I'd have to learn
>> how to write.  The current effort amounts to a feasibility study, and I
>> want
>> to keep it as simple as possible.
>>
>
> A feasibility study using equipment that is hamstrung by weights it
> won't have in the real situation is not an accurate study.
>
> It's like studying fuel consumption on a different kind of road surface,
> but for the test purposes, the cars had to tow caravans containing their
> testing equipment - the study will not look feasible at all.
>
> It might of course be that the feasibility you are studying is completely
> unrelated to the data handling - in which case the point is moot.
>
> Let us know how it goes :)
> Ryan
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Kevin O'Gorman
On Thu, Aug 4, 2016 at 8:29 AM, Dominique Devienne <ddevie...@gmail.com>
wrote:

> On Thu, Aug 4, 2016 at 5:05 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
> wrote:
>
> > 3. Positions are 64 bytes always, so your size guesses are right.  They
> are
> > in no particular order.  I like the suggestion of a separate position
> > table, because they're going to appear in multiple qmove records, with an
> > average of about 3 or 4 appearances I think.  Maybe more.
> >
>
> 3x or 4x duplication doesn't sound like a lot. What you'd gain in the moves
> table,
> you'd lose having to join to the positions table I suspect. Higher level
> duplicates, maybe.
>
> 64-bytes always? Is that a human readable string, or some 'code' in
> hexadecimal?
> If the latter, use a blob, which requires only 32-bytes to store the same
> info. You can
> use the blob literal notation x'abcdef01' (that's a 4-bytes blob).
>

It's even less dense than that.  Each character has only 3 possible values,
and thus it's pretty easy to compress down to 2 bits each, for a 16 byte
blob.
It's just hard to do that without a bunch of SQLite code I'd have to learn
how to write.  The current effort amounts to a feasibility study, and I want
to keep it as simple as possible.
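
For what it's worth, the packing itself is only a few lines of C; the
three character values below ('.', 'x', 'o') are stand-ins, since I
haven't said what the real ones are:

  #include <string.h>

  /* Pack a 64-character position (3 possible values per character)
   * into a 16-byte blob: 2 bits per character, 4 characters per byte. */
  static void pack_position(const char pos[64], unsigned char blob[16]) {
      memset(blob, 0, 16);
      for (int i = 0; i < 64; i++) {
          unsigned char code;
          switch (pos[i]) {               /* stand-in character set */
              case 'x': code = 1; break;
              case 'o': code = 2; break;
              default:  code = 0; break;  /* '.' or anything else */
          }
          blob[i / 4] |= (unsigned char)(code << (2 * (i % 4)));
      }
  }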


> Finally, note that if your program writes a huge text file with all your
> values, that you
> .import into sqlite3 as you showed, you're IMHO wasting time, since you
> can't use
> prepared statements and binds, and you also force SQLite's SQL parser to
> parse
> a huge amount of text. By embedding SQLite into your generator program, you
> remove all parsing except for a trivial "insert into qmoves values (:1, :2,
> ...)", and
> all the rest is sqlite_bind*() and co. calls. (and if you use blobs for
> positions, then
> you can bind the 32-byte blob directly, no need to convert/parse to/from
> hex).
>

I understand the concept of prepared statements in principle, sort of, but I
don't really know how binds work, so I'm not quite ready to write the code you
allude to.  And I'd no longer be able to use sqlite3 at all to do simple
experiments -- I'd have to be blobbing and de-blobbing to make sense of
anything.
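
For later readers, the shape of what Dominique is describing is roughly
the following (an untested sketch: the table and column names are made
up, and every call really needs an error check):

  #include <sqlite3.h>

  /* Insert rows through one prepared statement inside one transaction. */
  static int load_rows(sqlite3 *db) {
      sqlite3_stmt *stmt;
      sqlite3_exec(db, "BEGIN", 0, 0, 0);
      sqlite3_prepare_v2(db,
          "INSERT INTO qmoves(pos, val) VALUES (?1, ?2)", -1, &stmt, 0);

      for (int i = 0; i < 1000; i++) {         /* stand-in generator loop */
          unsigned char blob[16] = {0};        /* packed position goes here */
          sqlite3_bind_blob(stmt, 1, blob, 16, SQLITE_TRANSIENT);
          sqlite3_bind_int(stmt, 2, i);
          sqlite3_step(stmt);                  /* runs the insert */
          sqlite3_reset(stmt);                 /* ready for the next row */
      }

      sqlite3_finalize(stmt);
      sqlite3_exec(db, "COMMIT", 0, 0, 0);
      return 0;
  }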


>
> My $0.02. --DD
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-04 Thread Kevin O'Gorman
Lots of answers, so thanks all around.  Some more info:

1. All partitions have at least 3 GB free, and it's not changing.  /tmp is
3 TiB and empty.
2. I have a RAID partition, for size, but no RAID controller.  As a hobby
project, I don't have spare parts, and I fear the results of a failure of a
hardware RAID without a spare, so I use Linux mdadm to manage software RAID
across three 4-TB drives.
3. Positions are 64 bytes always, so your size guesses are right.  They are
in no particular order.  I like the suggestion of a separate position
table, because they're going to appear in multiple qmove records, with an
average of about 3 or 4 appearances I think.  Maybe more.

I'm going to retry, using some of the suggestions above: smaller
transactions, IGNORE, position table rowids in the moves table, smaller
subsets being loaded, developing ideas of how time depends on data size.

If it doesn't go well fairly quickly, I'll probably go back to flat files
and writing the queries the hard way.  At least I know what I'm dealing
with there.

QUESTIONS:
If it's best in general to build indexes as a separate step, does this also
apply to primary indexes?  Can a table without a primary index have one
added later?  Isn't ROWID the real primary, presuming it has ROWIDs?  And
if so, then is a primary index on a ROWID table just for compliance with
standard SQL, and really no better than any other index?  Obviously, I'm a
bit confused about this.

While I'm at it, I may as well ask if ROWID has any physical significance,
such that a VACUUM operation might change it.  Or is it just an arbitrary
ID inserted by SQLite and added to each record, when they exist at all?

The current dataset is intended to solve one particular issue in the
overall project.  It looks like I'd want to build each such dataset
separately, as there will likely be a few hundred, and I gather that adding
to these tables will be pretty slow once the indexes have been built.  Or
is it sensible to drop indexes, add data and rebuild?
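
If the drop-and-rebuild route is the sensible one, I gather it would look
something like this (index names made up):

  DROP INDEX IF EXISTS qmoves_pos_idx;
  DROP INDEX IF EXISTS qmoves_val_idx;
  -- bulk-load the next round of rows here (.import or INSERTs)
  CREATE INDEX qmoves_pos_idx ON qmoves(pos);
  CREATE INDEX qmoves_val_idx ON qmoves(val);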

On Thu, Aug 4, 2016 at 7:27 AM, Jim Callahan <jim.callahan.orla...@gmail.com
> wrote:

> Temp Files
> Have you checked how much storage is available to the temporary file
> locations?
> The temporary file locations are different depending on the OS, build, VFS
> and PRAGMA settings.
> See the last section "5.0 Temporary File Storage Locations" of:
> https://www.sqlite.org/tempfiles.html
>
>
> The database was growing for about 1-1/2 days.  Then its journal
> > disappeared, the file size dropped to zero, but sqlite3 is still running
> > 100% CPU time, now for a total of 3800+ minutes (63+ hours).  The
> database
> > is still locked, but I have no idea what sqlite3 is doing, or if it will
> > ever stop.  All partitions still have lots of space left (most of this is
> > running in a RAID partition of 11 TiB).  Here's what I gave to sqlite3 on
> > my Linux system:
>
>
> You might have a huge storage allocation for the main file and log, but
> some other temp file might be being dumped
> to a more constrained storage location.
>
> RAM
> Since you are using a RAID disk controller, I assume you have a 64-bit CPU and
> more than 8 GB of RAM?
> If you have 8 GB or more of RAM would it help to use an in memory database?
>
> Transactions
> Are you using explicit or implicit transactions?
> https://www.sqlite.org/lang_transaction.html
>
>
> Steps
> Agree with Darren Duncan and Dr. Hipp you may want to have at least 3
> separate steps
> (each step should be a separate transaction):
>
> 1. Simple load
> 2. Create additional column
> 3. Create index
>
> Have you pre-defined the table you are loading data into? (step 0 CREATE
> TABLE)
>
> If "Step 1 Simple Load" does not complete; then may want to load a fixed
> number of rows into separate tables (per Darren Duncan)  and then combine
> using an APPEND
> or a UNION query (doing so before steps 2 and 3).
>
> HTH
>
> Jim Callahan
> Data Scientist
> Orlando, FL
>
>
>
>
> On Wed, Aug 3, 2016 at 11:00 PM, Kevin O'Gorman <kevinogorm...@gmail.com>
> wrote:
>
> > I'm working on a hobby project, but the data has gotten a bit out of
> hand.
> > I thought I'd put it in a real database rather than flat ASCII files.
> >
> > I've got a problem set of about 1 billion game positions and 187GB to
> work
> > on (no, I won't have to solve them all) that took about 4 hours for a
> > generator program just to write.  I wrote code to turn them into
> something
> > SQLite could import.  Actually, it's import, build a non-primary index,
> and
> > alter table to add a column, all in sqlite3.
> >
> > The database was growing for about 1-1/2 days.  Then its journal
> > disappeared, the file size dropped t

[sqlite] newbie has waited days for a DB build to complete. what's up with this.

2016-08-03 Thread Kevin O'Gorman
I'm working on a hobby project, but the data has gotten a bit out of hand.
I thought I'd put it in a real database rather than flat ASCII files.

I've got a problem set of about 1 billion game positions and 187GB to work
on (no, I won't have to solve them all) that took about 4 hours for a
generator program just to write.  I wrote code to turn them into something
SQLite could import.  Actually, it's import, build a non-primary index, and
alter table to add a column, all in sqlite3.

The database was growing for about 1-1/2 days.  Then its journal
disappeared, the file size dropped to zero, but sqlite3 is still running
100% CPU time, now for a total of 3800+ minutes (63+ hours).  The database
is still locked, but I have no idea what sqlite3 is doing, or if it will
ever stop.  All partitions still have lots of space left (most of this is
running in a RAID partition of 11 TiB).  Here's what I gave to sqlite3 on
my Linux system:

time sqlite3 qubic.db