Re: [sqlite] Deterministic random sampling via SELECT

2019-11-08 Thread Merijn Verstraaten
On 7 Nov 2019, at 20:47, Chris Peachment wrote: > 1. generate a list of pseudo-random numbers, using a pre-defined > seed value, over the range 1 .. count(*) of records in table, > > 2. use that list as record id values to select the desired subset > of the data in the table. > > This

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Richard Damon
On 11/7/19 5:13 PM, Doug Currie wrote: > On Thu, Nov 7, 2019 at 4:23 PM Richard Damon > wrote: > >> One thought would be to generate a ‘hash’ from part of the record, maybe >> the record ID, and select records based on that value. The simplest would >> be something like id%100 == 0 would get you

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Doug Currie
On Thu, Nov 7, 2019 at 4:23 PM Richard Damon wrote: > > One thought would be to generate a ‘hash’ from part of the record, maybe > the record ID, and select records based on that value. The simplest would > be something like id%100 == 0 would get you 1% of the records. That > admittedly isn’t

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Richard Damon
> On Nov 7, 2019, at 2:15 PM, Merijn Verstraaten wrote: > >  >> On 7 Nov 2019, at 19:16, David Raymond wrote: >> >> Along those lines SQLite includes the reverse_unordered_selects pragma >> https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects >> which will flip the order it

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread David Raymond
ge- From: sqlite-users On Behalf Of Merijn Verstraaten Sent: Thursday, November 7, 2019 2:16 PM To: SQLite mailing list Subject: Re: [sqlite] Deterministic random sampling via SELECT > On 7 Nov 2019, at 19:16, David Raymond wrote: > > Along those lines SQLite includes the rev

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Chris Peachment
In the very old days before computers were common, a random number table appeared at the back of many statistical texts. This was used to select a series of random numbers which would then be used as look-up indices into some other data set. You could do the same: 1. generate a list of

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Donald Griggs
> > Regarding: "So far the only suggestion was "use some non-deterministic > random sampling method and store the result", but since my samples are > large and I have lots of them, this would balloon my storage by >100x and I > don't have the available storage to make that work." But Keith

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Merijn Verstraaten
> On 7 Nov 2019, at 19:16, David Raymond wrote: > > Along those lines SQLite includes the reverse_unordered_selects pragma > https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects > which will flip the order it sends rows in queries that don't explicitly > specify an ordering.

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread David Raymond
: [sqlite] Deterministic random sampling via SELECT On 7 Nov 2019, at 1:56pm, David Raymond wrote: > Others will correct me if I'm wrong on that. No correction, but I wanted to add something. According to the theory of how SQL (not just SQLite, SQL) works, tables have no order. You

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Simon Slavin
On 7 Nov 2019, at 1:56pm, David Raymond wrote: > Others will correct me if I'm wrong on that. No correction, but I wanted to add something. According to the theory of how SQL (not just SQLite, SQL) works, tables have no order. You can, in theory, query a table of 100 rows with SELECT a,b

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Keith Medcalf
On Thursday, 7 November, 2019 04:55, Merijn Verstraaten wrote: >I'm trying sample a (deterministically) random subset of a SELECT query, You cannot have something that is both RANDOM and DETERMINISTIC at the same time. >the most common solution on the internet to get random samples seems to

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread David Raymond
"So, is this behaviour documented/guaranteed somewhere?" Short version is: Nope. The engine is free to do whatever it wants as long as it gives the correct result in the end. Consider a simple select * from foo where predicate order by non_indexed_field; Since there is no nice ordering of the