Re: [sqlite] Changing ID's to UUID

2016-11-18 Thread James K. Lowden
On Fri, 18 Nov 2016 10:22:15 -0800
Jens Alfke  wrote:

> You?d have to assign a central ?count server? to hand out consecutive
> numbers

No.  You need only enumerate the sources, and accept that the key is
the unique counter from the source plus the centrally assigned source
identifier.  

> You can assign ?peer IDs?, but then you need a sure-fire-unique way
> to assign them (using the MAC address is not sure-fire-unique) 

The sure-fire way to assign them is simple enumeration when they connect
to the central server to upload their information.  If the "peer" will
ever need to retrieve the information, it will need to store its
assigned id.  Else it's just up to central to keep everything
straight.  

> the system has to keep a persistent counter that can?t ever be rolled
> back, even in the case of something like restoring an OS backup. 

That's just not a problem if each system is in control of its own
counter.  

--jkl
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-18 Thread Jens Alfke

> On Nov 16, 2016, at 5:59 AM, Keith Medcalf  wrote:
> 
> What I do not understand is why one would use a UUID (randomly generated 
> bunch of bytes) as a key in a database.  It is long, every use must be 
> checked for collisions, and inherently far less efficient than the simple 
> integer sequence it is replacing.

Distributed databases such as CouchDB and Couchbase do this. (I work on 
Couchbase Mobile.) When a database record is to be replicated between any 
number of peer databases, it obviously becomes infeasible to assign it a simple 
auto-incremented integer. (You’d have to assign a central ‘count server’ to 
hand out consecutive numbers, and then your system becomes bottlenecked on that 
server; it limits the rate at which the entire system can create records, and 
becomes a single point of failure.)

You can assign ‘peer IDs’, but then you need a sure-fire-unique way to assign 
them (using the MAC address is not sure-fire-unique) and the system has to keep 
a persistent counter that can’t ever be rolled back, even in the case of 
something like restoring an OS backup. You can use a timestamp to try to avoid 
rollback problems, but now you’ve added the well-known problems of using clocks 
in a distributed system. Worse, you’ve started leaking a significant amount of 
interesting information about your computer, which in some cases can be a 
privacy problem.

In the face of all that, simply generating a sufficiently long 
cryptographically-random number is very attractive.

> THe problem is an inability to properly determine and assess risk.  When 
> using a sequence the probability of a collision is 0.  When using a random 
> generated number (passing a bunch of random data through a digest function) 
> has a probability of collision of 100%. 

Um. Either you’re trolling me, or you don’t have a good understanding of 
probability.

In any case, this thread is pretty far off-topic and we should stop arguing 
this. I will not respond to any further posts.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-17 Thread Richard Damon

On 11/17/16 6:10 AM, R Smith wrote:


On 2016/11/17 10:48 AM, J Decker wrote:

https://en.wikipedia.org/wiki/Birthday_problem

/.../
one duplicate. In other words, only after generating 1 billion UUIDs 
every

second for the next 100 years, the probability of creating just one
duplicate would be about 50%.


All correct, but sounding misleading... Your phrase tends to sound 
like "Only after 1 billion UUIDs / sec for 100 years do I have a 50% 
chance of getting a single duplicate...".
I think you miss the math. After 100 Years of generating a Billion 
UUIDs/sec, we have generated a rough order of 3x10^18 UUIDs, or about 
2^62 UUIDs. We have 2^128 possible UUIDs, so we are no where near 50% 
chance of a duplicate for each one, and the 50% chance quoted was the 
chance that in that 3x10^18 UUIDs created, there is one pair of 
duplicates. (I haven't done the math to verify that number, but it is 
about right, we have used about 1/2 the bits, which about the point you 
expect the first duplicate)



--
Richard Damon

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-17 Thread R Smith


On 2016/11/17 10:48 AM, J Decker wrote:

https://en.wikipedia.org/wiki/Birthday_problem

/.../
one duplicate. In other words, only after generating 1 billion UUIDs every
second for the next 100 years, the probability of creating just one
duplicate would be about 50%.


All correct, but sounding misleading... Your phrase tends to sound like 
"Only after 1 billion UUIDs / sec for 100 years do I have a 50% chance 
of getting a single duplicate...".


The real story is: "After creating that amount of UUIDs, I have a 50% 
for every next UUID I make to be a duplicate..." - Which basically 
means, Uniqueness expectation is destroyed by that time, and in fact, 
long before that time.


What further is misleading, is that Loong before then, you will have 
a decent chance of getting duplicate UUIDs, 1%, or even 0.0001% chance 
happens billions of iterations sooner (which is already horribly bad for 
the standard programming project, meaning you'd have to actively test 
uniqueness long before this).


And even if the chance is 0.001% - there is no guarantee that any 
next UUID is unique - in fact, it is guaranteed that by a 1 in 100 
million chance, it won't be unique (so if you make 100 million UUIDs, 
not an uncommon amount, it will hit a duplicate).  It's like the Lotto 
analogy: There is an extremely small chance for you to win the lotto 
(IIRC 1 in 14 mil or so), yet every week or so, someone wins it...  
Don't confuse low probability with likelihood of an event - the 
population size dictates it.


Reminds me of this satire exchange between a probability theory 
professor and a student (I am doing this from memory, so might get it 
wrong, feel free to post the original if known):
Student: So, if I toss this coin ten times, It will land heads up five 
times?

Professor: Probably.
Student: But it will definitely get heads at least once right?
Professor: Probably.
Student:  But surely it won't land tails every time?
Professor: Probably not.
etc.

https://www.noisebridge.net/images/b/be/Conditional_risk.png

http://withfriendship.com/images/e/21237/probability-theory.gif

and...
http://xkcd.com/552/

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-17 Thread J Decker
Re Integer shorness... UUID does seem wasteful - to have to ccompare such a
long value.
But the string compare will often fail within the first 4 bytes, or less,
making many comparisons less 'work' than an integer of the same... since
sqlite stores everything as text if you are in 123,000,000.. 123,000,001
 takes a lot of comparison to find difference.

yes it's 'inconveneitnt' I can't remember the code for item X is 99 so
just select where id=99 - easy to type, instead I have to expect my
database browser to support copy and paste.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-17 Thread J Decker
https://en.wikipedia.org/wiki/Birthday_problem

In probability theory ,
the *birthday problem* or *birthday paradox
*[1]
 concerns the
probability  that, in a set
of {\displaystyle
n}[image: n] randomly  chosen people,
some pair of them will have the same birthday
. By the pigeonhole principle
, the probability
reaches 100% when the number of people reaches 367 (since there are only
366 possible birthdays, including February 29
). However, 99.9% probability is
reached with just 70 people, and 50% probability with 23 people. These
conclusions are based on the assumption that each day of the year (except
February 29) is equally probable for a birthday.

UUID https://en.wikipedia.org/wiki/Universally_unique_identifier
Out of a total of 128 bits, Type 4 UUIDs have 6 reserved bits (4 for the
version and 2 other reserved bits), so randomly generated UUIDs have 122
random bits. The chance of two such UUIDs having the same value can be
calculated using probability theory (birthday problem
). Using the approximation

To put these numbers into perspective, the annual risk of a given person
being hit by a meteorite is estimated to be one chance in 17 billion,[5]
 which
means the probability is about 0.006 (6 × 10−11), equivalent to the
odds of creating a* few tens of trillions *of UUIDs in a year and having
one duplicate. In other words, only after generating 1 billion UUIDs every
second for the next 100 years, the probability of creating just one
duplicate would be about 50%.

-
Generating a recordset on client side - you'll have that ID in one place,
if it needs to change (on insert fails) you generate a new one and insert
again - everything else is likely related to that first field in the first
place.
If it's a case of merging datasets - how many of a type of record are you
really generating? 4? 5?

if it's a generation of millions of opints - than probably there should be
a different unique ID associated that increases them to a full unique uuid
space each.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-16 Thread Simon Slavin

On 17 Nov 2016, at 3:13am, Keith Medcalf  wrote:

> You would only get guaranteed collision avoidance if you kept track of every 
> UUID issued and checked for duplicates at time of issue to avoid duplication 
> by keeping a list of all the "used" identifiers.  I do not think that keeping 
> track of all the UUIDs issued to avoid duplication is part of the spec.  They 
> really should be called SRID's (somewhat random identifiers).

The randomness is not part of the intended spec for UUIDs.  The emphasis was on 
the 'Unique', which is why the name had 'U' in, not 'R'.  The first form of 
UUIDs combined the IP address of the originating computer with the time in .1 
microsecond intervals.  The assumption is that no computer would be fast enough 
to need to name two entities in .1 microsecond.  Thus UUIDs would be inherently 
unique without needing any extra programming or needing to consult a database 
of already-used UUIDs.  This was considered very clever.

But this scheme was criticised for giving away too much information.  For 
instance, if you saw a UUID for a car part you could immediately deduce the IP 
address of am important computer belonging to the car company.  That could be a 
good target for hacking attempts.

So number of other options were thought up, one of which can easily be created 
from a URL, one which relates to a domain name, some of which were designed 
from the outset to have the properties of random numbers.  It was a 
random-looking scheme -- known as version 4 -- that I mentioned which set off 
this thread.  Because it looks random it gives away nothing about the 
originating device, and people like that these days.  But perhaps version 1 
UUIDs might suit the OP better.

So yes, randomness is important to the particular type of UUID we're discussing 
here.  But it's not essential to all UUIDs.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-16 Thread Keith Medcalf
> As someone mentioned in a different part of this thread, the issue is
> centralization.  If you have multiple decentralied machines generating
> UUIDs you *could* have collisions, but if you pick your UUID scheme well
> enough collisions will be extremely rare, on par with things like data
> loss from hardware failure.
> 
> Of course if you generate all your UUIDs on one machine (as you do when
> you call getUUID() on a single machine) you get guaranteed collision
> avoidance.

You would only get guaranteed collision avoidance if you kept track of every 
UUID issued and checked for duplicates at time of issue to avoid duplication by 
keeping a list of all the "used" identifiers.  I do not think that keeping 
track of all the UUIDs issued to avoid duplication is part of the spec.  They 
really should be called SRID's (somewhat random identifiers).






___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-16 Thread Nico Williams
On Wed, Nov 16, 2016 at 02:51:35PM -0600, Jeffrey Mattox wrote:
> I think this discussion is about apples and oranges.  UUID stands for
> universally UNIQUE identifier, so there won't be any collisions.  It
> looks random, but it never repeats.  [...]

No, DRH is right that this depends on how good your entropy source (and,
typically, PRNG fed that entropy) is.  Nothing about putting
"universally unique" in the name makes it so -- only the details of how
the sausage is made can take care of it.

Regardless, using UUIDs to make a distributed DB is not that great.  It
does work though.  Microsoft's Active Directory (AD), for example, uses
96-bit UUID-like values to form "domain SIDs", with user, group, and
other SIDs being formed by adding a 32-bit "relative ID" to the domain
SID.  This has worked rather well for MSFT, and it has allowed the
creation of "forests" of domains and forests of forests.  I do think AD
checks SID uniqueness within each forest, and IIRC there's a way to
handle SID collisions in forests of forests.

Uniqueness checks are not too expensive when they are feasible at all.

In the AD forest case they are feasible, while in the forest of forests
case they are not.

The alternative to randomly-generated IDs would be to have a global
registry (perhaps hierarchical), not unlike DNS, or ASN.1 OID arcs, but
there is a real cost to having to have a global registry.

So in a distributed system roughly like SPARQL, or AD, say, UUIDs will
do.  You might store them as blobs to avoid having to waste space, but,
whatever.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-16 Thread David Raymond
https://xkcd.com/221/

"The generation of random numbers is too important to be left to chance." -I 
forget who


A UUID is basically a 16 byte integer that people agree to type out in a 
standard way. All the properties of UUID's that you've mentioned below depend 
on the random number generator being actually random.

"Guaranteed" means vastly different things to a mathematician, a programmer, 
and the extended warranty folks at Best Buy. From Wikipedia: "In other words, 
only after generating 1 billion UUIDs every second for the next 100 years, the 
probability of creating just one duplicate would be about 50%."

A 1 in 2^128 (1 in 340,282,366,920,938,463,463,374,607,431,768,211,456. 340 
undecillion?) chance of collision for any 2 honestly random UUID's means...

Mathematician: They're not unique at all. If anything then it's absolutely 
guaranteed that there  be collisions... eventually.
Programmer: "For all intents and purposes" unique. Good enough for government 
work.
Best Buy: Would you like to buy extra collision insurance on your new UUID? You 
never know what could happen after all. Or maybe upgrade to the gold plated 
UUID's we have over there?


-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On 
Behalf Of Jeffrey Mattox
Sent: Wednesday, November 16, 2016 3:52 PM
To: SQLite mailing list
Subject: Re: [sqlite] Changing ID's to UUID


I think this discussion is about apples and oranges.  UUID stands for 
universally UNIQUE identifier, so there won't be any collisions.  It looks 
random, but it never repeats.  Here's one: 
"cb058c3a-ac3d-11e6-80f5-76304dec7eb7".  I use a UUID [a non-random string 
guaranteed to be unique -- obtained via an iOS getUUID() function call] to 
match records in two databases on different machines.  They are records for 
individuals that occasionally have to be merged.  The UUID is acts as a 
guaranteed unique name for each individual.

Jeff

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-16 Thread Jeremy Brown
On Wed, Nov 16, 2016 at 02:51:35PM -0600, Jeffrey Mattox wrote:
> On Nov 16, 2016, at 8:46 AM, Richard Hipp  wrote:
> > 
> >> On 11/16/16, Keith Medcalf  wrote:
> >> What I do not
> >> understand is why one would use a UUID (randomly generated bunch of bytes)
> >> as a key in a database.  It is long, every use must be checked for
> >> collisions, and inherently far less efficient than the simple integer
> >> sequence it is replacing.
> > 
> > If you use good randomness to generate the UUID and if the UUID is
> > long enough, then you do not, in fact, need to check for collisions.
> > It doesn't take an excessively long UUID to make the probability of
> > collision become far less than the probability of a random cosmic ray
> > hit on your CPU causing it to give the wrong answer on a collision
> 
> I think this discussion is about apples and oranges.  UUID stands for 
> universally UNIQUE identifier, so there won't be any collisions.  It looks 
> random, but it never repeats.  Here's one: 
> "cb058c3a-ac3d-11e6-80f5-76304dec7eb7".  I use a UUID [a non-random string 
> guaranteed to be unique -- obtained via an iOS getUUID() function call] to 
> match records in two databases on different machines.  They are records for 
> individuals that occasionally have to be merged.  The UUID is acts as a 
> guaranteed unique name for each individual.

As someone mentioned in a different part of this thread, the issue is 
centralization.  If you have multiple decentralied machines generating UUIDs 
you *could* have collisions, but if you pick your UUID scheme well enough 
collisions will be extremely rare, on par with things like data loss from 
hardware failure.

Of course if you generate all your UUIDs on one machine (as you do when you 
call getUUID() on a single machine) you get guaranteed collision avoidance.

Jeremy



> 
> Jeff
> 
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-16 Thread Jeffrey Mattox
On Nov 16, 2016, at 8:46 AM, Richard Hipp  wrote:
> 
>> On 11/16/16, Keith Medcalf  wrote:
>> What I do not
>> understand is why one would use a UUID (randomly generated bunch of bytes)
>> as a key in a database.  It is long, every use must be checked for
>> collisions, and inherently far less efficient than the simple integer
>> sequence it is replacing.
> 
> If you use good randomness to generate the UUID and if the UUID is
> long enough, then you do not, in fact, need to check for collisions.
> It doesn't take an excessively long UUID to make the probability of
> collision become far less than the probability of a random cosmic ray
> hit on your CPU causing it to give the wrong answer on a collision

I think this discussion is about apples and oranges.  UUID stands for 
universally UNIQUE identifier, so there won't be any collisions.  It looks 
random, but it never repeats.  Here's one: 
"cb058c3a-ac3d-11e6-80f5-76304dec7eb7".  I use a UUID [a non-random string 
guaranteed to be unique -- obtained via an iOS getUUID() function call] to 
match records in two databases on different machines.  They are records for 
individuals that occasionally have to be merged.  The UUID is acts as a 
guaranteed unique name for each individual.

Jeff

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-16 Thread James K. Lowden
On Wed, 16 Nov 2016 08:59:03 -0600
Paul Egli  wrote:

> > Using the systemid sequence and the recordid sequence directly
> > however, has a 0% probability of collision, so any rational person
> > would use that directly and forgo entirely the introduction of
> > uncertainty and bugs using "UUID" type crappola will cause.
> >
> 
> As Dominique said, the issue here is decentralization... 

Decentralization, you say, but not no centralization.  If the data on 
those disconnected devices never came together, their keys would never
conflict.  

We've handled this before, more than once.  Ethernet cards have unique
addresses.  The Domain Name System supports a certain amount of
wackiness, but doesn't rely on randomness.  

Of course, those systems were designed by competent engineers  

> and i would add, particularly in a disconnected environment and/or
> one with no central authority. The method you describe does not
> handle device rollbacks or cloning.

I don't see how the method you describes solves anything.  If two
devices represent the same data -- or data belonging to the same
entity, or whatever -- they need a shared identifier to represent that
ownership.  If that identifier is chosen randomly or taken from
the devices, it will not join the information.  The owner of the
information will at some point have to assert their commonality: will
have to apply a known, common identifier to both sets of information.  

--jkl

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-16 Thread Paul Egli
On Wed, Nov 16, 2016 at 7:59 AM, Keith Medcalf  wrote:

> Using the systemid sequence and the recordid sequence directly however,
> has a 0% probability of collision, so any rational person would use that
> directly and forgo entirely the introduction of uncertainty and bugs using
> "UUID" type crappola will cause.
>

As Dominique said, the issue here is decentralization... and i would add,
particularly in a disconnected environment and/or one with no central
authority. The method you describe does not handle device rollbacks or
cloning.

For example, one of your systems is a mobile device with it's own unique
system id. Periodically, this device broadcasts its inserted data to other
devices. Also, the user backs up the device to their PC every now and then.
At some point the mobile device gets lost or damaged. When they restore
from backup, the last few sequential ids from that system id get reused and
collide. It is also possible to restore from backup to a different device,
even if the original is still alive and well, at which point you have two
different devices with the same system id broadcasting colliding keys.

Theoretically a new, unique system id should be generated any time a system
is backed up or copied anywhere. But when the backup/copying logic is
completely independent and unknowing of your systemid, you are left with
needing to detect if the physical device has changed. This may be
unreliable or impossible on some platforms. And i don't think it would be
possible to detect the case where a rollback happened on the same physical
device.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-16 Thread Richard Hipp
On 11/16/16, Keith Medcalf  wrote:
> What I do not
> understand is why one would use a UUID (randomly generated bunch of bytes)
> as a key in a database.  It is long, every use must be checked for
> collisions, and inherently far less efficient than the simple integer
> sequence it is replacing.

If you use good randomness to generate the UUID and if the UUID is
long enough, then you do not, in fact, need to check for collisions.
It doesn't take an excessively long UUID to make the probability of
collision become far less than the probability of a random cosmic ray
hit on your CPU causing it to give the wrong answer on a collision
test.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-16 Thread Keith Medcalf


> > I still cannot fathom why anyone would assign random numbers or (even
> more useless) long random blobs to use as psuedo-keys.  It just boggles
> the mind.
 
> I take it you’re not a cryptographer :) All modern ciphers do this. For
> example, an RSA key pair is simply a pair of large random numbers (both
> prime) that meet certain criteria. Or if you use a more modern cipher like
> Curve25519, the private key is quite literally just any 256 bits of random
> data. You generate a key-pair by reading 32 bytes from /dev/random into
> the private key, and then performing a transformation on that to get the
> public key.

I know and understand the uses of random numbers, encryption, and digests when 
used for the purpose for which they were invented.  What I do not understand is 
why one would use a UUID (randomly generated bunch of bytes) as a key in a 
database.  It is long, every use must be checked for collisions, and inherently 
far less efficient than the simple integer sequence it is replacing.

Of course, it is just a fad (like big huge wastes of whitespace and unreadable 
low-contrast ittybitty fonts in current web-page bootifications) adopted by 
those unable to comprehend the consequences of their decisions (and if they 
haven't had any yet, they are very lucky indeed).

> Obviously collisions are possible with long random numbers or digests, but
> secure systems are designed such that random collisions are vanishingly
> unlikely to occur for {insert large power of ten here} years, which makes
> the probability effectively zero.

No, you are incorrect.  A "good hash function" will evenly spread its 
collisions over its digest space. If you feed all possible 512-bit blocks into 
a 512-bit hash to obtain the output digests, when you feed in one more 513-bit 
input, you will get one collision.  If you feed in another 513-bit input you 
will get a different collision.  The "collision" digest will not be predictable 
(that is it will not "just always be the same as the first 512-bit blocks input 
digest with bit 438 flipped).  It is the property of being unable (very complex 
and taking a long time) to generate an input (chosen text) which results in a 
specific digest which is the useful property -- the fact that it can and must 
have a 100% probability of collision when the input space is larger than the 
output space is irrelevant.

THe problem is an inability to properly determine and assess risk.  When using 
a sequence the probability of a collision is 0.  When using a random generated 
number (passing a bunch of random data through a digest function) has a 
probability of collision of 100%.  Only if you have (for example) a sequence 
assigned "systemid" which is used as part of the input to the digest function, 
and use the generated recordid sequence number as input to the digest along 
with the random data does the probability of collision reduce from 100% to some 
small number greater than 0%.  Using the systemid sequence and the recordid 
sequence directly however, has a 0% probability of collision, so any rational 
person would use that directly and forgo entirely the introduction of 
uncertainty and bugs using "UUID" type crappola will cause.

Unfortunately there is a massive shortage of rational life on this planet.

 
> —Jens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread Hick Gunter
On Tuesday, 15 November, 2016 15:30, R Smith wrote:
>
>> On 2016/11/15 10:38 PM, Jens Alfke wrote:
>
>> >> On Nov 15, 2016, at 11:35 AM, Quan Yong Zhai  wrote:
>
>> >> Create a custom function MD5 ,
>> > If you’re going to go to this trouble, at least use SHA256!
>> >
>> > MD5 is broken. These days no one should be using it for anything,
>> > except
>> when needed for compatibility with legacy formats/protocols.
>>
>> This strikes me as one of those statements that, upon hearing that a
>> rocket failed to land upright, cries out: "OMG, this is the end of
>> Space exploration as we know it!".
>>
>> MD5 is not broken. What would broken mean? It still hashes a byte
>> array just as good as it always did producing an easy key to check and
>> compare against. Sure, for ultimate safety systems you /should/ note
>> that the hashes /could/ possibly be brute-forced to, in the minimum
>> case, reproduce a duplicate answer for a different byte-list.
>
>All hash functions that take an input that is larger (longer) that the output 
>must by definition have collisions.  There is no if, and, or but about it.  If 
>you input every 513 bit sequence >into SHA-512 you WILL WITH ABSOLUTE 
>CERTAINTY obtain the same 512 bit hash result at least twice.
>
This is the pigeon(hole) principle, attributed to the German mathematician 
Peter Gustav Lejeune Dirichlet, who is noted to have used this argument in 1834.


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread Jens Alfke

> On Nov 15, 2016, at 1:11 PM, Scott Robison  wrote:
> 
> Completely depends on your needs. If your needs are not cryptographic, then
> there is no problem.

But there’s little reason to use MD5 instead of SHA-1; they’re roughly the same 
speed, but SHA-1 is considerably more collision-resistant.
Or if you don’t need a long digest, just use Adler32, which is about four times 
as fast.


> I still cannot fathom why anyone would assign random numbers or (even more 
> useless) long random blobs to use as psuedo-keys.  It just boggles the mind.

I take it you’re not a cryptographer :) All modern ciphers do this. For 
example, an RSA key pair is simply a pair of large random numbers (both prime) 
that meet certain criteria. Or if you use a more modern cipher like Curve25519, 
the private key is quite literally just any 256 bits of random data. You 
generate a key-pair by reading 32 bytes from /dev/random into the private key, 
and then performing a transformation on that to get the public key.

Obviously collisions are possible with long random numbers or digests, but 
secure systems are designed such that random collisions are vanishingly 
unlikely to occur for {insert large power of ten here} years, which makes the 
probability effectively zero.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread Dominique Devienne
> I still cannot fathom why anyone would assign random numbers or (even more
> useless) long random blobs to use as psuedo-keys.
>

Because it is decentralized. You can assign random uuids as immutable
surrogate keys to your entities without going to the db for a sequence
based integer sk. And that scales to concurrent clients. My $0.02
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread Keith Medcalf


On Tuesday, 15 November, 2016 15:30, R Smith wrote:

> On 2016/11/15 10:38 PM, Jens Alfke wrote:

> >> On Nov 15, 2016, at 11:35 AM, Quan Yong Zhai  wrote:

> >> Create a custom function MD5 ,
> > If you’re going to go to this trouble, at least use SHA256!
> >
> > MD5 is broken. These days no one should be using it for anything, except
> when needed for compatibility with legacy formats/protocols.
> 
> This strikes me as one of those statements that, upon hearing that a
> rocket failed to land upright, cries out: "OMG, this is the end of Space
> exploration as we know it!".
> 
> MD5 is not broken. What would broken mean? It still hashes a byte array
> just as good as it always did producing an easy key to check and compare
> against. Sure, for ultimate safety systems you /should/ note that the
> hashes /could/ possibly be brute-forced to, in the minimum case,
> reproduce a duplicate answer for a different byte-list.

All hash functions that take an input that is larger (longer) that the output 
must by definition have collisions.  There is no if, and, or but about it.  If 
you input every 513 bit sequence into SHA-512 you WILL WITH ABSOLUTE CERTAINTY 
obtain the same 512 bit hash result at least twice.

> So, if you provide public safety documents / programs / systems that may
> provide opportunity for nefarious intent individuals to abuse, please
> step up your hashing, but if you need unique keys for your program that
> keeps kids-party attendance lists, MD5 is as good a method (and as fast)
> as any - in the same way that it is still OK to use INTEGER PRIMARY KEY
> rather than a complete SHA256 primary ID column for your shopping list.
> 
> That said, I don't disagree with the notion that, IF the OP is going to
> go through the trouble of making custom functions to accomplish this
> task, might as well go full regalia.

I still cannot fathom why anyone would assign random numbers or (even more 
useless) long random blobs to use as psuedo-keys.  It just boggles the mind.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread R Smith



On 2016/11/15 10:38 PM, Jens Alfke wrote:

On Nov 15, 2016, at 11:35 AM, Quan Yong Zhai  wrote:

Create a custom function MD5 ,

If you’re going to go to this trouble, at least use SHA256!

MD5 is broken. These days no one should be using it for anything, except when 
needed for compatibility with legacy formats/protocols.


This strikes me as one of those statements that, upon hearing that a 
rocket failed to land upright, cries out: "OMG, this is the end of Space 
exploration as we know it!".


MD5 is not broken. What would broken mean? It still hashes a byte array 
just as good as it always did producing an easy key to check and compare 
against. Sure, for ultimate safety systems you /should/ note that the 
hashes /could/ possibly be brute-forced to, in the minimum case, 
reproduce a duplicate answer for a different byte-list.


So, if you provide public safety documents / programs / systems that may 
provide opportunity for nefarious intent individuals to abuse, please 
step up your hashing, but if you need unique keys for your program that 
keeps kids-party attendance lists, MD5 is as good a method (and as fast) 
as any - in the same way that it is still OK to use INTEGER PRIMARY KEY 
rather than a complete SHA256 primary ID column for your shopping list.


That said, I don't disagree with the notion that, IF the OP is going to 
go through the trouble of making custom functions to accomplish this 
task, might as well go full regalia.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread Scott Robison
On Tue, Nov 15, 2016 at 1:38 PM, Jens Alfke  wrote:

>
> > On Nov 15, 2016, at 11:35 AM, Quan Yong Zhai  wrote:
> >
> > Create a custom function MD5 ,
>
> If you’re going to go to this trouble, at least use SHA256!
>
> MD5 is broken. These days no one should be using it for anything, except
> when needed for compatibility with legacy formats/protocols.
>

Completely depends on your needs. If your needs are not cryptographic, then
there is no problem. Just like there is no problem using a pseudo random
number generator for non cryptographic purposes.

-- 
Scott Robison
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread Jens Alfke

> On Nov 15, 2016, at 11:35 AM, Quan Yong Zhai  wrote:
> 
> Create a custom function MD5 ,

If you’re going to go to this trouble, at least use SHA256!

MD5 is broken. These days no one should be using it for anything, except when 
needed for compatibility with legacy formats/protocols.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread Quan Yong Zhai
Create a custom function MD5 ,

Custum_uuid()  -> MD5(table_name  || Numeric_ID || "salt string" )

Update table tab1 set id= md5('tab1' || '$' || ID || '$' || 'My custom string')


Update table tab1 set ref_id= md5('tab2' || '$' || ref_id || '$' || 'My custom 
string')

发自我的 Windows Phone

发件人: Cecil Westerhof<mailto:cldwester...@gmail.com>
发送时间: ‎2016/‎11/‎15 16:03
收件人: SQLite mailing list<mailto:sqlite-users@mailinglists.sqlite.org>
主题: [sqlite] Changing ID's to UUID

I have several tables where a numeric ID is used. I want to change
those to UUID's. Is there a smart way to do this, or need I to do this
one by one?

--
Cecil Westerhof
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread Richard Hipp
On 11/15/16, Jens Alfke  wrote:
>
> SQLite documentation does not describe which
> random number generator is used; it just calls it “pseudo-random”.

The SQLite PRNG uses RC4 and is seeded from /dev/random (on unix - the
seeding on windows is not as good).  So randomblob() does provide
decent randomness.  I would be hesitant to base a cryptosystem on it,
but it should be plenty adequate for UUIDs.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread Jens Alfke

> On Nov 15, 2016, at 3:02 AM, Simon Slavin  wrote:
> 
> SQLite has a randomblob function which can be used to select part of the 
> UUID, but you need to pick a UUID scheme suitable for your purposes to know 
> how much of it can be random.

In some use cases it’s important for security to guarantee that a UUID actually 
is unique and can’t be guessed*. In such cases I would be wary of using 
randomblob(), since the SQLite documentation does not describe which random 
number generator is used; it just calls it “pseudo-random”. If you generate the 
UUID yourself you can get the random bits from /dev/random or some other 
high-quality entropy source.

—Jens

* this isn’t just paranoia. There have been real-world vulnerabilities that 
stemmed from insufficiently random IDs.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread Simon Slavin

On 15 Nov 2016, at 12:59pm, Cecil Westerhof  wrote:

>> SQLite has a randomblob function which can be used to select part of the 
>> UUID, but you need to pick a UUID scheme suitable for your purposes to know 
>> how much of it can be random.
> 
> That is something to look into.

Lots of people use UUIDs of this form:



Pick your own value for 'y'.  And even pick your own value for some of the 
substrings, though some considerations of randomness and probability start to 
figure in if you do that.

Frankly I'd generate my random hex digits in my preferred programming language 
rather than have SQLite do it, but it does seem possible to have SQLite do it 
using randomblob().

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread Cecil Westerhof
2016-11-15 12:02 GMT+01:00 Simon Slavin :
>
> On 15 Nov 2016, at 8:03am, Cecil Westerhof  wrote:
>
>> I have several tables where a numeric ID is used. I want to change
>> those to UUID's. Is there a smart way to do this, or need I to do this
>> one by one?
>
> I think you'll have to do it in software.

I thought so, but it never hurts to check.


> SQLite has a randomblob function which can be used to select part of the 
> UUID, but you need to pick a UUID scheme suitable for your purposes to know 
> how much of it can be random.

That is something to look into.

-- 
Cecil Westerhof
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-15 Thread Simon Slavin

On 15 Nov 2016, at 8:03am, Cecil Westerhof  wrote:

> I have several tables where a numeric ID is used. I want to change
> those to UUID's. Is there a smart way to do this, or need I to do this
> one by one?

I think you'll have to do it in software.

SQLite has a randomblob function which can be used to select part of the UUID, 
but you need to pick a UUID scheme suitable for your purposes to know how much 
of it can be random.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Changing ID's to UUID

2016-11-15 Thread Cecil Westerhof
I have several tables where a numeric ID is used. I want to change
those to UUID's. Is there a smart way to do this, or need I to do this
one by one?

-- 
Cecil Westerhof
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users