I should point out that the sequential algorithm in CouchDB is
carefully constructed so that generated ids won't clash, even in a
distributed system. You might have assumed that the sequential ids
were 1, 2, 3, 4, and so on, but they are not.

The sequential ids are the same length as the random ids (16 bytes).
The first 13 bytes stay the same for around 8000 generated ids and are
then rerandomized. The ids with the same prefix have suffixes in
strictly increasing numeric order. This characteristic (that a new id
is numerically close to the previous id) is what helps with insertion
speed and general b-tree performance.
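
As a rough illustration of the scheme described above, here is a
minimal Python sketch. It is not CouchDB's actual code: the class name,
the random step size and the rollover threshold are assumptions, picked
so that a prefix lasts for roughly 8000 ids on average.

```python
import secrets

PREFIX_BYTES = 13        # random prefix shared by a run of ids
SUFFIX_LIMIT = 0xFFF000  # assumed threshold at which a new prefix is drawn
MAX_STEP = 0xFFE         # assumed upper bound on the random increment


class SequentialIds:
    """Hypothetical generator: 13-byte random prefix + 3-byte increasing suffix."""

    def __init__(self):
        self._reset()

    def _reset(self):
        # Draw a fresh random prefix and a small random starting suffix.
        self.prefix = secrets.token_hex(PREFIX_BYTES)
        self.suffix = 1 + secrets.randbelow(MAX_STEP)

    def next_id(self):
        # Rerandomize the prefix once the suffix space is nearly used up.
        if self.suffix >= SUFFIX_LIMIT:
            self._reset()
        new_id = f"{self.prefix}{self.suffix:06x}"
        # Step forward by at least 1, so suffixes strictly increase.
        self.suffix += 1 + secrets.randbelow(MAX_STEP)
        return new_id


gen = SequentialIds()
for _ in range(3):
    print(gen.next_id())
```

Each id is 32 hex characters (16 bytes), and consecutive ids differ only
in the last few characters until the prefix is redrawn. The random step
means two generators that somehow shared a prefix would still be
unlikely to emit the same suffix.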

Before changing the default, I think it would be worth getting numbers
from a suitably fair benchmark. I would still advocate random as the
default until that is done.

B.

On Mon, Jan 11, 2010 at 12:51 AM, Chris Anderson <[email protected]> wrote:
> On Sun, Jan 10, 2010 at 4:24 PM, Roger Binns <[email protected]> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Chris Anderson wrote:
>>> I'm not feeling super-strong about this. However, making the default
>>> sequential seems like it will preempt a lot of the problems people
>>> tend to show up asking about.
>>
>
> If we think that speed and size are more important than randomness, we
> should continue to refine uuid generators.
>
> Roger, if you can make a short sequential that'd be neat.
>
>> There are several issues conflated together:
>>
>> - When doing inserts, sorted ids are faster
>>
>> - The resulting size of the db file is the size of the docs plus a multiple
>> of the _id size (and probably an exponential of the size)
>>
>> - Sequential ids give small _id
>>
>> - Random ids give large _id
>>
>> - Sequentials will clash between different dbs (consider replication,
>> multiple instances etc).  They'll also lead people to rely on this
>> functionality as though it was like a SQL primary key
>>
>> - Random ids won't clash and better illustrate how CouchDB really works
>>
>>> I think the info-leakage argument is overblown
>>
>> It does make URLs easy to guess like many ecommerce sites that didn't
>> validate when showing you an invoice - you added one to the numeric id in
>> the URL and got to see someone else's.
>>
>> I would far prefer the size of the db file and the size of the _id link
>> being addressed.  Because the _id size can cause the db file to get so big,
>> I/O etc is a lot slower mainly because there is just so much more file to
>> deal with!  (In my original posting I had a db go from 21GB to 4GB by
>> reducing ids from 16 bytes to 4 bytes.)
>>
>> Roger
>>
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.9 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>
>> iEYEARECAAYFAktKb8AACgkQmOOfHg372QRb0ACfRWu1TUOs3twwmOGgAUOwhLfx
>> FJkAoKgnkWnPayPtPqMfk3/AxOj2xaMx
>> =V7Zq
>> -----END PGP SIGNATURE-----
>>
>>
>
>
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
>
