[
https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742911#action_12742911
]
Robert Newson commented on COUCHDB-465:
---------------------------------------
Thanks!
Guessability is a concern, which means this might need to be switchable.
Perhaps couch_seq_generator becomes couch_id_generator and an ini file chooses
between the two strategies, defaulting to the safest, but worst-case, new_uuid
behavior. Getting good keys for b+tree insertion necessarily makes them more
guessable, since by design they have to be close to existing keys.
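
A rough sketch of what "an ini file chooses between the two strategies" could look like, written in Python purely for illustration (CouchDB itself is Erlang). The `[uuids]` section, the `algorithm` key, and every function name here are hypothetical and not taken from the patch. The sequential strategy is deliberately simplified and omits the rollover.

```python
import configparser
import itertools
import os


def random_id():
    # Stand-in for the existing behavior: 16 random bytes, hex encoded.
    return os.urandom(16).hex()


def make_sequential():
    # Simplified sequential strategy: fixed random prefix plus a counter.
    # No rollover here; this only illustrates the switch itself.
    prefix = os.urandom(12).hex()
    counter = itertools.count()
    return lambda: prefix + format(next(counter), "08x")


def choose_id_generator(ini_text):
    """Return an id-producing callable chosen by a (hypothetical) ini setting."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Default to the random strategy, the safest with respect to guessability.
    name = parser.get("uuids", "algorithm", fallback="random")
    factories = {
        "random": lambda: random_id,
        "sequential": make_sequential,
    }
    if name not in factories:
        raise ValueError("unknown id algorithm: " + name)
    return factories[name]()
```

With no setting present the random strategy is used, so a deployment would have to opt in explicitly to the more guessable ids.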
I do owe some quantitative benchmarking to support the assertions in the
description. I did a 10k insertion test with a small document, {content:
"hello"}, and the average insertion time per document was 2ms with random ids
and 1ms with the patch. This was more to prove that I'd changed *something* than
to measure the actual improvement. I would expect to see improved insertion
rates across a lot of scenarios, less difference between uncompacted and
compacted size (barring document updates and deletes) as less of the b+tree is
rewritten, and a smaller post-compaction size vs random. The exact extent of
these improvements should be established by a decent benchmark.
> Produce sequential, but unique, document id's
> ---------------------------------------------
>
> Key: COUCHDB-465
> URL: https://issues.apache.org/jira/browse/COUCHDB-465
> Project: CouchDB
> Issue Type: Improvement
> Reporter: Robert Newson
> Attachments: sequence_id.patch
>
>
> Currently, if the client does not specify an id (POST'ing a single document
> or using _bulk_docs) a random 16 byte value is created. This kind of key is
> particularly brutal on b+tree updates and the append-only nature of couchdb
> files.
> Attached is a patch to change this to a two-part identifier. The first part
> is a random 12 byte value and the remainder is a counter. The random prefix
> is rerandomized when the counter reaches its maximum. The rollover in the
> patch is at 16 million but can obviously be changed. The upshot is that the
> b+tree is updated in a better fashion, which should lead to performance
> benefits.
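
For readers who want the two-part scheme spelled out, here is a minimal Python sketch of the idea in the description above. It is not the attached patch (which is Erlang). The class name, the 4-byte hex counter width, and the injectable `rand` parameter are assumptions made for illustration; only the 12-byte random prefix and the 16 million rollover come from the description.

```python
import os


class SequentialIdGenerator:
    """Two-part ids: a random 12-byte prefix followed by a counter."""

    PREFIX_BYTES = 12  # from the issue description

    def __init__(self, rollover=16_000_000, rand=os.urandom):
        # rollover: number of ids issued before the prefix is re-randomized.
        # rand: source of random bytes, injectable so tests can be deterministic.
        self._rollover = rollover
        self._rand = rand
        self._reset()

    def _reset(self):
        # Pick a fresh random prefix and restart the counter.
        self._prefix = self._rand(self.PREFIX_BYTES)
        self._counter = 0

    def next_id(self):
        if self._counter >= self._rollover:
            self._reset()
        # Assumption: the counter is rendered as 8 hex digits (4 bytes), so the
        # whole id is 16 bytes / 32 hex characters, like the current random ids.
        doc_id = self._prefix.hex() + format(self._counter, "08x")
        self._counter += 1
        return doc_id
```

Ids issued under one prefix sort in the order they were generated, so consecutive inserts land next to each other in the b+tree instead of being scattered across it.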