[jira] [Updated] (COUCHDB-1373) Time-ordered document ids including the database identity

Nick North (Updated) (JIRA) Sat, 31 Dec 2011 07:28:57 -0800

     [ 
https://issues.apache.org/jira/browse/COUCHDB-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Nick North updated COUCHDB-1373:
--------------------------------

    Description: 
This suggestion is for an enhancement to the document id generation algorithms 
in CouchDB. I am new to CouchDB, and this question addresses an old issue 
(https://issues.apache.org/jira/browse/COUCHDB-465), so please forgive me if I 
am retreading old ground.

My application has a number of mutually replicating CouchDB instances. I would 
like document ids to be monotonically increasing per instance and globally 
unique, and I would like the instance where a document was created to be 
determinable from its id. (More precisely, I don't need to know anything about 
the instance itself, just whether any two documents originated from the same 
instance.) The utc_random algorithm is not far from meeting these requirements, 
as its ids are monotonic and almost certainly globally unique. However, the 
instance cannot be determined from the id, and there is a tiny chance of an id 
clash between two instances. Both issues could be solved if the random part of 
the id were replaced with a suffix that is fixed in the ini file for each 
instance.

To address this I have a modified version of couch_uuids.erl that introduces a 
new utc_machine_id algorithm. It reads a machine_id string from the ini file 
and generates ids using an internal utc_suffix function, which simply appends 
that string to the usual 14-character UTC timestamp prefix. utc_random then 
also uses utc_suffix, but its suffix is the usual random byte string.
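
As a rough illustration of the scheme described above, here is a minimal 
sketch in Python. It is not the modified couch_uuids.erl. It assumes the 
timestamp prefix is microseconds since the Unix epoch written as 14 
zero-padded hex characters and that the random suffix is 18 hex characters. 
The function names mirror those in the description, while same_instance and 
the now_us parameter are hypothetical additions for the example.

```python
import os
import time

UTC_PREFIX_LEN = 14  # assumed: hex characters of microseconds since the epoch


def utc_prefix(now_us=None):
    """Return the zero-padded 14-character hex timestamp prefix."""
    if now_us is None:
        now_us = time.time_ns() // 1000  # nanoseconds -> microseconds
    return format(now_us, "014x")


def utc_suffix(suffix, now_us=None):
    """Append an arbitrary suffix string to the timestamp prefix."""
    return utc_prefix(now_us) + suffix


def utc_random(now_us=None):
    """Timestamp prefix plus 18 random hex characters (9 random bytes)."""
    return utc_suffix(os.urandom(9).hex(), now_us)


def utc_machine_id(machine_id, now_us=None):
    """Timestamp prefix plus a fixed per-instance string.

    In the proposal the string would be read from the ini file; here it is
    passed in directly to keep the sketch self-contained.
    """
    return utc_suffix(machine_id, now_us)


def same_instance(id_a, id_b):
    """Hypothetical helper: two ids share an origin if their suffixes match."""
    return id_a[UTC_PREFIX_LEN:] == id_b[UTC_PREFIX_LEN:]
```

Because the prefix has a fixed width, ids from one instance sort in creation 
order as plain strings. The sketch does not guard against two ids being 
requested within the same microsecond on one instance, which a fixed suffix 
would turn into a clash.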

However, it is obviously a nuisance to have to maintain a non-standard 
distribution, so I wondered if there is enough call for this sort of thing to 
make it a part of the standard distribution? If there is, I'd be very happy to 
make my code available for discussion/modification/inclusion. If there are good 
reasons why this is a bad idea, then I'd also be very interested to hear them 
so that I can rethink my ideas. (It happens that the privacy and guessability 
concerns raised in the original discussion do not apply in my case.) If this 
question has been beaten to death, then I'm sorry for bothering the group, and 
would be grateful if someone could point me to the discussions so that I can 
understand the issues.

  was:
This suggestion is for an enhancement to the document id generation algorithms 
in CouchDb. I am new to CouchDb, and this question addresses an old issue 
(https://issues.apache.org/jira/browse/COUCHDB-465) so please forgive me if I 
am retreading old ground.

My application has a number of mutually replicating CouchDb instances and I 
would like document ids to be monotonically-increasing per-instance, and 
globally unique, and for the instance where the document was created to be 
determinable from the id. (To be more accurate - I don't need to know anything 
about the instance itself; just whether any two documents originated from the 
same instance.) The utc_random algorithm is not far from meeting these 
requirements, as ids are monotonic and almost certainly globally unique. 
However, the instance cannot be determined from the id, and there is a tiny 
chance of an id clash between two instances. Both of these issues could be 
solved if the random part of the id could be replaced with a suffix that is 
fixed in the ini file for each instance.

To addresses this I have a modified version of couch_uuids.erl introducing a 
new utc_machine_id algorithm which reads a machine_id string from the ini file 
and then generates ids using an internal utc_suffix method that just appends 
the string to the usual utc 14-byte string. utc_random then also uses the 
utc_suffix method, but its suffix is the usual random byte string.

However, it is obviously a nuisance to have to maintain a non-standard 
distribution, so I wondered if there is enough call for this sort of thing to 
make it a part of the standard distribution? If there is, I'd be very happy to 
make my code available for discussion/modification/inclusion. If there are good 
reasons why this is a bad idea, then I'd also be very interested to hear them 
so that I can rethink my ideas. (It happens that the privacy and guessability 
concerns raised in the original discussion do not apply in my case.) If this 
question has been beaten to death, then I'm sorry for bothering the group, and 
would be grateful if someone could point me to the discussions so that I can 
understand the issues.

    
> Time-ordered document ids including the database identity
> ----------------------------------------------------------
>
>                 Key: COUCHDB-1373
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1373
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>            Reporter: Nick North
>            Priority: Minor
>              Labels: uuid

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
