nickva commented on PR #5698:
URL: https://github.com/apache/couchdb/pull/5698#issuecomment-3398599747

   > But the doc id was hex-encoded in the original, so not the '128-bit binary 
value'.
   
   I am using the raw binary value (128 bits, 16 bytes) directly in the new 
purge optimization PR; that is for the UUID, not the DocID. The raw 128-bit 
(16-byte) representation is the shortest, so we cut the ID size in half 
compared to the hex encoding, which helps when storing lots of them, with 
b-tree sizes, etc. We don't emit them in the API or accept them as input. 
However, JSON doesn't pass raw binary through, so we can't return those in 
`_uuid` results. We'd have to base64-, base32- or base16-encode them. We do 
that with the other UUID types: the `random` uuid is a 16-byte binary, which 
we hex-encode. But we could also have base32-encoded it to keep it even 
shorter, for example.
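
   To make the size trade-off concrete, here is a minimal Python sketch (not 
CouchDB code) that compares the length of a 16-byte value under each encoding. 
The helper name `encoded_lengths` is made up for illustration, and stripping 
the `=` padding from base32/base64 is an assumption about how such IDs would 
be emitted.

```python
import base64
import os


def encoded_lengths(value: bytes) -> dict:
    """Return the length of `value` raw and under common text encodings.

    Hypothetical helper for illustration only. Padding characters are
    stripped from base32/base64 on the assumption that IDs would be
    emitted unpadded.
    """
    return {
        "raw": len(value),
        "base16": len(base64.b16encode(value)),
        "base32": len(base64.b32encode(value).rstrip(b"=")),
        "base64": len(base64.urlsafe_b64encode(value).rstrip(b"=")),
    }


if __name__ == "__main__":
    raw = os.urandom(16)  # stand-in for a 128-bit UUID value
    print(encoded_lengths(raw))
```

For a 16-byte input this gives 16 bytes raw, 32 characters in base16 (hex), 
26 in unpadded base32 and 22 in unpadded base64.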
   
   Some UUID types have standard string representation formats with hex + 
dashes between some parts. That's best in general but the RFC recommends using 
binaries (or in our case encoded binaries) when feasible.
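
   As an illustration of the two text forms, the sketch below uses Python's 
standard `uuid` module (not CouchDB code) to render the same 16 bytes both in 
the dashed standard format and as plain hex. The function name `text_forms` 
is hypothetical.

```python
import uuid


def text_forms(raw: bytes) -> tuple:
    """Return (dashed, plain_hex) renderings of a 16-byte value.

    Hypothetical helper: `raw` is any 16-byte binary; no UUID version
    or variant bits are checked here.
    """
    u = uuid.UUID(bytes=raw)
    return str(u), u.hex


if __name__ == "__main__":
    dashed, plain = text_forms(bytes(range(16)))
    print(dashed)  # 36 characters, 8-4-4-4-12 hex groups
    print(plain)   # 32 characters, no dashes
```

The dashed form is 36 characters long and the plain hex form is 32, so a 
client that asserts a 32-character ID would reject the dashed one.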
   
   I can see users wanting both, so we could make it configurable, but maybe 
make UUID v7 hex without dashes the default, for compatibility. Users may 
check/assert the length of these IDs somewhere. We do something of that sort 
with revisions - if they look like a 32-byte string, we turn them into 
binaries: 
https://github.com/apache/couchdb/blob/16ced957924ff3cac20c3c9c3dd91d9a1d0ce7fc/src/couch/src/couch_doc.erl#L191-L193
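
   The general idea of that kind of length-based check can be sketched in 
Python as follows. This is not a translation of the linked Erlang lines; the 
function name `maybe_to_binary` and the exact fallback behaviour are 
assumptions made for illustration.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def maybe_to_binary(rev: str) -> bytes:
    """Compact a 32-character hex string into its 16-byte binary form.

    Hypothetical sketch: anything that is not exactly 32 hex characters
    is passed through unchanged (as UTF-8 bytes), which is an assumed
    fallback, not necessarily what CouchDB does.
    """
    if len(rev) == 32 and all(c in HEX_DIGITS for c in rev):
        return bytes.fromhex(rev)
    return rev.encode("utf-8")


if __name__ == "__main__":
    print(len(maybe_to_binary("000102030405060708090a0b0c0d0e0f")))
    print(maybe_to_binary("not-a-hex-rev"))
```

A check like this depends on the ID length, which is why changing the default 
text length of generated IDs could affect code that makes similar assumptions.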

