nickva commented on PR #5698: URL: https://github.com/apache/couchdb/pull/5698#issuecomment-3398599747
> But the doc id was hex-encoded in the original, so not the '128-bit binary value'.

I am using the raw binary value (128 bits, 16 bytes) directly in the new purge optimization PR. It is used for the UUID, not the DocID. The raw 16-byte representation is the shortest, so we cut the ID size in half compared to hex, which helps when storing lots of them (b-tree sizes, etc.). We don't emit them in the API or accept them as input.

However, JSON doesn't pass raw binary through, so we can't return those in `_uuids` results. We'd have to base64, base32 or base16 encode them. We already do that with the other UUID types: the `random` uuid is a 16-byte binary, which we hex encode. We could also have base32 encoded it to keep it even shorter, for example.

Some UUID types have standard string representation formats with hex and dashes between some parts. That's best in general, but the RFC recommends using binaries (or, in our case, encoded binaries) when feasible. I can see users wanting both, so we could make it configurable, but maybe make uuid v7 hex without dashes the default, for compatibility. Users may check or assert the length of these IDs somewhere.

We do something of that sort with revisions: if they look like a 32-byte string, we turn them into binaries: https://github.com/apache/couchdb/blob/16ced957924ff3cac20c3c9c3dd91d9a1d0ce7fc/src/couch/src/couch_doc.erl#L191-L193
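To make the size trade-off concrete, here is a rough Python sketch (not CouchDB code) that compares the text encodings of a 16-byte ID mentioned above. The helper names `encodings` and `maybe_hex_to_binary` are made up for illustration, and the second one only loosely mirrors the "looks like a 32-byte string" idea rather than reproducing what `couch_doc.erl` does.

```python
import base64
import string
import uuid


def encodings(raw: bytes) -> dict:
    """Return several text encodings of a 16-byte (128-bit) ID.

    Hypothetical helper for illustration only.
    """
    if len(raw) != 16:
        raise ValueError("expected a 16-byte (128-bit) value")
    return {
        # base16: two characters per byte
        "hex": raw.hex(),
        # base32 and base64 with '=' padding stripped
        "base32": base64.b32encode(raw).decode("ascii").rstrip("="),
        "base64": base64.urlsafe_b64encode(raw).decode("ascii").rstrip("="),
        # standard UUID string form: hex with dashes between some parts
        "dashed": str(uuid.UUID(bytes=raw)),
    }


def maybe_hex_to_binary(value: str):
    """Turn a string that looks like 32 hex characters into 16 raw bytes.

    Anything else is returned unchanged. Hypothetical helper, an assumption
    about the general idea and not a port of the Erlang code.
    """
    if len(value) == 32 and all(c in string.hexdigits for c in value):
        return bytes.fromhex(value)
    return value


if __name__ == "__main__":
    raw = bytes(range(16))  # stand-in for a real 128-bit UUID value
    print("raw", len(raw))
    for name, text in encodings(raw).items():
        print(name, len(text), text)
```

For a 16-byte input this gives 32 characters for hex, 26 for unpadded base32, 22 for unpadded base64 and 36 for the dashed form, so the raw binary is half the size of the hex string, as noted above.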
