tutububug commented on code in PR #207: URL: https://github.com/apache/kvrocks-website/pull/207#discussion_r1547513290
########## community/data-structure-on-rocksdb.md: ########## @@ -314,3 +315,28 @@ where the `payload` is a string encoded in the corresponding `format`: | CBOR | 1 | Also, if we decide to add a more IO-friendly format to avoid reading all payload to the memory before searching an element via JSONPath or seperate a relatively large JSON to multiple key-values, we can take advantage of the `format` field. + +## Hyperloglog + +Redis hyperloglog can be thought of as a static array with a length of 16384. The array elements are called registers, which are used to store the maximum count of consecutive 0s. This register array is the input parameter for the hyperloglog algorithm. +In Kvrocks, the hyperloglog data structure is stored in following two parts: + +#### hyperloglog metadata + +```text + +----------+------------+-----------+-----------+ +key => | flags | expire | version | size | + | (1byte) | (Ebyte) | (8byte) | (Sbyte) | + +----------+------------+-----------+-----------+ +``` +- `size` records the number of registers, which is a constant, even no element is added. + +#### hyperloglog sub keys-values Review Comment: I had changed my previous implementation of the storage format, currently, the format is the same as the bitmap's, and each HLL subkey represents a segment with 1024 registers. Also, I tried to use bitfield, but finally, I found that it requires byte alignment because of 'memcpy' (it should only be able to operate bits within a byte). The implementation of Redis only uses 6 bits, not aligned, to save more space, so it seems not suitable. However, another problem arises. When the register is set to 6-bit, unit test runs fail on a few OS releases in CI, but there is no problem when it is set to 8-bit, maybe it has something to do with byte alignment, but I don’t know the reason yet. Therefore, the current implementation still maintains 8 bits without using bitfield. If I understand something wrong, correct me, thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
