nickva commented on PR #5625: URL: https://github.com/apache/couchdb/pull/5625#issuecomment-3202798994
I prettified a quick btree stats reporter I had, since we were wondering what the tree looks like from above. This is q=8, 100k docs, just one shard copy:

```json
"sizes": {
  "active": 5812423,
  "external": 4040930,
  "file": 7188696,
  "id_tree": {
    "1": { "kp_node": { "cnt": 1, "max": 5, "min": 5 } },
    "2": { "kp_node": { "cnt": 5, "max": 21, "min": 15 } },
    "3": { "kp_node": { "cnt": 91, "max": 23, "min": 11 } },
    "4": { "kv_node": { "cnt": 1450, "max": 15, "min": 1 } }
  },
  "seq_tree": {
    "1": { "kp_node": { "cnt": 1, "max": 2, "min": 2 } },
    "2": { "kp_node": { "cnt": 2, "max": 26, "min": 2 } },
    "3": { "kp_node": { "cnt": 28, "max": 47, "min": 6 } },
    "4": { "kv_node": { "cnt": 975, "max": 15, "min": 2 } }
  }
}
```

The key is the depth, then the node type. `cnt` is the number of nodes at that level, `min` is the smallest node size (number of kvs/kps), and `max` is the largest.

It's not as shallow as we'd expect due to how complete_root works, and chunk_size is probably not the best any longer (it doesn't account for compression). I was going to look into maybe having a different chunk size, or a different one per node type (kps get more), but that's for a different PR.

So caching the nodes in the top 2 levels makes sense: there are not that many of them and they bring the biggest benefit. Top 3 could also be an option, but maybe start smaller at first.
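To put rough numbers on "not that many", here is a small Python sketch (not part of the PR) that tallies the `cnt` values reported above. It assumes "top N" means every node at depths 1 through N, and `nodes_in_top_levels` is a hypothetical helper name used only for illustration. The counts are for the single shard copy shown, so other shards or databases will differ.

```python
# Node counts per depth, copied from the "cnt" fields in the stats above.
# The "min"/"max" node sizes are not needed for this tally.
STATS = {
    "id_tree": {1: 1, 2: 5, 3: 91, 4: 1450},
    "seq_tree": {1: 1, 2: 2, 3: 28, 4: 975},
}


def nodes_in_top_levels(counts_by_depth, levels):
    """Sum node counts for depths 1..levels (depth 1 is the root).

    Hypothetical helper: assumes "caching the top N" means caching
    every node at those depths.
    """
    return sum(cnt for depth, cnt in counts_by_depth.items() if depth <= levels)


for tree, counts in STATS.items():
    total = sum(counts.values())
    for levels in (2, 3):
        cached = nodes_in_top_levels(counts, levels)
        print(f"{tree}: top {levels} levels = {cached} of {total} nodes")

# id_tree: top 2 levels = 6 of 1547 nodes
# id_tree: top 3 levels = 97 of 1547 nodes
# seq_tree: top 2 levels = 3 of 1006 nodes
# seq_tree: top 3 levels = 31 of 1006 nodes
```

For this shard, the top 2 levels come to 6 id_tree nodes and 3 seq_tree nodes, while the top 3 levels come to 97 and 31. That fits starting with 2 and treating 3 as a later option.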