nickva opened a new pull request, #5625:
URL: https://github.com/apache/couchdb/pull/5625

   Cache the top b-tree nodes and header terms. Items are inserted into the 
cache on first use, and their usage counters are bumped on subsequent accesses. 
Entries that are not accessed frequently enough are evicted.
   
   In order to get or update any kv leaf nodes in the b-trees, we always have 
to read the top few kp nodes closest to the root. Even with parallel preads and 
the data being in the page cache, the cost of the system calls, marshaling the 
terms, and going through the Erlang IO system can add up.
   
   The cache is limited in size. The size defaults to 64 MB and is configurable 
by the user. If the cache is full, no new entries are added until more space 
frees up.
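   For example, assuming a hypothetical section and key name (the actual ones 
are defined in `rel/overlay/etc/default.ini` in this PR), doubling the cache 
size might look like:
   
   ```ini
   ; Illustrative only: the real section/key names live in default.ini
   [bt_engine_cache]
   max_size = 134217728 ; 128 MiB, doubled from the 64 MiB default
   ```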
   
   Since multiple processes access the data concurrently, the ets tables are 
sharded by the number of schedulers, not unlike how we shard our couch_server 
processes. Each ets table has an associated cleaner process which evicts unused 
entries. Cleaners traverse the table entries and "decay" the usage counters 
exponentially by repeatedly shifting the value to the right with the `bsr` op. 
Entries whose usage counter reaches 0 are then removed.
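   As a rough sketch (illustrative only; the `{Key, Value, Usage}` table layout 
and the function name are made up, not the PR's actual code), one cleaner pass 
over a shard could look like:
   
   ```erlang
   %% Illustrative sketch: decay the usage counters in one cache shard and
   %% evict entries whose counter has decayed to 0. The {Key, Value, Usage}
   %% record layout and the function name are hypothetical.
   decay_shard(Tab) ->
       Snapshot = ets:tab2list(Tab),
       lists:foreach(fun({Key, _Val, Usage}) ->
           case Usage bsr 1 of
               0 -> ets:delete(Tab, Key);
               Decayed -> ets:update_element(Tab, Key, {3, Decayed})
           end
       end, Snapshot).
   ```
   
   An entry with a counter of 8 survives three passes without any access before 
being evicted on the fourth, so frequently read entries stick around while cold 
ones age out quickly.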
   
   In order to insert only the top nodes into the cache, the couch_btree 
module's lookup and streaming functions were updated with a `depth` parameter. 
That way we avoid thrashing the cache with nodes which are less likely to be 
reused: if the depth is greater than 2, or the node is a kv_node, the cache is 
skipped altogether.
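   Purely as an illustration (the function name is hypothetical, not the PR's 
actual code), the bypass condition boils down to:
   
   ```erlang
   %% Hypothetical sketch of the depth-based bypass: only kp_nodes at
   %% depth 1 or 2 (closest to the root) are cache candidates; kv_nodes
   %% and deeper kp_nodes skip the cache altogether.
   cacheable(kv_node, _Depth) -> false;
   cacheable(_NodeType, Depth) when Depth > 2 -> false;
   cacheable(kp_node, _Depth) -> true.
   ```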
   
   To get an idea of how to tune the cache size for various workloads, there 
are two new metrics which count hits and misses:
   
   ```
   % http $DB/_node/_local/_stats/couchdb/bt_engine_cache
   
   {
       "hits": {
           "desc": "number of bt_engine cache hits",
           "type": "counter",
           "value": 233310
       },
       "misses": {
           "desc": "number of bt_engine cache misses",
           "type": "counter",
           "value": 7386
       }
   }
   ```
   
   Those are the metrics after running `fabric_bench:go(#{q=>8, 
doc_size=>small, docs=>100000})`.
   
   A quick and dirty comparison with main:
   
   ```
   > fabric_bench:go(#{q=>8, doc_size=>small, docs=>100000}).
    *** Parameters
    * batch_size       : 1000
    * doc_size         : small
    * docs             : 100000
    * individual_docs  : 1000
    * n                : default
    * q                : 8
   
    *** Environment
    * Nodes        : 1
    * Bench ver.   : 1
    * N            : 1
    * Q            : 8
    * OS           : unix/darwin
    * Couch ver.   : 3.5.0-d16fc1f
    * Couch git sha: d16fc1f
    * VM details   : Erlang/OTP 26 [erts-14.2.5.11] [source] [64-bit] 
[smp:12:12] [ds:12:12:16] [async-threads:1]
   
    *** Inserting 100000 docs
    * Add 100000 docs, ok:100/accepted:0     (Hz):   22000
    * Get random doc 100000X                 (Hz):    3800
    * All docs                               (Hz):  130000
    * All docs w/ include_docs               (Hz):   48000
    * Changes                                (Hz):   16000
    * Single doc updates 1000X               (Hz):     530
    * Time to run all benchmarks            (sec):      40
   ```
   
   With the btree cache:
   
   ```
    > fabric_bench:go(#{q=>8, doc_size=>small, docs=>100000}).
    *** Parameters
    * batch_size       : 1000
    * doc_size         : small
    * docs             : 100000
    * individual_docs  : 1000
    * n                : default
    * q                : 8
   
    *** Environment
    * Nodes        : 1
    * Bench ver.   : 1
    * N            : 1
    * Q            : 8
    * OS           : unix/darwin
    * Couch ver.   : 3.5.0-d16fc1f
    * Couch git sha: d16fc1f
    * VM details   : Erlang/OTP 26 [erts-14.2.5.11] [source] [64-bit] 
[smp:12:12] [ds:12:12:16] [async-threads:1]
   
    *** Inserting 100000 docs
    * Add 100000 docs, ok:100/accepted:0     (Hz):   24000
    * Get random doc 100000X                 (Hz):    4400
    * All docs                               (Hz):  140000
    * All docs w/ include_docs               (Hz):   49000
    * Changes                                (Hz):   29000
    * Single doc updates 1000X               (Hz):     680
    * Time to run all benchmarks            (sec):      33
   ```
   
   The idea to use a depth parameter and an ets table came from Paul J. Davis 
(@davisp).
   
   
   
   ## Checklist
   
   - [ ] Code is written and works correctly
   - [ ] Changes are covered by tests
   - [ ] Any new configurable parameters are documented in 
`rel/overlay/etc/default.ini`
   - [ ] Documentation changes were made in the `src/docs` folder
   - [ ] Documentation changes were backported (separated PR) to affected 
branches
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
