I wanted to share a failed approach just to save some time for anyone who is thinking about the issue.
I thought we could change the key format of the by_seq index at the expense of an extra LAST_LESS_THAN call on every doc write.

Currently the keys in the by_seq index are `("changes", Sequence)`, where Sequence is the sequence of the last transaction that modified the document. We could change the key to the form `("changes", Sequence, IncNumber)`.

We would maintain the IncNumber as follows: on each call to fabric2_db:write_doc we would also retrieve erlfdb_key:last_less_than(`("changes", Sequence)`). We would parse the returned key to extract the last IncNumber. Then we would create the by_seq entry under `("changes", Sequence, IncNumber+1)`.

Currently we use the by_seq index only in the changes feed. There we fold over a range, which means we can just ignore the value of IncNumber.

The count_changes_since functionality would be implemented as follows:

    count_changes_since(Db, SinceSeq) ->
        LastSequence = fabric2_util:seq_max_vs(),
        LastKey = erlfdb_key:last_less_than("changes", LastSequence),
        SinceKey = erlfdb_key:first_greater_than("changes", SinceSeq),
        {?DB_CHANGES, _, LastInc} = erlfdb_tuple:unpack(LastKey, DbPrefix),
        {?DB_CHANGES, _, SinceInc} = erlfdb_tuple:unpack(SinceKey, DbPrefix),
        LastInc - SinceInc.

The downsides of this approach:

1. An extra range read for each doc write (on the critical path).
2. An increase in the number of read conflicts (we would fail every time different docs are updated in parallel). I think this is a show stopper.

Best regards,
iilyak

On 2020/07/17 16:21:50, Tony Sun <tony.sun...@gmail.com> wrote:
> Hi all,
>
> I recently started implementing _active_tasks for our fdb development
> branch. At first, I thought it would be trivial, but technical limitations
> have led me to modify our response as an interim solution. I'd like to get
> more feedback on this solution and start a discussion on a more
> accurate/correct solution going forward.
>
> *Problem:*
> Most active tasks rely upon a "Total" value to determine progress.
> This relies on `count_changes_since/2`:
> https://github.com/apache/couchdb/blob/master/src/couch/src/couch_db_engine.erl#L634-L652
>
> I cannot think of an efficient way of implementing this on top of fdb.
> Paul has probably thought about this more deeply during the initial layer
> design phase, but I may have missed some of those discussions.
>
> Since Couch 2.0, our update_seq string has a snapshot of the number of
> changes prepended. This also does not exist in the fdb-layer branch.
>
> Ultimately, there is no way to calculate the total number of changes for
> a given update_seq.
>
> *Proposed Solution:*
> We simply send out the versionstamp of the db sequence we are trying to
> reach, and the current versionstamp. So the responses look something like:
>
> [
>   {
>     "node": "node1@127.0.0.1",
>     "pid": "<0.622.0>",
>     "changes_done": 199,
>     "current_version_stamp": "8131141649532-198",
>     "database": "testdb",
>     "db_version_stamp": "8131141649532-999",
>     "design_document": "_design/example",
>     "started_on": 1594703583,
>     "type": "indexer",
>     "updated_on": 1594703586
>   }
> ]
>
> [
>   {
>     "node": "node1@127.0.0.1",
>     "pid": "<0.1030.0>",
>     "changes_done": 1000,
>     "current_version_stamp": "8131194932916-999",
>     "database": "testdb",
>     "db_version_stamp": "8131194932916-999",
>     "design_document": "_design/example",
>     "started_on": 1594703636,
>     "type": "indexer",
>     "updated_on": 1594703665
>   }
> ]
>
> The user can utilize the changes_done (this is just a running counter for
> that task process), current_version_stamp, and db_version_stamp to gauge if
> the task is making progress.
>
> My concern is that this is a breaking change for users that rely on the
> "total_changes" and "progress" fields.
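As a rough illustration of how a client might use these three fields, here is a small Python sketch. It assumes the stamps always have the "<number>-<number>" shape seen in the two sample responses and that they can be compared as pairs of integers; the proposal does not guarantee either, and the function names are hypothetical.

```python
def parse_stamp(stamp):
    """Split e.g. "8131141649532-198" into (8131141649532, 198).

    Assumption: the two-part layout is inferred from the sample responses only.
    """
    version, _, batch = stamp.partition("-")
    return int(version), int(batch)


def has_caught_up(task):
    """True once the task's current stamp has reached the db stamp."""
    current = parse_stamp(task["current_version_stamp"])
    target = parse_stamp(task["db_version_stamp"])
    return current >= target


def is_making_progress(previous, latest):
    """Compare two polls of the same task: did the counter or the stamp advance?"""
    counter_moved = latest["changes_done"] > previous["changes_done"]
    stamp_moved = (parse_stamp(latest["current_version_stamp"])
                   > parse_stamp(previous["current_version_stamp"]))
    return counter_moved or stamp_moved
```

This only tells a client whether the task is moving and whether it has finished. It cannot give a percentage, which is the gap the "total_changes" and "progress" fields used to fill.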
>
> I've opened a PR for this and have gotten good feedback on some
> implementation details but would love to get consensus on the response
> format: https://github.com/apache/couchdb/pull/3003
>
> *Moving Forward:*
> I've read a few foundationdb forum posts, and the topic of "Can I get the
> changes to the DB, given a versionstamp?" has been discussed a few times.
> I'm not sure it will be done on the fdb end anytime soon. I briefly
> considered adding another b-tree in memory, but that seems overkill just
> for this Total feature.
>
> Thanks,
>
> Tony
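P.S. For anyone who wants to poke at the `("changes", Sequence, IncNumber)` idea from the top of this message without an FDB cluster, here is a minimal in-memory sketch in Python (not erlfdb). The class and method names are made up for illustration, sequences are plain integers rather than versionstamps, writes are assumed to arrive in increasing sequence order, and the index is append-only. The sketch adds 1 to LastInc - SinceInc so that the first change after SinceSeq is itself counted.

```python
import bisect


class BySeqIndex:
    """Toy stand-in for the by_seq keyspace: a sorted list of (seq, inc) keys."""

    def __init__(self):
        self.keys = []

    def last_less_than(self, seq):
        """Greatest key whose sequence is strictly below seq, or None."""
        i = bisect.bisect_left(self.keys, (seq,))
        return self.keys[i - 1] if i > 0 else None

    def first_greater_than(self, seq):
        """Smallest key whose sequence is strictly above seq, or None."""
        i = bisect.bisect_right(self.keys, (seq, float("inf")))
        return self.keys[i] if i < len(self.keys) else None

    def write_doc(self, seq):
        """The extra read on every write: fetch the previous IncNumber, store inc + 1.

        Assumption: seq is larger than every sequence written so far.
        """
        prev = self.last_less_than(seq)
        inc = prev[1] + 1 if prev else 1
        bisect.insort(self.keys, (seq, inc))
        return inc

    def count_changes_since(self, since_seq):
        """Count entries with sequence > since_seq from two key lookups, no range scan."""
        since = self.first_greater_than(since_seq)
        if since is None:
            return 0
        last = self.keys[-1]
        return last[1] - since[1] + 1
```

Note that write_doc has to read the current last key before it can write. In a real transaction that read is what every parallel writer would conflict on, which is downside 2 above.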