I wanted to share a failed approach just to save some time for anyone who is
thinking about the issue.
I thought we could change the key format of the by_seq index at the expense of
an extra LAST_LESS_THAN call on every doc write.
Currently the keys in the by_seq index are `("changes", Sequence)`,
where Sequence is the sequence of the last transaction that modified the
document.
We could change the keys to the form `("changes", Sequence, IncNumber)`.
We would maintain the IncNumber as follows:
on each call to fabric2_db:write_doc we would also retrieve the key selected
by `erlfdb_key:last_less_than(("changes", Sequence))`.
We would parse the returned key to extract the last IncNumber.
Then we would create the by_seq entry under `("changes", Sequence,
IncNumber+1)`.
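
To make the bookkeeping concrete, here is a minimal in-memory sketch of the
scheme. It is only a model: the module name by_seq_model and its functions are
made up for illustration, a gb_sets ordered set stands in for the by_seq
subspace, and plain integers stand in for versionstamps. Nothing here talks to
FoundationDB.

```erlang
-module(by_seq_model).
-export([new/0, write_doc/2, demo/0]).

%% In-memory stand-in for the by_seq subspace: an ordered set of
%% {Sequence, IncNumber} keys. This is a model, not FoundationDB.
new() ->
    gb_sets:new().

%% Stand-in for the LAST_LESS_THAN read: take the IncNumber of the
%% largest existing key, or 0 if the index is empty.
last_inc(BySeq) ->
    case gb_sets:is_empty(BySeq) of
        true ->
            0;
        false ->
            {_Seq, Inc} = gb_sets:largest(BySeq),
            Inc
    end.

%% Model of a doc write: read the last IncNumber, then insert
%% {Sequence, IncNumber + 1}. Assumes Seq is larger than any existing
%% sequence, as a new versionstamp would be.
write_doc(Seq, BySeq) ->
    Inc = last_inc(BySeq) + 1,
    gb_sets:add({Seq, Inc}, BySeq).

%% Three writes with increasing (hypothetical) sequences.
demo() ->
    S1 = write_doc(100, new()),
    S2 = write_doc(105, S1),
    S3 = write_doc(230, S2),
    gb_sets:to_list(S3).
```

Calling by_seq_model:demo() returns [{100,1},{105,2},{230,3}], i.e. each new
entry carries the previous IncNumber plus one.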
Currently we use the by_seq index only in the changes feed. There we fold over
a key range, which means we can simply ignore the value of IncNumber.
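The count_changes_since functionality could be implemented roughly as follows
(untested sketch):

    count_changes_since(Db, SinceSeq) ->
        fabric2_fdb:transactional(Db, fun(#{tx := Tx, db_prefix := DbPrefix}) ->
            LastKey = erlfdb_tuple:pack({?DB_CHANGES, fabric2_util:seq_max_vs()}, DbPrefix),
            SinceKey = erlfdb_tuple:pack({?DB_CHANGES, SinceSeq}, DbPrefix),
            % Resolve both key selectors to actual by_seq keys
            Last = erlfdb:wait(erlfdb:get_key(Tx, erlfdb_key:last_less_than(LastKey))),
            Since = erlfdb:wait(erlfdb:get_key(Tx, erlfdb_key:first_greater_than(SinceKey))),
            {?DB_CHANGES, _, LastInc} = erlfdb_tuple:unpack(Last, DbPrefix),
            {?DB_CHANGES, _, SinceInc} = erlfdb_tuple:unpack(Since, DbPrefix),
            LastInc - SinceInc
        end).
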
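
The arithmetic can be checked with a small in-memory model. This is a sketch
under assumptions: the module name count_model is made up, a sorted list of
{Sequence, IncNumber} tuples stands in for the by_seq index, integers stand in
for versionstamps, and SinceSeq is assumed to be the sequence of an entry that
still exists in the index.

```erlang
-module(count_model).
-export([count_changes_since/2, demo/0]).

%% Keys is a sorted list of {Sequence, IncNumber} by_seq keys (a model of
%% the index, not FoundationDB). The result is the difference between the
%% IncNumber of the last key and the IncNumber of the first key at or
%% after SinceSeq.
count_changes_since([], _SinceSeq) ->
    0;
count_changes_since(Keys, SinceSeq) ->
    {_LastSeq, LastInc} = lists:last(Keys),
    case lists:dropwhile(fun({Seq, _Inc}) -> Seq < SinceSeq end, Keys) of
        [] ->
            % Nothing at or after SinceSeq
            0;
        [{_Seq, SinceInc} | _] ->
            LastInc - SinceInc
    end.

%% Four entries; count the changes after sequence 105 and after 231.
demo() ->
    Keys = [{100, 1}, {105, 2}, {230, 3}, {231, 4}],
    {count_changes_since(Keys, 105), count_changes_since(Keys, 231)}.
```

count_model:demo() returns {2,0}: two entries follow sequence 105 and none
follow 231.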
The downsides of this approach:
1. An extra range read on each doc write (on the critical path).
2. An increase in the number of read conflicts: every write reads the last
by_seq key and then adds a new one, so updates of different docs running in
parallel would conflict every time. I think this is a show stopper.
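
The second point can be illustrated with a toy optimistic-concurrency model.
This is a deliberate simplification and not how FoundationDB tracks conflict
ranges internally: the module name conflict_model is made up, and the whole
by_seq tail is reduced to a single {Version, LastInc} pair that every writer
both reads and updates.

```erlang
-module(conflict_model).
-export([demo/0]).

%% Toy store: {Version, LastInc}. A transaction remembers the version at
%% which it read LastInc and may only commit if nothing else has been
%% committed since then.
begin_tx({Version, LastInc}) ->
    #{read_version => Version, last_inc => LastInc}.

%% Commit succeeds only when the store is still at the version we read.
commit(#{read_version := RV, last_inc := Inc}, {Version, _}) when RV =:= Version ->
    {ok, {Version + 1, Inc + 1}};
commit(_Tx, Store) ->
    {conflict, Store}.

%% Two writers for different docs start from the same snapshot. The first
%% commit wins; the second one has read a now stale IncNumber.
demo() ->
    Store0 = {0, 0},
    TxA = begin_tx(Store0),
    TxB = begin_tx(Store0),
    {ok, Store1} = commit(TxA, Store0),
    {ResultB, _} = commit(TxB, Store1),
    ResultB.
```

conflict_model:demo() returns conflict, even though the two writers never
touch the same document.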
Best regards,
iilyak
On 2020/07/17 16:21:50, Tony Sun wrote:
> Hi all,
>
> I recently started implementing _active_tasks for our fdb development
> branch. At first, I thought it would be trivial, but technical limitations
> have led me to modify our response as an interim solution. I'd like to get
> more feedback on this solution and start a discussion on a more
> accurate/correct solution going forward.
>
> *Problem:*
> Most active tasks rely upon a "Total" value to determine progress. This
> relies on `count_changes_since/2` :
> https://github.com/apache/couchdb/blob/master/src/couch/src/couch_db_engine.erl#L634-L652
>
> I cannot think of an efficient way of implementing this on top of fdb.
> Paul has probably thought about this more
> deeply during the initial layer design phase, but I may have missed some of
> those discussions.
>
> Since Couch 2.0, our update_seq string has a snapshot of the number of
> changes prepended. This also does not exist in the fdb-layer branch.
>
> Ultimately, there is no way to calculate the total number of changes for
> a given update_seq.
>
> *Proposed Solution:*
> We simply send out the versionstamp of db sequence we are trying to reach,
> and the current versionstamp. So the responses look something like:
>
> [
> {
> "node": "[email protected]",
> "pid": "<0.622.0>",
> "changes_done": 199,
> "current_version_stamp": "8131141649532-198",
> "database": "testdb",
> "db_version_stamp": "8131141649532-999",
> "design_document": "_design/example",
> "started_on": 1594703583,
> "type": "indexer",
> "updated_on": 1594703586
> }
> ]
>
> [
> {
> "node": "[email protected]",
> "pid": "<0.1030.0>",
> "changes_done": 1000,
> "current_version_stamp": "8131194932916-999",
> "database": "testdb",
> "db_version_stamp": "8131194932916-999",
> "design_document": "_design/example",
> "started_on": 1594703636,
> "type": "indexer",
> "updated_on": 1594703665
> }
> ]
>
> The user can utilize the changes_done (this is just a running counter for
> that task process), current_version_stamp, and db_version_stamp to gauge if
> the task is making progress.
>
> My concern is that this is a breaking change for users that rely on the
> "total_changes" and "progress" fields.
>
> I've opened a PR for this and have gotten good feedback on some
> implementation details but would love to get consensus on the response
> format: https://github.com/apache/couchdb/pull/3003
>
> *Moving Forward:*
> I've read a few foundationdb forum posts and the topic of "Can I get the
> changes to the DB, given a versionstamp?" has been discussed a few times.
> I'm not sure it will be done on the fdb end anytime soon. I briefly
> considered adding another b-tree in memory, but that seems overkill just
> for this Total feature.
>
> Thanks,
>
> Tony
>