I wanted to share a failed approach to save some time for anyone else who is
thinking about this issue.

I thought we could change the key format of the by_seq index at the expense of
an extra LAST_LESS_THAN call on every doc write.

Currently the keys in the by_seq index are `("changes", Sequence)`,
where Sequence is the sequence of the last transaction that modified the
document.

We could change the key to the form `("changes", Sequence, IncNumber)`.
We would maintain the IncNumber as follows:
on each call to fabric2_db:write_doc we would also resolve the key selector
erlfdb_key:last_less_than(`("changes", Sequence)`).
We would parse the returned key to extract the last IncNumber.
Then we would create the by_seq entry under
`("changes", Sequence, IncNumber+1)`.
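
As a minimal sketch of that bookkeeping, here is the write path modeled in Python against an in-memory sorted list standing in for the by_seq subspace. `BySeq`, `last_less_than`, and `write_doc` are hypothetical stand-ins for illustration, not erlfdb or fabric2 APIs, and sequences are plain integers rather than versionstamps.

```python
import bisect


class BySeq:
    """In-memory stand-in for the by_seq subspace (assumption: not real FDB)."""

    def __init__(self):
        self.keys = []  # sorted tuples of the form ("changes", seq, inc)

    def last_less_than(self, key):
        # Largest stored key strictly below `key`, or None if there is none.
        idx = bisect.bisect_left(self.keys, key)
        return self.keys[idx - 1] if idx > 0 else None

    def write_doc(self, seq):
        # The extra read on every write: find the entry just before this sequence.
        prev = self.last_less_than(("changes", seq))
        last_inc = prev[2] if prev is not None else 0
        # Store the new entry with the incremented counter.
        key = ("changes", seq, last_inc + 1)
        bisect.insort(self.keys, key)
        return key


index = BySeq()
for seq in (10, 20, 35):
    print(index.write_doc(seq))
```

Each write has to read the current tail of the index before it can choose its own key, which is where the extra cost on the write path comes from.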

Currently we use the by_seq index only in the changes feed. There we fold over
a key range, which means we can simply ignore the value of IncNumber.

The count_changes_since functionality would be implemented as follows:

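count_changes_since(Db, SinceSeq) ->
    % Assumes Db is already inside a transaction.
    #{tx := Tx, db_prefix := DbPrefix} = Db,
    % Key selectors take a single packed key, so pack the tuples first.
    MaxKey = erlfdb_tuple:pack({?DB_CHANGES, fabric2_util:seq_max_vs()}, DbPrefix),
    MinKey = erlfdb_tuple:pack({?DB_CHANGES, SinceSeq}, DbPrefix),
    % Resolve the selectors to the keys actually stored in by_seq.
    LastKey = erlfdb:wait(erlfdb:get_key(Tx, erlfdb_key:last_less_than(MaxKey))),
    SinceKey = erlfdb:wait(erlfdb:get_key(Tx, erlfdb_key:first_greater_than(MinKey))),
    {?DB_CHANGES, _, LastInc} = erlfdb_tuple:unpack(LastKey, DbPrefix),
    {?DB_CHANGES, _, SinceInc} = erlfdb_tuple:unpack(SinceKey, DbPrefix),
    LastInc - SinceInc.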
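
The subtraction can be checked with a small Python model. This is only a sketch under assumptions: a sorted list of tuples stands in for the by_seq subspace, sequences are integers, and the helpers `last_less_than` and `first_greater_than` are hypothetical imitations of key selectors, not the erlfdb functions.

```python
import bisect


def last_less_than(keys, key):
    # Largest stored key strictly below `key`, or None.
    idx = bisect.bisect_left(keys, key)
    return keys[idx - 1] if idx > 0 else None


def first_greater_than(keys, key):
    # Smallest stored key strictly above `key`, or None.
    idx = bisect.bisect_right(keys, key)
    return keys[idx] if idx < len(keys) else None


def count_changes_since(keys, since_seq):
    last = last_less_than(keys, ("changes", float("inf")))
    # ("changes", since_seq) sorts just before ("changes", since_seq, inc),
    # so this resolves to the entry at since_seq when that entry exists.
    since = first_greater_than(keys, ("changes", since_seq))
    if last is None or since is None:
        return 0
    return last[2] - since[2]


by_seq = [
    ("changes", 10, 1),
    ("changes", 20, 2),
    ("changes", 35, 3),
    ("changes", 40, 4),
]
print(count_changes_since(by_seq, 20))
```

In this model the result is the number of entries after `since_seq` only when an entry exists at `since_seq` itself. If that entry is missing, the selector lands on the next entry and the difference comes out one too low.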

The downsides of this approach:

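1. an extra range read for each doc write (on the critical path)
2. an increase in the number of read conflicts: every write reads the last
by_seq key and then writes a new key right after it, so transactions that
update different docs in parallel would conflict and fail every time.
I think this is a show stopper.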
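
The conflict problem can be illustrated with a toy optimistic-concurrency model in Python. This is a deliberate simplification and an assumption on my part: FoundationDB tracks conflicts over key ranges, whereas this sketch collapses the end of by_seq into a single hypothetical key named `by_seq_tail`. `Store` and `write_doc` are illustrative names only.

```python
class Store:
    """Toy optimistic store: a commit fails if a key it read was written
    after the transaction's snapshot was taken."""

    def __init__(self):
        self.version = 0
        self.last_write = {}  # key -> commit version of its last write

    def begin(self):
        return {"read_version": self.version, "reads": set(), "writes": set()}

    def commit(self, tx):
        for key in tx["reads"]:
            if self.last_write.get(key, 0) > tx["read_version"]:
                return False  # read conflict: someone changed what we read
        self.version += 1
        for key in tx["writes"]:
            self.last_write[key] = self.version
        return True


def write_doc(tx, doc_id, read_tail):
    tx["writes"].add(("doc", doc_id))
    if read_tail:
        # Proposed scheme: read the last by_seq entry to learn IncNumber.
        tx["reads"].add("by_seq_tail")
    # Both schemes append a new by_seq entry.
    tx["writes"].add("by_seq_tail")


store = Store()
a, b = store.begin(), store.begin()
write_doc(a, "doc-a", read_tail=True)
write_doc(b, "doc-b", read_tail=True)
print(store.commit(a), store.commit(b))
```

With `read_tail=False`, which models a write that appends to by_seq without reading it first, both transactions commit. With the extra read, the second one fails even though the two docs are unrelated.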

Best regards,
iilyak


On 2020/07/17 16:21:50, Tony Sun <tony.sun...@gmail.com> wrote: 
> Hi all,
> 
>    I recently started implementing _active_tasks for our fdb development
> branch. At first, I thought it would be trivial, but technical limitations
> have led me to modify our response as an interim solution. I'd like to get
> more feedback on this solution and start a discussion on a more
> accurate/correct solution going forward.
> 
> *Problem:*
> Most active tasks rely upon a "Total" value to determine progress. This
> relies on `count_changes_since/2` :
> https://github.com/apache/couchdb/blob/master/src/couch/src/couch_db_engine.erl#L634-L652
> 
> I cannot think of an efficient way of implementing this on top of fdb.
> Paul has probably thought about this more deeply during the initial layer
> design phase, but I may have missed some of those discussions.
> 
> Since Couch 2.0, our update_seq string has had a snapshot of the number of
> changes prepended. This also does not exist in the fdb-layer branch.
> 
> Ultimately, there is no way to calculate the total number of changes for
> a given update_seq.
> 
> *Proposed Solution:*
> We simply send out the versionstamp of the db sequence we are trying to
> reach, and the current versionstamp. So the responses look something like:
> 
> [
>     {
>         "node": "node1@127.0.0.1",
>         "pid": "<0.622.0>",
>         "changes_done": 199,
>         "current_version_stamp": "8131141649532-198",
>         "database": "testdb",
>         "db_version_stamp": "8131141649532-999",
>         "design_document": "_design/example",
>         "started_on": 1594703583,
>         "type": "indexer",
>         "updated_on": 1594703586
>     }
> ]
> 
> [
>     {
>         "node": "node1@127.0.0.1",
>         "pid": "<0.1030.0>",
>         "changes_done": 1000,
>         "current_version_stamp": "8131194932916-999",
>         "database": "testdb",
>         "db_version_stamp": "8131194932916-999",
>         "design_document": "_design/example",
>         "started_on": 1594703636,
>         "type": "indexer",
>         "updated_on": 1594703665
>     }
> ]
> 
> The user can utilize the changes_done (this is just a running counter for
> that task process), current_version_stamp, and db_version_stamp to gauge if
> the task is making progress.
> 
> My concern is that this is a breaking change for users that rely on the
> "total_changes" and "progress" fields.
> 
> I've opened a PR for this and have gotten good feedback on some
> implementation details but would love to get consensus on the response
> format: https://github.com/apache/couchdb/pull/3003
> 
> *Moving Forward:*
> I've read a few foundationdb forum posts, and the topic of "Can I get the
> changes to the DB, given a versionstamp?" has been discussed a few times.
> I'm not sure it will be done on the fdb end anytime soon. I briefly
> considered adding another b-tree in memory, but that seems like overkill
> just for this Total feature.
> 
> Thanks,
> 
> Tony
> 
