[DISCUSS] _changes feed on database partitions

Adam Kocoloski Tue, 12 May 2020 14:59:23 -0700

Hi all,

When we introduced partitioned databases in 3.0 we declined to add a 
partition-specific _changes endpoint, because we didn’t have a prebuilt index 
that could support it. It sounds like the lack of that endpoint is a bit of a 
drag. I wanted to start this thread to consider adding it.


Note: this isn’t a fully-formed proposal coming from my team with a plan to 
staff the development of it. Just a discussion :)

In the simplest case, a _changes feed could be implemented by scanning the 
by_seq index of the shard that hosts the named partition. We already get some 
efficiencies here: we don’t need to touch any of the other shards of the 
database, and we have enough information in the by_seq btree to filter out 
documents from other partitions without actually retrieving them from disk, so 
we can push the filter down quite nicely without a lot of extra processing. 
It’s just a very cheap binary prefix pattern match on the docid.
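
To make the filter concrete, here is a minimal Python sketch (not CouchDB's 
actual Erlang code). It assumes the by_seq index can be modeled as a list of 
(update_seq, docid) pairs sorted by sequence, and that docids in a partitioned 
database have the form "partition:key". The names partition_changes and 
shard_by_seq are hypothetical, made up for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical stand-in for one shard's by_seq index:
# (update_seq, doc_id) pairs in update_seq order.
BySeqEntry = Tuple[int, str]


def partition_changes(by_seq: Iterable[BySeqEntry],
                      partition: str) -> Iterator[BySeqEntry]:
    """Yield only the entries whose doc_id belongs to `partition`."""
    # Include the ":" separator so "sensor-a" does not match "sensor-ab".
    prefix = partition + ":"
    for seq, doc_id in by_seq:
        # The doc_id is already in the index entry, so no document body
        # has to be read to apply the filter.
        if doc_id.startswith(prefix):
            yield seq, doc_id


shard_by_seq = [
    (1, "sensor-a:reading-1"),
    (2, "sensor-b:reading-1"),
    (3, "sensor-a:reading-2"),
    (4, "sensor-ab:reading-1"),
]

for seq, doc_id in partition_changes(shard_by_seq, "sensor-a"):
    print(seq, doc_id)
```

The only per-entry cost in this model is one prefix comparison on the docid.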

Most consumers of the _changes feed work incrementally, and we can support that 
here as well. It’s not like we need to do a full table scan on every 
incremental request.
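
A rough sketch of the incremental case, under the same assumptions as before: 
the index is modeled as a sorted Python list of (update_seq, docid) pairs with 
integer sequences, which is a simplification of the real shard-level btree. 
The function name and the shape of the return value are hypothetical.

```python
import bisect
from typing import List, Tuple

BySeqEntry = Tuple[int, str]


def partition_changes_since(by_seq: List[BySeqEntry], partition: str,
                            since: int = 0) -> Tuple[List[BySeqEntry], int]:
    """Return (matching changes after `since`, sequence to resume from)."""
    prefix = partition + ":"
    # (since + 1,) sorts before every (since + 1, doc_id) entry, so this
    # finds the first entry with update_seq > since without a full scan.
    start = bisect.bisect_left(by_seq, (since + 1,))
    changes = [(seq, doc_id) for seq, doc_id in by_seq[start:]
               if doc_id.startswith(prefix)]
    # Resume from the shard's latest sequence, not the last match, so the
    # next request also skips the non-matching entries already examined.
    last_seq = max(since, by_seq[-1][0]) if by_seq else since
    return changes, last_seq


shard_by_seq = [
    (1, "sensor-a:reading-1"),
    (2, "sensor-b:reading-1"),
    (3, "sensor-a:reading-2"),
    (4, "sensor-b:reading-2"),
]

changes, last_seq = partition_changes_since(shard_by_seq, "sensor-a", since=1)
print(changes, last_seq)
```

A consumer that passes the returned sequence back as `since` only ever pays 
for the entries written after its previous request.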

If the shard is hosting so many partitions that this filter is becoming a 
bottleneck, resharding (also new in 3.0) is probably a good option. Partitioned 
databases are particularly amenable to increasing the shard count. Global 
indexes on the database become more expensive to query, but those ought to be a 
smaller percentage of queries in this data model.

Finally, if the overhead of filtering out non-matching partitions is just too 
high, we could support the use of user-created indexes, e.g. by having a user 
create a Mango index on _local_seq. If such an index exists, our “query 
planner” uses it for the partitioned _changes feed. If not, we fall back to 
the scan on the shard’s by_seq index as above.
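
The fallback logic could look roughly like the Python sketch below. This is 
only an illustration of the decision, not a proposed API: the user-created 
index is modeled as a hypothetical dict mapping a partition name to its own 
sorted list of (update_seq, docid) pairs, and plan_partition_changes is an 
invented name.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, str]


def plan_partition_changes(
    by_seq: List[Entry],
    partition: str,
    since: int = 0,
    seq_indexes: Optional[Dict[str, List[Entry]]] = None,
) -> Tuple[str, List[Entry]]:
    """Return (strategy used, changes after `since`) for one partition."""
    index = (seq_indexes or {}).get(partition)
    if index is not None:
        # A per-partition index holds only matching entries: no filtering.
        start = bisect.bisect_left(index, (since + 1,))
        return "user_index", index[start:]
    # No index available: scan the shard's by_seq and filter by prefix.
    prefix = partition + ":"
    start = bisect.bisect_left(by_seq, (since + 1,))
    return "by_seq_scan", [e for e in by_seq[start:]
                           if e[1].startswith(prefix)]


shard_by_seq = [
    (1, "sensor-a:reading-1"),
    (2, "sensor-b:reading-1"),
    (3, "sensor-a:reading-2"),
]
# Hypothetical user-created index covering only partition "sensor-a".
indexes = {"sensor-a": [(1, "sensor-a:reading-1"), (3, "sensor-a:reading-2")]}

print(plan_partition_changes(shard_by_seq, "sensor-a", 0, indexes))
print(plan_partition_changes(shard_by_seq, "sensor-b", 0, indexes))
```

Both paths return the same changes for a given partition; only the amount of 
non-matching data they have to skip differs.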

I’d like to do some basic benchmarking, but I have a feeling the by_seq scan 
will work quite well in the majority of cases, and the user-defined index is a 
good “escape valve” if we need it. WDYT?

Adam
