Hi,

Yes, I think this would be a good addition for 3.0. I think we didn't add it 
before because of concerns about accidental misuse (attempting to replicate 
with it but forgetting a range, etc.)? Whatever the reasons, I think exposing the 
per-partition _changes feed exactly as you've described will be useful. We 
should state explicitly in the accompanying docs that the replicator does not 
use this endpoint (though, of course, it might be enhanced to do so in a future 
release).

There's a discussion elsewhere about whether any of the _partition endpoints 
should continue to exist from 4.0 onward (leaning towards keeping them just to 
avoid unnecessary upgrade pain?), so a note in that thread would be good too. 
It does seem odd to enhance an endpoint in 3.0 only to remove it entirely in 
4.0. The reasons for removing _partition are compelling, however, as the 
motivating (internal) reason for introducing _partition is gone.

B.

> On 12 May 2020, at 22:59, Adam Kocoloski <kocol...@apache.org> wrote:
> 
> Hi all,
> 
> When we introduced partitioned databases in 3.0 we declined to add a 
> partition-specific _changes endpoint, because we didn’t have a prebuilt index 
> that could support it. It sounds like the lack of that endpoint is a bit of a 
> drag. I wanted to start this thread to consider adding it.
> 
> Note: this isn’t a fully-formed proposal coming from my team with a plan to 
> staff the development of it. Just a discussion :)
> 
> In the simplest case, a _changes feed could be implemented by scanning the 
> by_seq index of the shard that hosts the named partition. We already get some 
> efficiencies here: we don’t need to touch any of the other shards of the 
> database, and we have enough information in the by_seq btree to filter out 
> documents from other partitions without actually retrieving them from disk, 
> so we can push the filter down quite nicely without a lot of extra 
> processing. It’s just a very cheap binary prefix pattern match on the docid.
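
Just to make the pushed-down filter concrete, here's a minimal sketch in
Python (illustrative only, nothing like the actual Erlang internals):
partitioned doc IDs have the form "<partition>:<docid>", so each by_seq entry
can be kept or dropped with a cheap prefix check on the id alone, without
loading the document body.

    # Illustrative sketch only, not CouchDB internals.
    def filter_by_seq(by_seq_entries, partition):
        """Yield only the (seq, docid) pairs belonging to the named partition."""
        prefix = partition + ":"
        for seq, docid in by_seq_entries:
            if docid.startswith(prefix):
                yield seq, docid

    # Example: only the "sensor-a" rows survive the filter.
    entries = [(1, "sensor-a:reading-001"),
               (2, "sensor-b:reading-001"),
               (3, "sensor-a:reading-002")]
    print(list(filter_by_seq(entries, "sensor-a")))
    # -> [(1, 'sensor-a:reading-001'), (3, 'sensor-a:reading-002')]
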
> 
> Most consumers of the _changes feed work incrementally, and we can support 
> that here as well. It’s not like we need to do a full table scan on every 
> incremental request.
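
Agreed, and for what it's worth an incremental consumer shouldn't need to look
any different from one driving the database-level feed. A hedged sketch,
assuming the new endpoint lives at /{db}/_partition/{partition}/_changes and
accepts since / returns last_seq like the existing _changes API (the URL
shape, database name and credentials below are all made up):

    # Hedged sketch of an incremental consumer; assumes the proposed endpoint
    # mirrors the database-level _changes API (accepts `since`, returns
    # `last_seq`). Host, database name and credentials are placeholders.
    import requests

    BASE = "http://127.0.0.1:5984"   # assumed local CouchDB
    DB = "mydb"                      # hypothetical database
    PARTITION = "sensor-a"           # hypothetical partition key

    def poll_changes(since="0"):
        """Fetch one batch of partition changes; return (results, last_seq)."""
        url = f"{BASE}/{DB}/_partition/{PARTITION}/_changes"
        resp = requests.get(url, params={"since": since},
                            auth=("admin", "password"))
        resp.raise_for_status()
        body = resp.json()
        return body["results"], body["last_seq"]

    # Persist last_seq and pass it back on the next call, so each incremental
    # request only scans forward from the stored checkpoint.
    results, checkpoint = poll_changes()
    results, checkpoint = poll_changes(since=checkpoint)
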
> 
> If the shard is hosting so many partitions that this filter is becoming a 
> bottleneck, resharding (also new in 3.0) is probably a good option. 
> Partitioned databases are particularly amenable to increasing the shard 
> count. Global indexes on the database become more expensive to query, but 
> those ought to be a smaller percentage of queries in this data model.
> 
> Finally, if the overhead of filtering out non-matching partitions is just too 
> high, we could support the use of user-created indexes, e.g. by having a user 
> create a Mango index on _local_seq. If such an index exists, our “query 
> planner” uses it for the partitioned _changes feed. If not, we fall back to 
> the scan on the shard’s by_seq index as above.
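
If it comes to that, the "escape valve" might look like an ordinary Mango
index definition posted to /{db}/_index, just with _local_seq as the field.
A hedged sketch follows; it assumes Mango would be taught to accept _local_seq
as an indexable field, which is part of the proposal rather than something
CouchDB does today (database name and credentials are placeholders):

    # Hedged sketch of the user-created index; assumes Mango were extended to
    # accept _local_seq as an indexable field (proposed, not supported today).
    import requests

    BASE = "http://127.0.0.1:5984"   # assumed local CouchDB
    DB = "mydb"                      # hypothetical database

    index_definition = {
        "index": {"fields": ["_local_seq"]},  # proposed field to index
        "name": "local-seq-idx",              # hypothetical index name
        "type": "json",
    }

    resp = requests.post(f"{BASE}/{DB}/_index", json=index_definition,
                         auth=("admin", "password"))
    print(resp.json())
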
> 
> I’d like to do some basic benchmarking, but I have a feeling the by_seq scan 
> will work quite well in the majority of cases, and the user-defined index is 
> a good “escape valve” if we need it. WDYT?
> 
> Adam
