nickva commented on pull request #651:
URL: 
https://github.com/apache/couchdb-documentation/pull/651#issuecomment-881764357


   @kocolosk Good point, the locality trick would be useful internally, say to 
process the changes feed for indexing, but wouldn't help with write 
hotspots. The design of the _changes feed external API is pretty neat and I 
think it may be worth going that way eventually, but perhaps with an 
auto-sharding setup so that users don't have to think about Q at all.
   
   Found a description of how the FDB backup system avoids hot write shards: 
https://forums.foundationdb.org/t/keyspace-partitions-performance/168/2. 
Apparently it's based on writing to `(hash(version/1e6), version)` key ranges, 
striking a balance between being able to query ranges and avoiding writing 
more than 1 second of data (by default versions advance at a rate of about 1e6 
per second) to any one shard at a time, on average. Not sure yet if 
that's an idea we can borrow directly, but perhaps there is something there...
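   If I'm reading that right, the trick can be sketched roughly like this 
(Python, with a made-up key layout and hash width; not FDB's actual 
implementation):
   
   ```python
   import hashlib
   
   VERSIONS_PER_SEC = 1_000_000  # FDB versions advance at ~1e6/sec by default
   
   def shard_key(version: int) -> tuple:
       # Prefix the key with a hash of the 1-second version bucket, so at
       # most ~1 second of writes lands on any one shard, while range reads
       # within a bucket stay contiguous.
       bucket = version // VERSIONS_PER_SEC
       digest = hashlib.sha256(bucket.to_bytes(8, "big")).digest()
       prefix = int.from_bytes(digest[:2], "big")
       return (prefix, version)
   
   # Versions within the same second share a prefix (one contiguous range)...
   assert shard_key(5)[0] == shard_key(999_999)[0]
   # ...while consecutive seconds usually hash to different prefixes,
   # spreading the write load across shards.
   ```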
   
   Regarding the changes feed being a bottleneck for indexing, we did a quick 
and dirty test by reading 1M and 10M changes on a busy cluster (3 storage 
nodes), and we were able to get about 58-64k rows/sec with just a trivial 
accumulator that counts rows.
   
   ```
   {ok, Db} = fabric2_db:open(<<"perf-test-user/put_insert_1626378013">>, []).
   Fun = fun(_Change, Acc) -> {ok, Acc + 1} end.
   
   ([email protected])6> timer:tc(fun() -> fabric2_db:fold_changes(Db, 0, Fun, 
0, [{limit, 1000000}]) end).
   {16550135,{ok,1000000}}
   
   ([email protected])12> timer:tc(fun() -> fabric2_db:fold_changes(Db, 0, Fun, 
0, [{limit, 10000000}]) end).
   {156290604,{ok,10000000}}
   ....
   ```
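   
   As a sanity check on those numbers: `timer:tc/1` returns elapsed 
microseconds, so the two runs work out to roughly 60k and 64k rows/sec 
respectively (quick arithmetic, not part of the test itself):
   
   ```python
   # {ElapsedMicroseconds, Result} pairs from the timer:tc/1 calls above
   runs = {1_000_000: 16_550_135, 10_000_000: 156_290_604}
   
   for rows, usec in runs.items():
       rate = rows / (usec / 1e6)  # rows per second
       print(f"{rows:>10} rows: {rate:,.0f} rows/sec")
   ```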
   
   For indexing at least, that seems not too bad. We'd probably want to find a 
way to parallelize doc fetches, and most of all to run index updates 
concurrently.
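   
   For the doc-fetch part, one generic shape of that (sketched in Python with 
a hypothetical `fetch_doc` stand-in, not an actual fabric2 API) would be to 
overlap a bounded number of fetches while keeping results in order for the 
indexer:
   
   ```python
   from concurrent.futures import ThreadPoolExecutor
   
   def fetch_doc(doc_id):
       # Hypothetical stand-in for a real per-document fetch from the cluster.
       return {"_id": doc_id}
   
   def fetch_docs_parallel(doc_ids, workers=16):
       # Overlap up to `workers` fetches at a time; pool.map preserves input
       # order, so the index updater downstream can consume sequentially.
       with ThreadPoolExecutor(max_workers=workers) as pool:
           return list(pool.map(fetch_doc, doc_ids))
   
   docs = fetch_docs_parallel([f"doc-{i}" for i in range(100)])
   ```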
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
