On 5/15/20 9:35 PM, Louis Bertrand wrote:
I chose PostgreSQL because a) it's already installed on the eventual target server and b) I'm familiar with it. However, where is the trade-off between SQLite and PostgreSQL? Tens, hundreds, thousands? Number of users, transactions? Etc. In other words, I might have saved myself some trouble by simply accepting the default for a small site.


One advantage of supporting multiple databases is that we don't have to choose on behalf of someone else ;-)

Also, I guess the load of a "user" might vary by a factor of 100 depending on how the system is used. I guess a system with 1000 quite active users can easily have less than 1 write operation per second and a manageable number of read operations, and work fine with a number of worker processes on a single server. In that case I guess sqlite would work fine.
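To make that estimate concrete, here is a back-of-the-envelope sketch in Python. The numbers are assumptions for illustration only (20 database writes per user per day, spread evenly over an 8 hour working day), not measurements from any real installation, and the function name is made up for this example.

```python
def writes_per_second(users, writes_per_user_per_day, active_hours=8):
    """Average database writes per second over the active part of the day.

    All inputs are illustrative assumptions, not measured values.
    """
    return users * writes_per_user_per_day / (active_hours * 3600)


# 1000 users, each assumed to cause 20 writes per working day
rate = writes_per_second(users=1000, writes_per_user_per_day=20)
print(f"{rate:.2f} writes/s")  # prints "0.69 writes/s"
```

This is an average, so real traffic will have peaks above it, but it suggests the order of magnitude is well within what a single-file database can handle.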

I think the biggest difference is in the operations. If a DBA is running the system, he might want a "real" database. There might be more confidence in how PostgreSQL handles scaling and system failures, and it allows more powerful integrations with other systems. It is very hard to lose critical data in a DVCS hosting system. The important data is in the repos directly in the file system, not in the database. And in the worst case, all the clients will still have the data they pushed ... and other clients might already have pulled it.

We could perhaps put some generic advice in the docs:

Start out with a single server with sqlite and reliable local storage (with regular backup of storage and database). If you need failover, make it cold. If users experience delays while the server load is low (especially if the workers are busy serving repo data), add more worker processes. If the server load is high, consider adding more CPU or memory if feasible. If network load is the bottleneck, try solving that. Also consider offloading buildfarm load to a simpler or separate system or make it "smarter" to decrease the load.

Then, if necessary or desirable for other reasons, scale up to use something like PostgreSQL (possibly on a separate HA system/cluster ... with potentially higher latency), shared storage (network or SAN), and multiple (physical) worker servers.
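The advice above can be read as a rough decision sequence. The Python sketch below is only one possible reading of it: the function name, the parameters, and the order of the checks are assumptions made for illustration, and nothing here is part of Kallithea.

```python
def next_step(users_see_delays, server_load_high,
              network_is_bottleneck=False, can_add_hardware=True):
    """Suggest the next scaling step for a single-server setup.

    The order of the checks is an assumption, not a fixed rule.
    """
    if not users_see_delays:
        return "stay on a single server with sqlite and regular backups"
    if network_is_bottleneck:
        return "solve the network bottleneck"
    if not server_load_high:
        # Delays while the server is mostly idle: workers are the limit
        return "add more worker processes"
    if can_add_hardware:
        return "add more CPU or memory, or offload buildfarm load"
    return "scale up: PostgreSQL, shared storage, multiple worker servers"


# Users see delays although the server load is low
print(next_step(users_see_delays=True, server_load_high=False))
```

The point of the ordering is that the cheap fixes come first, and the move to PostgreSQL with multiple servers is the last resort rather than the starting point.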

While different existing setups might have hit different limitations, do any existing users have comments or advice to add to this?

/Mads

_______________________________________________
kallithea-general mailing list
kallithea-general@sfconservancy.org
https://lists.sfconservancy.org/mailman/listinfo/kallithea-general
