Re: apr_dbm and concurrency

2023-09-25 Branko Čibej

On 25.09.2023 16:58, Joe Orton wrote:

> It is unspecified whether the apr_dbm.h interface is safe to use for
> multiple processes/threads having r/w access to a single database.
> Results appear to be:
>
> - sdbm, gdbm are safe
> - bdb, ndbm are not safe
>
> (Berkeley DB obviously can be used in a way which is safe for multiple
> r/w users but it appears to require using one of the more complicated
> modes of operation via a DB_ENV, and changing to that would not be
> backwards compatible with the current db format. Corrections welcome,
> not a database expert)


IIRC, Berkeley DB multi-process concurrency is managed through an 
on-disk "register" file external to the actual key/value store. The 
key/value store format is not affected by the presence of this file. The 
DB_REGISTER mechanism was introduced in BDB 4.4 (now long defunct) and 
can be used for both concurrency control and automatic database 
recovery. The client-side code for this can be lifted from Subversion.


(I was involved in designing this mechanism for BDB and implementing its 
use in Subversion, but that was ages ago -- back in 2005. There may be 
better ways to do this in newer versions of Berkeley DB.)


TL;DR: all upstream-supported versions of BDB should have this mechanism 
available, APR can detect whether it is being used without changing the 
API, and APR can even "upgrade" existing databases with the register file 
on the fly without affecting the actual database.


-- Brane

apr_dbm and concurrency

2023-09-25 Joe Orton
It is unspecified whether the apr_dbm.h interface is safe to use for 
multiple processes/threads having r/w access to a single database. 
Results appear to be:

- sdbm, gdbm are safe
- bdb, ndbm are not safe

(Berkeley DB obviously can be used in a way which is safe for multiple 
r/w users but it appears to require using one of the more complicated 
modes of operation via a DB_ENV, and changing to that would not be 
backwards compatible with the current db format. Corrections welcome, 
not a database expert)

This seems pretty bad: httpd's use of this interface depends on the DBM 
API being safe for concurrent use, but I'm not sure there is any good 
way forward.

Options I can see:

1. Implement APR-specific locking inside apr_dbm for unsafe db types, 
e.g. by creating a ".lock" lockfile and using apr_file_lock

2. Drop concurrency-unsafe db methods, but in APR 2.x only?

3. No code change. Describe the state of concurrency-safety in the API 
for each db type.  httpd and other users would be forced to select a DB 
type appropriate to the use case.

Any other suggestions, and any preferences among the above? 
Unfortunately, I suspect 3 may be the least bad choice.

Regards, Joe