Re: reftable [v5]: new ref storage format

Howard Chu Wed, 09 Aug 2017 04:24:27 -0700

Shawn Pearce wrote:

On Sun, Aug 6, 2017 at 4:37 PM, Ben Alex <[email protected]> wrote:

> Just on the LmdbJava specific pieces:
>
> On Mon, Aug 7, 2017 at 8:56 AM, Shawn Pearce <[email protected]> wrote:

I don't know if we need a larger key size. $DAY_JOB limits ref names
to ~200 bytes in a hook. I think GitHub does similar. But I'm worried
about the general masses who might be using our software and expect
ref names thus far to be as long as PATH_MAX on their system. Most
systems run PATH_MAX around 1024.

The key size limit in LMDB can be safely raised to around 2KB or so without any issues. There's also work underway in LMDB 1.0 to raise the limit to 2GB, but in general it would be silly to use such large keys.

Mostly at $DAY_JOB it's because we can't virtualize the filesystem
calls the C library is doing.

In git-core, I'm worried about the caveats related to locking. Git
tries to work nicely on NFS,


That may be a problem in current LMDB 0.9, but needs further clarification.

and it seems LMDB wouldn't. Git also runs
fine on a read-only filesystem, and LMDB gets a little weird about
that.

Not sure what you're talking about. LMDB works perfectly fine on read-only filesystems, it just enforces that it is used in read-only mode.

Finally, Git doesn't have nearly the risks LMDB has about a
crashed reader or writer locking out future operations until the locks
have been resolved. This is especially true with shared user
repositories, where another user might set up and own the semaphore.


All locks disappear when the last process using the DB environment exits.

If only a single process is using the DB environment, there's no issue. If multiple processes are sharing the DB environment concurrently, the write lock cleans up automatically when the writer terminates; stale reader locks would require a call to mdb_reader_check() to clean them up.

The primary issue with using LMDB over NFS is with performance. All reads are performed thru accesses of mapped memory, and in general, NFS implementations don't cache mmap'd pages. I believe this is a consequence of the fact that they also can't guarantee cache coherence, so the only way for an NFS client to see a write from another NFS client is by always refetching pages whenever they're accessed.

This is also why LMDB doesn't provide user-level VFS hooks - it's generally impractical to emulate mmap from application level. You could always write a FUSE driver if that's really what you need to do, but again, the performance of such a solution is pretty horrible.

LMDB's read lock management also wouldn't perform well over NFS; it also uses an mmap'd file. On a local filesystem LMDB read locks are zero cost since they just atomically update a word in the mmap. Over NFS, each update to the mmap would also require an msync() to propagate the change back to the server. This would seriously limit the speed with which read transactions may be opened and closed. (Ordinarily opening and closing a read txn can be done with zero system calls.)
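
To make the cost difference concrete, here is a rough sketch in C of a reader slot kept in an mmap'd file. This is not LMDB's actual lock-file layout or code: the reader_table struct, SLOT_COUNT, and every function name are hypothetical, and it assumes a platform where 64-bit atomics are lock-free. reader_begin() and reader_end() are plain atomic stores with no system call; reader_publish() is the extra msync() that an NFS-backed table would need after each of them.

```c
#include <fcntl.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical reader table: one word per reader slot, holding the
 * transaction id that reader is pinned to (0 means the slot is free). */
enum { SLOT_COUNT = 64 };

typedef struct {
    _Atomic uint64_t txnid[SLOT_COUNT];
} reader_table;

/* Map the shared table file, creating and sizing it if needed.
 * Returns NULL on failure. */
static reader_table *reader_table_open(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(reader_table)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(reader_table), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (reader_table *)p;
}

/* Local filesystem case: starting a read txn is one atomic store. */
static void reader_begin(reader_table *t, int slot, uint64_t txnid)
{
    atomic_store_explicit(&t->txnid[slot], txnid, memory_order_release);
}

/* Ending the read txn frees the slot, again without a system call. */
static void reader_end(reader_table *t, int slot)
{
    atomic_store_explicit(&t->txnid[slot], 0, memory_order_release);
}

/* NFS case: each change would also have to be pushed to the server. */
static int reader_publish(reader_table *t)
{
    return msync(t, sizeof *t, MS_SYNC);
}
```

A caller would map the table once with reader_table_open(), then bracket each read with reader_begin() and reader_end(); only the NFS variant would add a reader_publish() after each, which is one blocking round trip per transaction open and close.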


--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
