Dear Matt,

The problem I am trying to solve is the absence of a sane way to get the
state in a manner similar to `zfs list` from libzfs_core. The current API
is non-atomic, and I was concerned about memory utilization on large/many
pools, so I wrote a lzc_list that operated using a pipe while holding
locks. When porting that to Illumos, I realized that these locks could be
held arbitrarily long by userland, so I came up with a second approach that
did holds. Unfortunately, this led the code to use memory linear in the
number of datasets and hurt atomicity, so I came up with the idea of
pinning the last txg and using that as output.

There seems to be no way to list datasets in a way that is simultaneously
atomic, non-blocking, memory efficient and consistent (where we get the
latest state). My latest thought is to implement lzc_list with an "atomic"
toggle. When it is set, lzc_list does things the first way that I
implemented them, as a privileged operation that by default requires root
in the global zone on Illumos. When it is not set, lzc_list uses the
non-atomic code we have now, pushed into the kernel.
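
To make the trade-off concrete, here is a toy Python model of the toggle. It
is only a sketch and not libzfs_core code: `ToyPool`, `lzc_list_sketch`, the
`privileged` flag and the cursor walk are all hypothetical names and
simplifications that I am assuming for illustration.

```python
import threading
from typing import Callable, List, Optional


class PermissionDenied(Exception):
    """Raised when an unprivileged caller requests the atomic mode."""


class ToyPool:
    """Hypothetical stand-in for a pool's dataset namespace."""

    def __init__(self, names: List[str]) -> None:
        self.lock = threading.Lock()
        self.names = sorted(names)

    def create(self, name: str) -> None:
        # Writers need the same lock that an atomic listing holds.
        with self.lock:
            self.names.append(name)
            self.names.sort()

    def next_after(self, cursor: str) -> Optional[str]:
        # One short lock hold per step; the namespace may change between steps.
        with self.lock:
            for name in self.names:
                if name > cursor:
                    return name
        return None


def lzc_list_sketch(pool: ToyPool, emit: Callable[[str], None],
                    atomic: bool = False, privileged: bool = False) -> None:
    if atomic:
        if not privileged:
            raise PermissionDenied("atomic listing is a privileged operation")
        # Consistent view, but the lock is held for as long as emit() takes.
        with pool.lock:
            for name in pool.names:
                emit(name)
        return
    # Non-atomic: walk with a cursor, taking the lock only briefly per step.
    cursor = ""
    while True:
        name = pool.next_after(cursor)
        if name is None:
            break
        emit(name)
        cursor = name
```

In this model, the atomic path blocks every writer until the consumer is
done, which is why it would have to be privileged. The non-atomic path never
blocks for long, but a dataset created while the walk is in progress can
show up in the output, so the result is not a single point-in-time view.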

What do you think?

Yours truly,
Richard Yao

On 27 May 2015 at 17:08, Matthew Ahrens <[email protected]> wrote:

>
>
> On Wed, May 27, 2015 at 7:23 PM, Richard Yao <[email protected]>
> wrote:
>
>> Dear Everyone,
>>
>> As some people know, I have been working on libzfs_core extensions and I
>> currently have a prototype lzc_list command that is a large subset of the
>> functionality of `zfs list`.
>>
>> After discussing it with others, I suspect that implementing an in-core
>> pool metadata snapshot facility for lzc_list would be the most natural way
>> of implementing lzc_list. The in-core pool snapshot would atomically pin
>> the metadata state of a pool on disk for `zfs list` without holding locks
>> while it is traversed (my first implementation) or requiring that we pin
>> memory via holds on dsl_dataset_t objects until the operation is finished
>> (my second implementation). While the in-core pool metadata snapshot is in
>> effect, the blocks containing the pool metadata would not be marked free in
>> the in-core free space map, but they would be marked free in the on-disk
>> space map when they would normally be marked as such. That has the downside
>> that disk space would not be freed right away, but we make no guarantees of
>> immediately freeing disk space anyway, so I suspect that is okay.
>>
>> Would this be something entirely new, or is there already a way to
>> snapshot the pool's metadata state in-core, either one of which I am
>> unaware or one in a branch somewhere?
>>
>>
> Let's say for the sake of argument that we don't overwrite anything on
> disk that you care about.  What else do you need to do?  I'm imagining that
> you will have a separate idea of the metadata state (what datasets exist,
> their properties and interrelations, etc), which is out of date from what's
> really on disk.  How do you maintain that?  It seems nontrivial.
>
> Maybe you could start by describing the problem that you are solving?  It
> sounds like you want an atomic view of the pool metadata (that's used by
> "zfs list")?
>
> --matt
>
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
