I should add that the purpose of the pipe is to avoid two failure modes.
The first is iteration taking arbitrarily long because userland has to
allocate a large enough buffer while racing the kernel, whose memory
requirements can grow as it iterates. The second is things hanging because
there are too many or too large pools to list.
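
A rough sketch of the pipe idea, in Python rather than kernel C and not
taken from the prototype: a thread stands in for the kernel side and
writes one record per dataset into a pipe, while the caller reads records
as they arrive. The names `produce` and `list_via_pipe` are hypothetical
and are not part of libzfs_core. The point is only that the reader never
has to guess a buffer size up front.

```python
import os
import threading


def produce(write_fd, names):
    """Stand-in for the kernel side: write one newline-terminated record per dataset."""
    with os.fdopen(write_fd, "w") as out:
        for name in names:
            # A full pipe blocks the writer, so buffered data stays bounded.
            out.write(name + "\n")
    # Leaving the with-block closes the write end, which signals EOF to the reader.


def list_via_pipe(names):
    """Consumer side: yield records as they arrive instead of sizing a buffer first."""
    read_fd, write_fd = os.pipe()
    producer = threading.Thread(target=produce, args=(write_fd, names))
    producer.start()
    with os.fdopen(read_fd) as src:
        for line in src:
            yield line.rstrip("\n")
    producer.join()


# Usage: works the same for two datasets or for far more than fit in the pipe buffer.
for dataset in list_via_pipe(["tank", "tank/home"]):
    print(dataset)
```

The writer blocking on a full pipe is what keeps memory bounded, and it is
also why anything the writer holds while blocked (such as locks) can be
held for as long as the reader takes.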

On 29 May 2015 at 11:06, Richard Yao <[email protected]> wrote:

> Dear Matt,
>
> The problem I am trying to solve is the absence of a sane way to get the
> state in a manner similar to `zfs list` from libzfs_core. The current API
> is non-atomic and I was concerned about memory utilization on large/many
> pools, so I wrote an lzc_list that operated using a pipe while holding
> locks. When porting that to Illumos, I realized that these locks could be
> held arbitrarily long by userland, so I came up with a second approach
> that used holds instead. Unfortunately, this led the code to use memory
> linear in the number of datasets and hurt atomicity, so I came up with the
> idea of pinning the last txg and using that as output.
>
> There seems to be no way to list datasets that is simultaneously atomic,
> non-blocking, memory efficient and consistent (where we get the latest
> state). My latest thought is to implement lzc_list with an "atomic"
> toggle. When it is set, lzc_list would use my first implementation as a
> privileged operation that by default requires root in the global zone on
> Illumos. When it is not set, lzc_list would use the non-atomic code we
> have now, pushed into the kernel.
>
> What do you think?
>
> Yours truly,
> Richard Yao
>
>
> On 27 May 2015 at 17:08, Matthew Ahrens <[email protected]> wrote:
>
>>
>>
>> On Wed, May 27, 2015 at 7:23 PM, Richard Yao <[email protected]>
>> wrote:
>>
>>> Dear Everyone,
>>>
>>> As some people know, I have been working on libzfs_core extensions and I
>>> currently have a prototype lzc_list command that implements a large
>>> subset of the functionality of `zfs list`.
>>>
>>> After discussing it with others, I suspect that implementing an in-core
>>> pool metadata snapshot facility for lzc_list would be the most natural way
>>> of implementing lzc_list. The in-core pool snapshot would atomically pin
>>> the metadata state of a pool on disk for `zfs list` without holding locks
>>> while it is traversed (my first implementation) or requiring that we pin
>>> memory via holds on dsl_dataset_t objects until the operation is finished
>>> (my second implementation). While the in-core pool metadata snapshot is in
>>> effect, the blocks containing the pool metadata would not be marked free in
>>> the in-core free space_map, but they would be marked free in the on-disk
>>> space map when they would normally be marked as such. That has the downside
>>> that disk space would not be freed right away, but we make no guarantees of
>>> immediately freeing disk space anyway, so I suspect that is okay.
>>>
>>> Would this be something entirely new, or is there already a way to
>>> snapshot the pool's metadata state in-core, either one that I am unaware
>>> of or one in a branch somewhere?
>>>
>>>
>> Let's say for the sake of argument that we don't overwrite anything on
>> disk that you care about.  What else do you need to do?  I'm imagining that
>> you will have a separate idea of the metadata state (what datasets exist,
>> their properties and interrelations, etc), which is out of date from what's
>> really on disk.  How do you maintain that?  It seems nontrivial.
>>
>> Maybe you could start by describing the problem that you are solving?  It
>> sounds like you want an atomic view of the pool metadata (that's used by
>> "zfs list")?
>>
>> --matt
>>
>
>
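
For reference, the in-core side of the metadata snapshot idea quoted above
can be modelled as deferred frees: while a snapshot is pinned, a freed
block stays allocated in the in-core map and is only released when the
last pin goes away. This is a toy model under that assumption, not ZFS
code. The class `InCoreFreeMap` and its methods are hypothetical, and the
on-disk space map (which would record the free right away) is left out.

```python
class InCoreFreeMap:
    """Toy model: frees are deferred in core while a metadata snapshot is pinned."""

    def __init__(self, allocated):
        self.allocated = set(allocated)  # blocks the in-core map treats as in use
        self.deferred = set()            # frees held back while a pin is active
        self.pins = 0

    def pin(self):
        """Start reading a stable view of the metadata."""
        self.pins += 1

    def unpin(self):
        """Finish reading; the last reader applies every deferred free."""
        self.pins -= 1
        if self.pins == 0:
            self.allocated -= self.deferred
            self.deferred.clear()

    def free(self, block):
        """Free a block now, or defer it if a pinned view may still read it."""
        if self.pins > 0:
            self.deferred.add(block)
        else:
            self.allocated.discard(block)


# Usage: block 2 is freed during the listing but stays readable until unpin.
space = InCoreFreeMap({1, 2, 3})
space.pin()
space.free(2)
still_readable = 2 in space.allocated
space.unpin()
```

The cost shows up as delayed reuse of the deferred blocks, which matches
the downside noted in the quoted message that space is not freed right
away.
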
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
