On Tue, Dec 3, 2013 at 5:14 PM, Kohsuke Kawaguchi <[email protected]> wrote:

> Thanks for the comments. I really appreciate that.
>
> I subsequently proposed making spa_asize_inflation a tunable in ZFS on
> Linux (https://github.com/zfsonlinux/zfs/pull/1913), and a comment was
> made that a separate effort,
> https://github.com/zfsonlinux/zfs/pull/1696 (a port of
> https://www.illumos.org/issues/4045), would make it irrelevant.
>
> Now, if I understand correctly, illumos #4045 is already merged, and at
> the same time you just gave me a source code pointer where
> spa_asize_inflation is still a tunable in illumos. Does that mean this
> tunable is already stale in illumos? Or did #4045 not fully eliminate the
> need for spa_get_asize?
>

I don't understand the questions.  The fix for illumos #4045 introduced
the spa_asize_inflation tunable.  AFAIK there is no intention of
eliminating that tunable or spa_get_asize.  Your pull request #1913 is a
subset of illumos #4045.  You should probably just pull the entire #4045
fix into Linux, which is what pull request #1696 does.
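
For reference, a minimal, self-contained sketch of what the #4045 approach
looks like, if I read the illumos change correctly: the worst-case estimate
becomes logical size times a single tunable inflation factor, rather than
hard-coded multipliers.  The names below are simplified stand-ins, not the
real prototypes:

```c
#include <stdint.h>

/*
 * Sketch: the default of 24 mirrors the illumos value, which folds in
 * the worst-case RAID-Z, ditto, and DDT overheads.  Because it is a
 * single variable, it can be tuned down on pools that don't need the
 * full pessimism.
 */
uint64_t spa_asize_inflation_sketch = 24;

uint64_t
spa_get_asize_sketch(uint64_t lsize)
{
	return (lsize * spa_asize_inflation_sketch);
}
```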

--matt


>
> 2013/12/3 Matthew Ahrens <[email protected]>
>
>>
>>> So now I'm trying to see if there's any way to improve this estimate.
>>> But I'm new to ZFS codebase, and so I'm looking for some help.
>>>
>>> - I'm not using raidz, so I should be able to knock off the x4 right off
>>> the bat. Shouldn't there be a way to tell whether the pool is using raidz
>>> by looking at spa->spa_root_vdev and calculating multiplication factors
>>> based on the vdev tree? (And since the vdev tree shape won't change that
>>> often, hopefully this value can be pre-calculated and stored in vdev_t.)
>>>
>>
>> Yes.  You will need to look at all top-level vdevs (i.e.
>> spa_root_vdev->vdev_children[]) and see if they are using RAID-Z.  You
>> would probably want to cache this in the spa_t.  You will need to
>> re-evaluate when a new top-level vdev is added.
>>
>
> I see, so the root vdev itself is a dummy placeholder and its children are
> what I normally see in "zpool status" and the like.
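
A minimal sketch of the top-level-vdev walk described above, using
hypothetical, simplified stand-ins for vdev_t and spa_t (the fields here,
including the spa_has_raidz cache flag, are assumptions for illustration,
not the real ZFS structures):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for vdev_t: type name plus child array. */
typedef struct vdev {
	const char *vdev_type;     /* e.g. "raidz", "mirror", "disk" */
	struct vdev **vdev_child;
	size_t vdev_children;
} vdev_t;

/* Simplified stand-in for spa_t, with a hypothetical cached flag. */
typedef struct spa {
	vdev_t *spa_root_vdev;
	bool spa_has_raidz;
} spa_t;

/*
 * Walk the top-level vdevs (the children of the dummy root vdev) and
 * cache whether any of them is RAID-Z.  Per Matthew's note, this would
 * need to be re-run whenever a new top-level vdev is added.
 */
void
spa_update_raidz_cache(spa_t *spa)
{
	vdev_t *rvd = spa->spa_root_vdev;

	spa->spa_has_raidz = false;
	for (size_t c = 0; c < rvd->vdev_children; c++) {
		if (strcmp(rvd->vdev_child[c]->vdev_type, "raidz") == 0)
			spa->spa_has_raidz = true;
	}
}
```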
>
>
>>
>>
>>>
>>> - My 100MB write consists of write system calls to a file, and my
>>> understanding is that ZFS wouldn't replicate those blocks. So I'm wasting
>>> the x3, too. What if we passed in more contextual parameters to determine
>>> the proper block replication factor? If so, where can I learn about the
>>> block replication policy in ZFS?
>>>
>>
>> This will be trickier, because some blocks (metadata) will be dittoed,
>> and others will not.  The dmu_tx_hold_* code does not currently make this
>> distinction.  The ditto policy is implemented in dmu_write_policy(), which
>> is called way after you would need this information.
>>
>
> OK, I'll skip that one then. Too hard for me.
>
>
>>
>>
>>>
>>> - On the last x2 factor, it appears that "ddt_sync" is dedup-related? If
>>> so, again, is there any way to tell that dedup is not on and skip this? I
>>> suppose this is not a property of the spa but of the dsl_dataset (?). Is
>>> something like that feasible?
>>>
>>
>> That's right.  You could either see if dedup is used anywhere in the
>> pool, or you could see if the property is set on this particular dataset.
>>  The dataset is readily available to all callers of spa_get_asize.  The
>> logic for determining if dedup is used is also in dmu_write_policy().  You
>> would need to replicate this logic in callers of spa_get_asize(), but you
>> can use a simplified version, probably just "os_dedup_checksum !=
>> ZIO_CHECKSUM_OFF".
>>
>
> Thanks. I'll look at the code to see if this is something approachable for
> me.
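
A minimal sketch of the simplified per-dataset check Matthew suggests,
using hypothetical stand-in types; only the os_dedup_checksum !=
ZIO_CHECKSUM_OFF comparison comes from the discussion above, everything
else is illustrative scaffolding:

```c
#include <stdbool.h>

/* Stand-in values; the real code uses enum zio_checksum. */
enum { ZIO_CHECKSUM_OFF = 0, ZIO_CHECKSUM_SHA256 = 8 };

/* Simplified stand-in for objset_t, keeping only the field we test. */
typedef struct objset {
	int os_dedup_checksum;
} objset_t;

/*
 * Simplified version of the dedup test in dmu_write_policy(): dedup
 * is in play for this objset iff a dedup checksum is configured.
 */
bool
os_uses_dedup(const objset_t *os)
{
	return (os->os_dedup_checksum != ZIO_CHECKSUM_OFF);
}
```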
>
>
>>
>>
>>>
>>> Any insights/thoughts into this would be highly appreciated.
>>>
>>> --
>>> Kohsuke Kawaguchi
>>>
>>
>
>
> --
> Kohsuke Kawaguchi
>
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
