Thanks for the replies; some more questions follow.

Your answers below seem to contradict each other somewhat.
Is it true that:
1) VDEV cache before b70 used to contain a full copy
   of prefetched disk contents,

2) VDEV cache since b70 analyzes the prefetched sectors
   and only keeps metadata blocks,

3) VDEV cache since b148 is disabled by default?

So in fact currently we only have file-level "intelligent"
prefetching?

On my older systems I ran "kstat -p zfs:0:vdev_cache_stats"
and saw hit/miss ratios ranging from 30% to 70%. On the oi_148a
box I do indeed see all-zeros.

While I do understand the implications of VDEV-caching lots
of disks on systems with inadequate RAM, I tend to find this
feature useful on smaller systems, like home NASes. It is
essentially free in terms of mechanical seeks, as well as
in RAM (what is 60-100 MB for a small box at home?), and any
nonzero hit ratio that speeds up the system seems justifiable ;)

I've tried playing with the options on my oi_148a LiveUSB
repair boot, and got varying results:

The VDEV cache is indeed disabled by default, but can be enabled.
My system is scrubbing now, so it's got a few cache hits
(about 10%) right away.

root@openindiana:~# echo zfs_vdev_cache_size/W0t10000000 | mdb -kw
zfs_vdev_cache_size:            0               =       0x989680

root@openindiana:~# kstat -p zfs:0:vdev_cache_stats
zfs:0:vdev_cache_stats:class    misc
zfs:0:vdev_cache_stats:crtime   65.042318652
zfs:0:vdev_cache_stats:delegations      72
zfs:0:vdev_cache_stats:hits     11
zfs:0:vdev_cache_stats:misses   158
zfs:0:vdev_cache_stats:snaptime 114232.782154249
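
For anyone who wants to compare numbers, here is a minimal sketch of how the hit ratio can be derived from that kstat output. The helper name vdev_cache_hit_ratio is made up for illustration, and I am assuming the ratio of interest is hits / (hits + misses), leaving the delegations counter out. For the counters shown above (11 hits, 158 misses) this works out to roughly 6.5%.

```shell
# Hypothetical helper (not part of any ZFS tooling): reads `kstat -p`
# output on stdin and prints hits as a percentage of hits + misses.
# Assumption: "delegations" are ignored for this ratio.
vdev_cache_hit_ratio() {
    awk '
        $1 ~ /:hits$/   { hits = $2 }
        $1 ~ /:misses$/ { misses = $2 }
        END {
            total = hits + misses
            if (total == 0) { print "n/a"; exit }
            printf "%.2f\n", 100 * hits / total
        }'
}

# On a live system (not run here):
#   kstat -p zfs:0:vdev_cache_stats | vdev_cache_hit_ratio
```

With all-zero counters (as on a box where the cache is disabled) the helper prints "n/a" instead of dividing by zero.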

However, trying to increase the prefetch size hung my system
almost immediately (in a couple of seconds). I'm away from
it now, so I'll ask for a photo of the console screen :)

root@openindiana:~# echo zfs_vdev_cache_max/W0t16384 | mdb -kw
zfs_vdev_cache_max:             0x4000          =       0x4000
root@openindiana:~# echo zfs_vdev_cache_bshift/W0t20 | mdb -kw
zfs_vdev_cache_bshift:          0x10            =       0x14
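
To put the values I changed into perspective, here is the arithmetic as a small sketch. It assumes that zfs_vdev_cache_bshift is the log2 of the size a read gets inflated to, and that the per-vdev cache is simply divided into chunks of that size; the second part is my guess, not something I have checked in the code. The function names are made up for illustration.

```shell
# Hypothetical helpers for the arithmetic only; they do not touch the kernel.

# Size in bytes of one inflated read for a given bshift value.
vdev_cache_line_bytes() { echo $(( 1 << $1 )); }

# How many such chunks fit into a cache of the given size (bytes, bshift).
vdev_cache_lines() { echo $(( $1 / (1 << $2) )); }

vdev_cache_line_bytes 16        # 65536   (64 KB, the old value 0x10)
vdev_cache_line_bytes 20        # 1048576 (1 MB, the new value 0x14)
vdev_cache_lines 10000000 16    # 152
vdev_cache_lines 10000000 20    # 9
```

So with the 10000000-byte cache set earlier, going from bshift 16 to 20 would leave room for only about 9 chunks per vdev instead of about 152, if the assumption above holds.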


So there are deeper questions:
1) As of Illumos bug #175 (as well as OpenSolaris b148 and,
   if known, Solaris 11), has the vdev prefetch feature been
   *removed* from the codebase (not so in oi_148a; what about
   the others?), or only disabled by default (i.e. the limit
   is preset to 0 and you tune it yourself)?

2) If it is only disabled, are there solid plans to remove
   it, or can we vote to keep it for those interested? :)

3) If the feature is present and gets enabled, how would
   VDEV prefetch play along with file prefetch, again? ;)

4) Is there some tuneable (after b70) to enable prefetching
   and keeping of user data as well (not only metadata)?
   Perhaps only so that I could test it with my usage patterns
   to make sure that caching generic sectors is useless for
   me, and I really should revert to caching only metadata?

5) Would it make sense to increase zfs_vdev_cache_bshift?
   For example, when I tried to set it to 20 and prefetch
   a whole 1MB of data, why would that cause the system
   to die? Can it increase cache hit ratios (if it works)?

6) Does the VDEV cache keep ZFS blocks or disk sectors?
   For example, on my 4k disks the blocks are 4k, even
   though there are a few hundred bytes worth of data in
   metadata blocks and 3+KB of slack space.

7) Modern HDDs often have 32-64 MB of DRAM cache onboard.
   Is there any reason to match VDEV cache size with that
   in any way (1:1, 2:1, etc)?

Thanks again,
//Jim Klimov


2012-01-09 6:06, Richard Elling wrote:
On Jan 8, 2012, at 5:10 PM, Jim Klimov wrote:
2012-01-09 4:14, Richard Elling wrote:
On Jan 7, 2012, at 8:59 AM, Jim Klimov wrote:

I wonder if it is possible (currently or in the future as an RFE)
to tell ZFS to automatically read-ahead some files and cache them
in RAM and/or L2ARC?

See discussions on the ZFS intelligent prefetch algorithm. I think Ben
Rockwood's description is the best general description:
http://www.cuddletech.com/blog/pivot/entry.php?id=1040

And a more engineer-focused description is at:
http://www.solarisinternals.com/wiki/index.php/ZFS_Performance#Intelligent_prefetch
  -- richard

Thanks for the pointers. While I've seen those articles
(in fact, one of the two non-spam comments in Ben's
blog was mine), rehashing the basics is always useful ;)

Still, how does VDEV prefetch play along with File-level
Prefetch?

Trick question… it doesn't. vdev prefetching is disabled in OpenSolaris b148,
illumos, and Solaris 11 releases. The benefits of having the vdev cache for
large numbers of disks do not appear to justify the cost. See
        http://wesunsolve.net/bugid/id/6684116
        https://www.illumos.org/issues/175

For example, if ZFS prefetched 64K from disk
at the SPA level, and those sectors luckily happen to
contain "next" blocks of a streaming-read file, would
the file-level prefetch take the data from RAM cache
or still request them from the disk?

As of b70, vdev_cache only contains metadata. See
http://wesunsolve.net/bugid/id/6437054

In what cases would it make sense to increase the
zfs_vdev_cache_size? Does it apply to all disks
combined, or to each disk (or even slice/partition)
separately?

It applies to each leaf vdev.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
