> Nathan Kroenert wrote:
> 
> ...
> 
> > What if it did a double update: One to a staged area, and another
> > immediately after that to the 'old' data blocks. Still always have
> > on-disk consistency etc, at a cost of double the I/O's...
> 
> This is a non-starter.  Two I/Os is worse than one.

Well, that attitude may be supportable for a write-only workload, but then so 
is the position that you really don't even need *one* I/O (since no one will 
ever need to read the data and you might as well just drop it on the floor).

In the real world, data (especially database data) does usually get read after 
being written, and the entire reason the original poster raised the question 
was that sometimes it's well worth taking on some additional write overhead 
to reduce read overhead.  In such a situation, if you need to protect the 
database from partial-block updates as well as keep it reasonably laid out 
for sequential table access, then performing the two writes described is about 
as good a solution as you can get.  That's especially true if the first of them 
can be logged - better still, logged in NVRAM - so that its overhead can be 
amortized across multiple such updates by otherwise independent processes.  And 
it's even more true when, as is often the case, the same data gets updated 
multiple times in sufficiently close succession: then instead of 2N writes you 
wind up needing only N+1, the last being the only one that updates the data in 
place after the activity has cooled down.
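To make the two-write scheme concrete, here's a minimal sketch in C (hypothetical 
file names, block size, and log layout - not any particular database's or ZFS's 
mechanism): stage the block and its destination offset in a log, make that 
durable, then update the block in place so the table stays sequentially laid out.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 8192

static int write_block(int log_fd, int data_fd,
                       const char *buf, off_t data_offset)
{
    /* 1. Stage the destination offset plus the full block in the log and
     *    make them durable.  After a crash, recovery replays complete log
     *    records, so a torn in-place write in step 2 is harmless.  (Real
     *    code would add a record checksum and the recovery path; both are
     *    omitted from this sketch.) */
    if (pwrite(log_fd, &data_offset, sizeof data_offset, 0)
            != (ssize_t)sizeof data_offset)
        return -1;
    if (pwrite(log_fd, buf, BLOCK_SIZE, sizeof data_offset) != BLOCK_SIZE)
        return -1;
    if (fsync(log_fd) != 0)
        return -1;

    /* 2. Now update the block in place, preserving the sequential on-disk
     *    layout of the table for later scans. */
    if (pwrite(data_fd, buf, BLOCK_SIZE, data_offset) != BLOCK_SIZE)
        return -1;
    if (fsync(data_fd) != 0)
        return -1;

    return 0;
}

int main(void)
{
    int log_fd  = open("staging.log", O_CREAT | O_WRONLY, 0644);
    int data_fd = open("table.dat",   O_CREAT | O_WRONLY, 0644);
    if (log_fd < 0 || data_fd < 0) {
        perror("open");
        return 1;
    }

    char block[BLOCK_SIZE];
    memset(block, 'x', sizeof block);

    /* Rewrite the block that lives at offset 3 * BLOCK_SIZE, in place. */
    if (write_block(log_fd, data_fd, block, 3 * (off_t)BLOCK_SIZE) != 0) {
        perror("write_block");
        return 1;
    }

    close(log_fd);
    close(data_fd);
    return 0;
}

When the staging log sits in NVRAM (or is shared by many writers), the first 
write's durability cost gets amortized across updates, which is where the 
2N-versus-N+1 arithmetic above comes from.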

> > Of course, both of these would require non-sparse file creation for
> > the DB etc, but would it be plausible?
> > 
> > For very read intensive and position sensitive applications, I guess
> > this sort of capability might make a difference?
> 
> We are all anxiously awaiting data...

Then you might find it instructive to learn more about the evolution of file 
systems on Unix:

In The Beginning there was the block, and the block was small, and it was 
isolated from its brethren, and darkness was upon the face of the deep because 
any kind of sequential performance well and truly sucked.

Then (after an inexcusably lengthy period of such abject suckage lasting into 
the '80s) there came into the world FFS, and while there was still only the 
block the block was at least a bit larger, and it was at least somewhat less 
isolated from its brethren, and once in a while it actually lived right next to 
them, and while sequential performance still usually sucked at least it sucked 
somewhat less.

And then the disciples Kleiman and McVoy looked upon FFS and decided that mere 
proximity was still insufficient, and they arranged that blocks should (at 
least when convenient) be aggregated into small groups (56 KB actually not 
being all that small at the time, given the disk characteristics back then), 
and the Great Sucking Sound of Unix sequential-access performance was finally 
reduced to something at least somewhat quieter than a dull roar.

But other disciples had (finally) taken a look at commercial file systems that 
had been out in the real world for decades and that had had sequential 
performance down pretty well pat for nearly that long.  And so it came to pass 
that corporations like Veritas (VxFS), and SGI (EFS & XFS), and IBM (JFS) 
imported the concept of extents into the Unix pantheon, and the Gods of 
Throughput looked upon it, and it was good, and (at least in those systems) 
Unix sequential performance no longer sucked at all, and even non-corporate 
developers whose faith was strong nearly to the point of being blind could not 
help but see the virtues revealed there, and began incorporating extents into 
their own work, yea, even unto ext4.

And the disciple Hitz (for it was he, with a few others) took a somewhat 
different tack, and came up with a 'write anywhere file layout' but had the 
foresight to recognize that it needed some mechanism to address sequential 
performance (not to mention parity-RAID performance).  So he abandoned 
general-purpose approaches in favor of the Appliance, and gave it most 
uncommodity-like yet virtuous NVRAM to allow many consecutive updates to be 
aggregated into not only stripes but adjacent stripes before being dumped to 
disk, and the Gods of Throughput smiled upon his efforts, and they became known 
throughout the land.

Now comes back Sun with ZFS, apparently ignorant of the last decade-plus of 
Unix file system development (let alone development in other systems dating 
back to the '60s).  Blocks, while larger (though not necessarily proportionally 
larger, due to dramatic increases in disk bandwidth), are once again often 
isolated from their brethren.  True, this makes the COW approach a lot easier 
to implement, but (leaving aside the debate about whether COW as implemented in 
ZFS is a good idea at all) there is *no question whatsoever* that it returns a 
significant degree of suckage to sequential performance - especially for data 
subject to small, random updates.

Here ends our lesson for today.

- bill