> 
> I've been observing two threads on zfs-discuss with the following
> Subject lines:
> 
> Yager on ZFS
> ZFS + DB + "fragments"
> 
> and have reached the rather obvious conclusion that the author
> "can you guess?" is a professional spinmeister,

Ah - I see we have another incompetent psychic chiming in - and judging by his 
drivel below, a technical incompetent as well.  While I really can't help him 
with the former area, I can at least try to educate him in the latter.

...

> Excerpt 1:  Is this premium technical BullShit (BS) or what?

Since you asked:  no, it's just clearly beyond your grade level, so I'll try to 
dumb it down enough for you to follow.

> 
> ------------- BS 301 'grad level technical BS' -----------
> 
> Still, it does drive up snapshot overhead, and if you start trying to
> use snapshots to simulate 'continuous data protection' rather than more
> sparingly the problem becomes more significant (because each snapshot
> will catch any background defragmentation activity at a different
> point, such that common parent blocks may appear in more than one
> snapshot even if no child data has actually been updated).  Once you
> introduce CDP into the process (and it's tempting to, since the file
> system is in a better position to handle it efficiently than some
> add-on product), rethinking how one approaches snapshots (and COW in
> general) starts to make more sense.

Do you by any chance not even know what 'continuous data protection' is?  It's 
considered a fairly desirable item these days and was the basis for several hot 
start-ups (some since gobbled up by bigger fish that apparently agreed they 
were onto something significant), since it allows you to roll back the state of 
individual files or the system as a whole to *any* historical point you might 
want.  Snapshots, by contrast, require that you anticipate the points you might 
want to roll back to and capture them explicitly - or take such frequent 
snapshots that you'll probably be able to get at least somewhere near any point 
you might want, a second-class simulation of CDP which some vendors offer 
because it's the best they can do.  That second-class simulation is precisely 
the activity I outlined above, expecting that anyone sufficiently familiar with 
file systems to follow the discussion would recognize it.

But given your obvious limitations I guess I should spell it out in words of 
even fewer syllables:

1.  Simulating CDP without actually implementing it means taking very frequent 
snapshots.

2.  Taking very frequent snapshots means that you're likely to interrupt 
background defragmentation activity such that one child of a parent block is 
moved *before* a snapshot is taken while another is moved *after* it.  That 
forces you to capture a before-image of the parent (because at least one of its 
pointers is about to change) *and of all the parent's ancestors* (because the 
pointer change propagates through all the ancestral checksums - and pointers, 
with COW) in every snapshot that occurs immediately before *any* of its 
children is moved, rather than capturing a single before-image of the parent 
and its ancestors once, after which all the child pointers would likely be 
changed before the next snapshot is taken (see the sketch below).
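
To put point 2 in concrete terms, here is a toy model in Python.  The tree 
depth, fan-out, and block size are assumptions made up for the illustration, 
not ZFS's actual layout - the point is only the relative growth:

    # Hypothetical numbers - depth, fan-out, and block size are assumptions
    # for illustration, not ZFS internals.
    ANCESTOR_DEPTH = 4        # parent block plus indirect blocks up to the root
    CHILDREN_PER_PARENT = 16  # child pointers held by the parent block
    BLOCK_SIZE = 128 * 1024   # bytes retained per copied metadata block

    def blocks_pinned(moves_between_snapshots):
        """Metadata blocks pinned by snapshots while defrag relocates all
        children of one parent.  Every snapshot that lands before the next
        child move forces a before-image of the parent and all its ancestors
        to be retained (their pointers/checksums are about to change)."""
        snapshots_hit = -(-CHILDREN_PER_PARENT // moves_between_snapshots)  # ceiling
        return snapshots_hit * ANCESTOR_DEPTH

    sparing = blocks_pinned(16)   # all 16 moves fall between two snapshots
    frequent = blocks_pinned(1)   # CDP-style: a snapshot between every move
    print("sparing snapshots pin %d blocks (%d KiB)"
          % (sparing, sparing * BLOCK_SIZE // 1024))
    print("per-move snapshots pin %d blocks (%d KiB)"
          % (frequent, frequent * BLOCK_SIZE // 1024))

In this toy case that's 4 pinned blocks versus 64 - and not one byte of child 
data changed.  That growth is the snapshot overhead the original paragraph was 
pointing at.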

So that's what any competent reader should have been able to glean from the 
comments that stymied you.  The paragraph's concluding comments were 
considerably more general in nature and thus legitimately harder to follow:  
had you asked for clarification rather than simply assuming they were BS 
because you couldn't understand them, you would not have looked like such an 
idiot.  But since you did call them into question, I'll now put a bit more 
flesh on them for those who may be able to follow a discussion at that level of 
detail:

3.  The file system is in a better position to handle CDP than some external 
mechanism because

a) the file system knows (right down to the byte level if it wants to) exactly 
what any individual update is changing,

b) the file system knows which updates are significant (e.g., there's probably 
no intrinsic need to capture rollback information for lazy writes because the 
application didn't care whether they were made persistent at that time, but for 
any explicitly-forced writes or syncs a rollback point should be established), 
and

c) the file system is already performing log forces (where a log is involved) 
or batched disk updates (a la ZFS) to honor such application-requested 
persistence, and can piggyback the required CDP before-image persistence on 
them rather than requiring separate synchronous log or disk accesses to do so 
(see the sketch after point 5 below).

4.  If you've got full-fledged CDP, it's questionable whether you need 
snapshots as well (unless you have really, really inflexible requirements for 
virtually instantaneous rollback and/or for high-performance writable-clone 
access) - and if CDP turns out to be this decade's important new file system 
feature, just as snapshots were last decade's, it will be well worth having 
optimized for.

5.  Block-level COW technology just doesn't cut it for full-fledged CDP unless 
you can assume truly unlimited storage space:  not only does it encounter even 
worse instances of the defrag-related parent-block issues described above 
(which I brought up in a different context) but, far worse, it requires that 
every generation of every block in the system live forever (or at least for the 
entire time-span within which rollback is contemplated).
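
And here is the sketch promised under point 3.  The class and method names are 
invented for the example - this is an assumption-laden illustration, not ZFS 
or anyone else's shipping code - but it shows how CDP before-image records can 
ride the very log force the file system was going to issue anyway:

    import time

    class Log(object):
        def __init__(self):
            self.buffered = []   # records not yet durable
            self.stable = []     # records made durable by a force

        def append(self, record):
            self.buffered.append(record)

        def force(self):
            # One synchronous flush makes everything buffered durable at once.
            self.stable.extend(self.buffered)
            self.buffered = []

    class FileSystem(object):
        def __init__(self):
            self.log = Log()
            self.blocks = {}     # block number -> current contents

        def write(self, blockno, data, sync):
            # 3a: the file system sees exactly what this update changes.
            # 3b: only explicitly-forced writes need a rollback point;
            #     lazy writes carry no persistence promise to protect.
            if sync:
                before = self.blocks.get(blockno, b"")
                self.log.append(("cdp-before-image", time.time(), blockno, before))
            self.log.append(("data", time.time(), blockno, data))
            self.blocks[blockno] = data
            if sync:
                # 3c: the CDP record is made durable by the very same force
                # the application's sync required - no extra synchronous I/O.
                self.log.force()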

Hence the rethinking that I mentioned.  COW techniques are attractive from an 
ease-of-implementation standpoint for moderately infrequent snapshots, but as 
one approaches CDP-like support they become increasingly infeasible.  
Transaction-log-protected approaches, by contrast, can handle CDP very 
efficiently (in a manner analogous to a database before-image rollback log), as 
well as being able to offer everything else that ZFS does with better run-time 
efficiency (e.g., no longer must the entire ancestral path be updated on disk 
whenever a leaf node is), plus update-in-place facilities to support good 
sequential-streaming performance where that makes sense - but at the cost not 
only of increased implementation complexity but of the need for something 
resembling innovation (at least in the file system context).  And for the 
occasional installation that really requires high-performance snapshot 
rollback/writable-clone facilities, you can still implement them effectively at 
the block level *underneath* all this file-level stuff and then get rid of 
their overhead when the requirement has expired.

That's as dumbed-down as I'm going to get:  if you still can't understand it, 
please seek help from a colleague.

> 
> ------------- end of BS 301 'grad level technical BS' -----------
> 
> Comment: Amazing: so many words, so little meaningful technical
> content!

Oh, dear:  you seem to have answered the question that you posed above with the 
same abject cluelessness which you're bringing to the rest of your post.  Oh, 
well:  there's something to be said for consistency, I guess.

> 
> Excerpt 2: Even better than Excerpt 1 - truly exceptional BullShit:

No - just truly exceptional arrogance on your part:  you really ought to 
develop at least a minimal understanding of a subject before deciding to tackle 
someone who already has a great deal more than that.

> 
> ------------- BS 401 'PhD level technical BS' ------------------
> 
> No, but I described how to use a transaction log to do so and later on
> in the post how ZFS could implement a different solution more
> consistent with its current behavior.  In the case of the transaction
> log, the key is to use the log not only to protect the RAID update but
> to protect the associated higher-level file operation as well, such
> that a single log force satisfies both (otherwise, logging the RAID
> update separately would indeed slow things down - unless you had NVRAM
> to use for it, in which case you've effectively just reimplemented a
> low-end RAID controller - which is probably why no one has implemented
> that kind of solution in a stand-alone software RAID product).

That one was already clear enough that I'm just going to let you find a helpful 
colleague to explain it to you, as suggested above.  Someone who knows 
something about write-ahead logging would be a good bet - in particular, how 
it's usable not only to protect operations but to enhance their performance, by 
capturing in the log (or in supplements to it) the information required to 
replay one or more serial updates to the same or associated data while 
deferring the final batch-propagation of those updates back to the main 
database.
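
But since the excerpt apparently needs a picture drawn, here is a deliberately 
simplified sketch of the idea it describes.  The names and record layout are 
invented for the example (no real product's on-disk format is implied): one 
write-ahead-log record carries both the higher-level file operation and the 
associated RAID parity change, a single force makes both durable, and 
propagation of the actual data and parity blocks to their home locations is 
deferred and batched, with log replay covering a crash in between.

    class WriteAheadLog(object):
        def __init__(self):
            self.durable = []    # records on stable storage
            self.pending = []    # records buffered in memory

        def append(self, record):
            self.pending.append(record)

        def force(self):
            # The single synchronous flush the excerpt refers to.
            self.durable.extend(self.pending)
            self.pending = []

    def raid_protected_write(wal, file_op, data_block, parity_before, parity_after):
        # One combined record protects the file-level operation *and* the
        # RAID update, so recovery can replay either after a single force.
        wal.append({
            "file_op": file_op,              # e.g. ("write", inode, offset, length)
            "data": data_block,
            "parity_before": parity_before,
            "parity_after": parity_after,
        })
        wal.force()
        # The data and parity blocks themselves are updated lazily, in
        # batches, now that the log record is durable.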

Making due allowances for your being in Texas and thus being intimately 
acquainted with BS on a very personal level, I'll suggest that you refrain from 
further solidifying that state's stereotypes in the Internet group-mind until 
you've cleared your insights with someone who actually has some acquaintance 
with technologies such as file systems, and transaction managers, and log 
implementations - all of which I have both studied in depth and been well-paid 
to write from scratch.  You might also consider posting your babble from a 
personal rather than a professional location, unless your profession is 
completely unrelated to technology in particular and to competence in general.

- bill
 
 