(apologies if this gets posted twice - it disappeared the first time, and it's not clear whether that was intentional)
> Hello can,
>
> Tuesday, December 11, 2007, 6:57:43 PM, you wrote:
>
>>> Monday, December 10, 2007, 3:35:27 AM, you wrote:
>>>
>>> cyg> and it
>>>>> made them slower
>>> cyg> That's the second time you've claimed that, so you'll really at
>>> cyg> least have to describe *how* you measured this even if the
>>> cyg> detailed results of those measurements may be lost in the mists of
>>> cyg> time.
>>>
>>> cyg> So far you don't really have much of a position to defend at
>>> cyg> all: rather, you sound like a lot of the disgruntled TOPS users
>>> cyg> of that era. Not that they didn't have good reasons to feel
>>> cyg> disgruntled - but they frequently weren't very careful about aiming
>>> cyg> their ire accurately.
>>>
>>> cyg> Given that RMS really was *capable* of coming very close to the
>>> cyg> performance capabilities of the underlying hardware, your
>>> cyg> allegations just don't ring true. Not being able to jump into
>>>
>>> And where is your "proof" that it "was capable of coming very close to
>>> the..."?
>
> cyg> It's simple: I *know* it, because I worked *with*, and *on*, it
> cyg> - for many years. So when some bozo who worked with people with
> cyg> a major known chip on their shoulder over two decades ago comes
> cyg> along and knocks its capabilities, asking for specifics (not even
> cyg> hard evidence, just specific allegations which could be evaluated
> cyg> and if appropriate confronted) is hardly unreasonable.
>
> Bill, you openly criticize people (their work) who have worked on ZFS
> for years... not that there's anything wrong with that, just please
> realize that because you were working on it it doesn't mean it is/was
> perfect - just the same as with ZFS.

Of course it doesn't - and I never claimed that RMS was anything close to
'perfect' (I even gave specific examples of areas in which it was *far* from
perfect). Just as I've given specific examples of where ZFS is far from
perfect.
What I challenged was David's assertion that RMS was severely deficient in
its *capabilities* - and demanded not 'proof' of any kind but only specific
examples (comparable in specificity to the examples of ZFS's deficiencies
that *I* have provided) that could actually be discussed.

> I know, everyone loves their baby...

No, you don't know: you just assume that everyone is as biased as you and
others here seem to be.

> Nevertheless just because you were working on and with it, it's not a
> proof. The person you were replying to was also working with it (but
> not on it I guess). Not that I'm interested in such a proof. Just
> noticed that you're demanding some proof, while you also just
> write some statements on its performance without any actual proof.

You really ought to spend a lot more time understanding what you've read
before responding to it, Robert. I *never* asked for anything like 'proof':
I asked for *examples* specific enough to address - and repeated that
explicitly in responding to your previous demand for 'proof'. Perhaps I
should at that time have observed that your demand for 'proof' (your use of
quotes suggesting that it was something that *I* had demanded) was
ridiculous, but I thought my response made that obvious.

>>> Let me use your own words:
>>>
>>> "In other words, you've got nothing, but you'd like people to believe
>>> it's something.
>>>
>>> The phrase "Put up or shut up" comes to mind."
>>>
>>> Where are your proofs on some of your claims about ZFS?
> cyg> Well, aside from the fact that anyone with even half a clue
> cyg> knows what the effects of uncontrolled file fragmentation are on
> cyg> sequential access performance (and can even estimate those
> cyg> effects within moderately small error bounds if they know what
> cyg> the disk characteristics are and how bad the fragmentation is),
> cyg> if you're looking for additional evidence that even someone
> cyg> otherwise totally ignorant could appreciate there's the fact that
>
> I've never said there are not fragmentation problems with ZFS.

Not having made a study of your collected ZFS contributions here I didn't
know that. But some of ZFS's developers are on record stating that they
believe there is no need to defragment (unless they've changed their views
since and not bothered to make us aware of it), and in the entire discussion
in the recent 'ZFS + DB + "fragments"' thread there were only three
contributors (Roch, Anton, and I) who seemed willing to admit that any
problem existed. So since one of my 'claims' for which you requested
substantiation involved fragmentation problems, it seemed appropriate to
address them.

> Well, actually I've been hit by the issue in one environment.

But didn't feel any impulse to mention that during all the preceding
discussion, I guess.

> Also you haven't done your work home properly, as one of ZFS
> developers actually stated they are going to work on ZFS
> de-fragmentation and disk removal (pool shrinking).
> See http://www.opensolaris.org/jive/thread.jspa?messageID=139680

Hmmm - there were at least two Sun ZFS personnel participating in the
database thread, and they never mentioned this. I guess they didn't do
their 'work home' properly either (and unlike me they're paid to do it). As
for me, my commitment here is too limited for me to have even scanned the
entire thread list, let alone read discussions with names like "ZFS send
needs optimalization" that seem unlikely to be relevant to my particular
interests.
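Incidentally, the kind of estimate referred to above - predicting sequential
throughput from disk characteristics and fragment size - really is
back-of-envelope material. Here's a minimal sketch; every disk parameter in
it is an illustrative assumption, not a measurement of any real drive or of
ZFS itself:

```python
# Rough model of how fragmentation degrades sequential-read throughput.
# The disk parameters below are illustrative assumptions, not measurements.

AVG_SEEK_MS = 8.0          # assumed average seek time
ROT_LATENCY_MS = 4.17      # half a rotation at 7200 rpm
TRANSFER_MB_S = 60.0       # assumed sustained media transfer rate

def seq_read_mb_s(fragment_kb: float) -> float:
    """Effective throughput when a file is laid out in fragments of
    fragment_kb, each read costing a seek plus rotational delay."""
    transfer_ms = fragment_kb / 1024.0 / TRANSFER_MB_S * 1000.0
    total_ms = AVG_SEEK_MS + ROT_LATENCY_MS + transfer_ms
    return (fragment_kb / 1024.0) / (total_ms / 1000.0)

for frag in (64, 128, 1024, 16384):
    print(f"{frag:6d} KB fragments -> {seq_read_mb_s(frag):6.1f} MB/s")
```

With these assumed numbers, 64 KB fragments land well under a tenth of the
streaming rate - which is the 'over an order of magnitude' ballpark argued
elsewhere in this thread.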
> Lukasz happens to be my friend who is also working with the same
> environment.

That just might help explain why you happened to be aware of this obscure
little tidbit of information, then.

> The point is, and you as a long time developer (I guess) should know it,
> you can't have everything done at once (lack of resources, and it takes
> some time anyway) so you must prioritize.

The issues here are not issues of prioritization but issues of denial. Your
citation above is the first suggestion that I've seen (and by all
appearances the first that anyone else participating in these discussions
has seen) that the ZFS crew considers the fragmentation issue important
enough to merit active attention in the future. Do you by any chance have
any similar hint of recognition that RAID-Z might benefit from revamping as
well?

> ZFS is open source and if
> someone thinks that given feature is more important than the other
> he/she should try to fix it or at least voice it here so ZFS
> developers can possibly adjust their priorities if there's good enough
> and justified demand.

That just won't wash, Robert: as I noted above, the problem here has been
denial that these are flaws at all, not just a debate about how to
'prioritize' addressing them (though in the case of RAID-Z I recall seeing
some indication that at least one person was interested in - or perhaps
actually is - working on RAID-5-like support because they see problems with
RAID-Z).

> Now the important part - quite a lot of people are using ZFS, from
> desktop usage, their laptops, small to big production environments,
> clustered environments, SAN environments, JBODs, entry-level to high-end
> arrays, different applications, workloads, etc. And somehow you can't
> find many complaints about ZFS fragmentation.
The entire basis for that database thread (initiated by someone else, you
will note) was ZFS fragmentation, and a great deal of its content arose from
the resistance of many here to the idea that it might constitute a problem
*in that specific environment* (let alone more generally).

Most environments actually aren't all that performance-sensitive, so of
course they don't complain. Even if they run into problems, they just buy
more hardware - because that's what they're used to doing: the idea that
better software could eliminate the need to do so either doesn't cross their
minds at all or seems like too much of a pipe dream to take seriously.

Trouble is, ZFS and its fanboys tout it as offering *superior* - not merely
adequate - performance, whereas for some not-all-that-uncommon situations
its performance can be worse *by over an order of magnitude* due to the
fragmentation which is designed into its operation and for which no current
relief is available (nor was any relief apparently generally known to be
projected for the future, until now). The fact that many installations may
be able to laugh off an order-of-magnitude performance handicap is not the
point: the point is that if the claims for ZFS had been more balanced in
this area, I'd have far less to criticize - I'd just observe that there was
significant room for improvement and leave it at that.

...

> Then you find people like Pawel Jakub Dawidek (guy who ported ZFS to
> FreeBSD) who started experimenting with RAID-5 like implementation
> with ZFS - he provided even some numbers showing it might be worth
> looking at. That's what community is about.

Ah - that may be what I was recalling above. Strange, once again, that it
never popped up in the current discussions until now.

> I don't see any point complaining about ZFS all over again - have you
> actually run into the problem with ZFS yourself? I guess not.
I haven't been sent to Guantanamo and held for years without trial, either -
but that doesn't mean that I have no business criticizing the practice, and
in particular persisting if that criticism is met with denial that any
problem exists (even though indeed it's not *my* problem).

> You just
> assuming (correctly for some usage cases). I guess your message has
> been well heard.

But hardly well understood.

> Since you're not interested in anything more than
> bashing or complaining all the time about the same theoretical "issues"
> rather than contributing somehow (even by providing some test results
> which could be repeated)

I've told you what I'm doing, and why I'm doing it, and why it's beyond
stupid to complain about the lack of 'test results' in situations as
clear-cut as these are, and how to go about fixing them - and you still come
back with crap like this. Is it any wonder that my respect for so many of
you is close to zero?

> I wouldn't wait for any positive feedback if I were
> you - anyway, what kind of feedback are you waiting for?

I'm waiting for the idiots either to shut up or to shape up. And I remain
sufficiently (though now verging on perversely) curious about just how long
that will take to keep working on it.

> cyg> Last I knew, ZFS was still claiming that it needed nothing like
> cyg> defragmentation, while describing write allocation mechanisms
> cyg> that could allow disastrous degrees of fragmentation under
> cyg> conditions that I've described quite clearly.
>
> Well, I haven't talked to ZFS (yet) so I don't know what he claims :))

Perhaps you should do *your* 'work home' more properly, then: there are
several developers who have presumed to speak for ZFS over the years, and
their statements are well documented (you could start with the presentations
that they've made).

> If you are talking about ZFS developers then you can actually find
> some evidence that they do see that problem and want to work on it.
> Again see for example:
> http://www.opensolaris.org/jive/thread.jspa?messageID=139680
> Bill, at least look at the list archives first.

I believe that I covered that adequately above, Robert. But given your
demonstrated inability to absorb information even after several repetitions,
I'll suggest that you simply keep working on understanding it (and the rest
of this response) until you actually *do* understand it, before attempting
to reply to it.

> And again, "under conditions that I've described quite clearly." -
> that's exactly the problem. You've just described something while
> others do have actual and real problems which should be addressed
> first.

Once again, you are confusing the very real problem of stone-wall denial
here with a simple issue of prioritization.

> cyg> If ZFS made no
> cyg> efforts whatsoever in this respect the potential for unacceptable
> cyg> performance would probably already have been obvious even to its
> cyg> blindest supporters,
>
> Well, is it really so hard to understand that a lot of people use ZFS
> because it actually solves their problems?

Not at all: it's just far from obvious that it solves their problems any
(let alone significantly) better than other existing open source options.
And that would not be any issue if some people here weren't so zealous in
asserting ZFS's alleged stunning superiority - but if they continue to do
so, I'll continue to challenge them to *substantiate* that claim.

> No matter what case
> scenarios you will find to theoretically show some ZFS weaker points,
> at the end what matters is if it does solve customer problems. And for
> many users it does, definitely not for all of them.
> I would argue that no matter what file system you will test or even
> design, one can always find corner cases when it will behave less
> than optimal. For a general purpose file system what matters is that
> in most common cases it's good enough.
And if "it's good enough" were all that people were claiming about ZFS
there'd be very little to dispute (though no less room for improvement, of
course - and probably a great deal less resistance to suggestions of how to
go about it).

...

> cyg> Then there's RAID-Z, which smears individual blocks across
> cyg> multiple disks in a manner that makes small-to-medium random
> cyg> access throughput suck. Again, this is simple logic and physics:
> cyg> if you understand the layout and the disk characteristics, you
> cyg> can predict the effects on a heavily parallel workload with
> cyg> fairly decent accuracy (I think that Roch mentioned this casually
> cyg> at one point, so it's hardly controversial, and I remember
> cyg> reading a comment by Jeff Bonwick that he was pleased with the
> cyg> result of one benchmark - which made no effort to demonstrate the
> cyg> worst case - because the throughput penalty was 'only' a factor
> cyg> of 2 rather than the full factor of N).
>
> Yeah, nothing really new here. If you need a guy from Sun, then read
> Roch's post on RAID-Z performance. Nothing you've discovered here.

Hmmm. I took a quick look through Roch's posts here and didn't find a title
that suggested such a topic (though he does tend to get involved in
discussions that are also of interest to me, so the time wasn't completely
wasted). If you're referring to his mid-2006 blog post, had you read the
discussion that followed it you would have found that I participated
actively and in fact raised many of the same issues that I've raised again
here (points that he either hadn't covered or hadn't realized had
alternatives that did not suffer from comparable limitations, plus more
general observations on the fragmentation problem). Incidentally (since
comments to that post are now closed), his IOPS calculation at the end was
flawed: the formula he presented yielded not the number of disks to use in
each group but the number of groups to use.
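The 'factor of N' point above follows from simple counting, and can be
sketched as a toy model. The per-spindle IOPS figure is an illustrative
assumption, and the model deliberately ignores caching, scheduling, and
partial-stripe subtleties:

```python
# Toy model: random-read IOPS for RAID-Z vs. a RAID-5-style layout.
# Because RAID-Z stripes every block across the whole group, one random
# block read engages all data disks at once; in a RAID-5-style layout
# each block lives on a single disk, so the disks can service independent
# reads in parallel. DISK_IOPS is an assumed figure, not a measurement.

DISK_IOPS = 120  # assumed small random reads per second per spindle

def raidz_read_iops(n_disks: int) -> float:
    # All data disks in the group seek together for every block read,
    # so the group delivers roughly one disk's worth of random reads.
    return float(DISK_IOPS)

def raid5_style_read_iops(n_disks: int) -> float:
    # Each of the n-1 data disks serves random reads independently.
    return float(DISK_IOPS * (n_disks - 1))

for n in (5, 9):
    print(f"{n}-disk group: RAID-Z ~{raidz_read_iops(n):.0f} IOPS, "
          f"RAID-5-style ~{raid5_style_read_iops(n):.0f} IOPS")
```

For a heavily parallel small-read workload this is exactly the near-factor-
of-N penalty described in the quoted passage.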
> Nevertheless RAID-Z[2] is good enough for many people.
> I know that simple logic and physics states that relativity equations
> provide better accuracy than Newton's - nevertheless in most scenarios
> I'm dealing with it doesn't really matter from a practical point of
> view.

Given your expressed preference for 'real problems' above, it's worth noting
that in my quick scan through Roch's posts here I happened upon this
(referring to performance issues using RAID-Z):

"Now I have to find a way to justify myself with my head office that after
spending 100k+ in hw and migrating to "the most advanced OS" we are running
about 8 time slower :)"

Some people might consider such a problem to be 'real' (and somewhat
personal as well); he goes on to observe that "while that rsync process is
running, ZONEX is completely unusable because of the rsync I/O load" -
another 'real-world' indication of how excessive (and unnecessary) RAID-Z
disk loading compromises other aspects of system performance (though limited
scheduling intelligence may have contributed to this as well). Since I
stumbled upon that without even looking for it or scanning more than a
minute fraction of 1% of the posts here, there's an excellent possibility
that considerably more such are lurking elsewhere in this forum (want to do
some 'work home' and find out?).

> Then, in some environments RAID-Z2 (on JBOD) actually provides better
> performance than RAID-5 (and HW R5 for that matter). And, opposite
> to you, I'm not speculating but I've been working with such
> environment (lot of concurrent writes which are more critical than
> much less reads later).

Don't confuse apples with oranges. As long as it can accumulate enough
dirty data before it has to flush it to disk, COW with batch write-back can
make *any* write strategy work well.
So there's no need to accept the brain-damaged nature of RAID-Z's
performance with small-to-medium-sized random accesses in order to obtain
the good performance that you describe above: a good ZFS RAID-5-like
implementation could do just as well for those workloads *plus* beat both
conventional RAID-5 and RAID-Z at small-update workloads *plus* cremate
RAID-Z in terms of throughput on small-to-medium read workloads.

The main limitation of the straightforward way to implement this is that it
would only be easily applicable to multi-block files, because each stripe
could contain data from only one file (so as to avoid an additional level of
access indirection); of course, in principle you could stripe a file as
small as four disk sectors (2 KB) across 4 disks plus one for parity, so
this approach would be inapplicable only to *tiny* files - around the size
that one might start considering embedding in their disk inode, given a
design that allowed that flexibility. While small files may get a large
share of the access load in some environments, in most environments they
consume only a small proportion of the storage space, so just leaving them
to be mirrored would probably be an eminently viable strategy - and
exploring more interesting alternatives wouldn't be productive anyway until
you've managed to understand the basic one.

> So when you saying that RAID-Z is brain-damaging - well, it's
> mostly positive experience of a lot of people with RAID-Z vs. your
> statement without any real-world backing.

I just provided one example above from a participant in this forum (and it
seems unlikely that it's the only one). Does that mean that I get to accuse
you of not having "done your work home properly", because you were unaware
of it?

...
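To make the four-sectors-plus-parity example above concrete, here is a
minimal sketch of that per-file stripe: a 2 KB file split into four 512-byte
chunks on four data disks, with an XOR parity chunk on a fifth. This is
purely illustrative of the layout being described, not of any actual ZFS
code:

```python
# Sketch of the per-file striping described above: a 2 KB file split into
# four 512-byte sector-sized chunks on four data disks, plus one parity
# chunk (the XOR of the four) on a fifth disk. Purely illustrative.

def stripe_with_parity(data: bytes, n_data_disks: int = 4, chunk: int = 512):
    assert len(data) == n_data_disks * chunk
    chunks = [data[i * chunk:(i + 1) * chunk] for i in range(n_data_disks)]
    parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*chunks))
    return chunks, parity

def reconstruct(chunks, parity, lost: int) -> bytes:
    # Rebuild a lost data chunk by XORing parity with the survivors -
    # the same recovery any single-parity RAID layout performs.
    survivors = [c for i, c in enumerate(chunks) if i != lost] + [parity]
    rebuilt = survivors[0]
    for s in survivors[1:]:
        rebuilt = bytes(x ^ y for x, y in zip(rebuilt, s))
    return rebuilt

file_data = bytes(range(256)) * 8            # a 2048-byte 'file'
chunks, parity = stripe_with_parity(file_data)
assert reconstruct(chunks, parity, lost=2) == chunks[2]
```

The point of keeping each stripe within one file is visible here: the
mapping from file offset to (disk, chunk) needs no extra indirection.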
> cyg> And the way ZFS apparently dropped the ball on its alleged
> cyg> elimination of any kind of 'volume management' by requiring that
> cyg> users create explicit (and matched) aggregations of disks to
> cyg> support mirroring and RAID-Z.
>
> # mkfile 128m f1 ; mkfile 128m f2 ; mkfile 256m f3 ; mkfile 256m f4
> # zpool create bill mirror /var/tmp/f1 /var/tmp/f2 mirror /var/tmp/f3 /var/tmp/f4
> # zpool list
> NAME    SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
> bill    373M    90K   373M   0%  ONLINE  -
> #
> # mkfile 128m f11 ; mkfile 256m f44
> # zpool destroy bill
> # zpool create bill raidz /var/tmp/f11 /var/tmp/f1 /var/tmp/f2 raidz /var/tmp/f3 /var/tmp/f4 /var/tmp/f44
> # zfs list
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> bill   101K   715M  32.6K  /bill
> #
> (2*128 + 2*256 = 768) - looks fine.
>
> If you are talking about a solution which enables the user to mix
> different disk sizes in the same mirror or RAID-5 group and, while
> providing the given protection the whole time, allows you to utilize
> 100% of all disk capacities... well, what is that solution? Is it free?
> Open source? Available on a general purpose OS? Or commodity HW?
> Available at all? :P

I'm talking about what ZFS *could* have provided to make good on their claim
that they had eliminated (or at least effectively hidden) volume management:
a *real* 'storage pool' that just accepted whatever disks you gave it and
could be used transparently to provide whatever form of redundancy was
desired on a per-file basis, with the ability to add or remove individual
disks at will. No need to create separate pools for non-redundant data,
mirrors, parity RAID, etc.: it would 'just work', in the manner that some
people would like to claim ZFS already does (and to some degree perhaps it
actually does, but not when it comes to redundant storage).
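One way such a pool could place mirrored data on arbitrary, unmatched disks
is a simple greedy strategy: put each chunk's two copies on the two disks
with the most free space. This is my own toy sketch of that idea (the chunk
granularity and disk sizes are invented for illustration), not a description
of how ZFS or any shipping product works:

```python
# Toy 'accept any disks' placement: each mirrored chunk's two copies go
# on the two disks with the most remaining free space. With mixed sizes
# this reaches full utilization whenever no disk exceeds the combined
# size of all the others. Sizes are in arbitrary illustrative units.

import heapq

def place_mirrored_chunks(disk_sizes, chunk=1):
    free = [-s for s in disk_sizes]          # max-heap via negation
    heapq.heapify(free)
    placed = 0
    while len(free) >= 2:
        a = heapq.heappop(free)              # disk with most free space
        b = heapq.heappop(free)              # second-most free space
        if -a < chunk or -b < chunk:         # fewer than two disks with room
            heapq.heappush(free, a)
            heapq.heappush(free, b)
            break
        placed += chunk
        heapq.heappush(free, a + chunk)      # negated: free space shrinks
        heapq.heappush(free, b + chunk)
    return placed                            # mirrored capacity achieved

# Mixed sizes: 128 + 128 + 256 + 256 = 768 raw units -> 384 mirrored.
print(place_mirrored_chunks([128, 128, 256, 256]))
```

Note that the 128/128/256/256 mix from the zpool example above yields its
full 384 units of mirrored capacity without any vdev matching.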
And yes, across a very wide range of disk-size variations it's possible to
utilize 100% of the capacity of each individual disk in such a pool using
relatively simple distribution strategies - especially if you can perform
very minor rearrangements to cover corner cases (though ZFS-style snapshots
would hinder that, which is one of the reasons - defragmentation being
another, and rebalancing across multiple nodes being a third - that I favor
a different snapshot approach). I described this here well over a year ago,
and Bill Moore said they had actually considered it but had shelved it for
various reasons (none of which appeared insurmountable - but he may have
been making different assumptions about how it could be implemented).

> cyg> Now, if someone came up with any kind of credible rebuttal to
> cyg> these assertions we could at least discuss it on technical
> cyg> grounds. But (and again you should consider this significant) no
> cyg> one has: all we have is well-reasoned analysis on the one hand
> cyg> and some (often fairly obnoxious) fanboy babble on the other. If
> cyg> you step back, make the effort required to *understand* that
> cyg> analysis, and try to look at the situation objectively, which do
> cyg> you find more credible?
>
> Most credible to me is actual user experience than some theoretical
> burbling.

That's usually the case with amateurs who have difficulty understanding in
detail how the systems that they use work. But at least many of them have
the sense not to argue interminably with people who have actually designed
and built such systems and *do* understand them in (excruciating) detail.

...

> cyg> ZFS has other deficiencies, but they're more fundamental choices
> cyg> involving poor trade-offs and lack of vision than outright (and
> cyg> easily rectifiable) flaws, so they could more justifiably be
> cyg> termed 'judgment calls' and I haven't delved as deeply into them.
>
> And what are they?
Once again, you've failed to do your 'work home' - since I've mentioned them
here previously:

1. Implementation tied to a centralized server - scales only 'up', not
'out'.

2. Snapshot mechanism that makes reorganization expensive (including
reorganization across nodes - so it's a scaling impediment as well as a
performance trade-off).

3. Explicit pointer (indirect block) trees for large files rather than a
flatter mechanism that avoids deep tree look-ups (with high-level data
distribution handled algorithmically - which also helps avoid the need to
update pointers in bulk when inter-node rebalancing operations occur and
confines pointer updates on writes to the node that holds the data).

4. Trying to use block size to manage both access granularity and on-disk
contiguity for performance (though background reorganization could help the
latter and leave the former free to adjust just for access granularity - so
that design choice could be considered one of the flaws already discussed
above).

There were probably more, but as you likely wouldn't understand them any
better than you've understood anything else there's little point in dredging
them up again.

...

> cyg> But they're the main reason I have no interest in 'working on'
>
> Well, you're not using ZFS, you are not interested in working on it,
> all you are interested in is finding some potential corner cases bad for
> ZFS and bashing it. If you put at least 10% of the energy you're
> putting into your 'holy war' you would at least provide some benchmarks
> (filebench?) showing these corner cases in comparison to other
> mind-blowing solutions on the market which are much better than ZFS,
> so we can all reproduce them and try to address ZFS problems.

I really don't have much interest in meeting *your* criteria for being
convinced, Robert - at least in part because it's not clear that *anything*
would convince you.
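As an aside on point 3 in the list above: the tree-versus-algorithmic
distinction is easy to quantify. An indirect-block tree costs one pointer-
block traversal per level to map an offset, while an algorithmic layout
resolves the owning node with arithmetic alone. The sketch below is my own
illustration - the 128 KB blocks and 128-byte block pointers are assumed
figures, and the hash mapping is a stand-in, not any real ZFS mechanism:

```python
# Illustration of point 3: resolving a file offset through an indirect-
# block tree reads one pointer block per level, and the tree deepens with
# file size; an algorithmic layout needs no pointer-block reads at all.
# Block and pointer sizes are assumptions chosen for the arithmetic.

BLOCK = 128 * 1024                # assumed 128 KB data/indirect blocks
PTRS_PER_INDIRECT = BLOCK // 128  # assumed 128-byte block pointers

def tree_lookups(file_size: int) -> int:
    """Indirect levels traversed to map one offset in a file this large."""
    blocks, levels = file_size // BLOCK, 0
    while blocks > 1:
        blocks = -(-blocks // PTRS_PER_INDIRECT)  # ceiling division
        levels += 1
    return levels

def algorithmic_node(file_id: int, offset: int, n_nodes: int) -> int:
    """Zero pointer-block reads: the owning node is pure arithmetic."""
    return hash((file_id, offset // BLOCK)) % n_nodes

for size in (2**20, 2**30, 2**40):
    print(f"{size:>14} B file: {tree_lookups(size)} indirect level(s)")
```

The tree's depth grows with file size (three levels by the terabyte range
under these assumptions), and every one of those pointer blocks must be
rewritten when data moves - which is the bulk-update cost the list item
refers to.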
So it's more fun to see how completely committed people like you are to
keeping their heads firmly wedged up where the sun don't shine to avoid
actually facing up to the fact that ZFS just ain't quite what you thought it
was.

...

> cyg> You really haven't bothered to read much at all, have you. I've
> cyg> said, multiple times, that I came here initially in the hope of
> cyg> learning something interesting. More recently, I came here
> cyg> because I offered a more balanced assessment of ZFS's strengths
> cyg> and weaknesses in responding to the Yager article and wanted to
> cyg> be sure that I had not treated ZFS unfairly in some way - which
> cyg> started this extended interchange. After that, I explained that
> cyg> while the likelihood of learning anything technical here was
> cyg> looking pretty poor, I didn't particularly like some of the
> cyg> personal attacks that I'd been subject to and had decided to
> cyg> confront them.
>
> Well, every time I saw it was you 'attacking' other people first.

Then you obviously missed a great many posts, but given the readily-apparent
quality of your other research I don't find that surprising at all.

...

> If you are not contributing here, and you are not learning here - why
> are you here? I'm serious - why?

I explained that, in detail, in my previous post. Given the expressed
'seriousness' of your repeat question here I was going to ask whether you
are functionally illiterate, but your advice below brought up another
possibility.

...

> cyg> No, my attitude is that people too stupid and/or too lazy to
> cyg> understand what I *have* been delivering don't deserve much respect
> cyg> if they complain.
>
> Maybe you should think about that "stupid" part...

As usual, I thought about it *before* I said it.
However, I did inadvertently omit a third possibility - that people such as
you (who don't quite strike me as being abjectly stupid or drop-dead lazy)
are instead simply too intellectually dishonest (whether intentionally or so
habitually that it has become subconscious) to understand what I've been
'delivering'. So you're right: there's always room to refine one's
understanding, and another relevant quotation comes to mind ("There are none
so blind as those who will not see").

> Maybe, just maybe, it's possible that all people around you don't
> understand you, that the world is wrong and we're all so stupid. Well,
> maybe. Even if it is so, then perhaps it's time to stop being Don Quixote
> and move on?

No, but it might be getting close to it - I'll let you know.

- bill

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss