(apologies if this gets posted twice - it disappeared the first time, and it's not clear whether that was intentional)
> Hello can,
>
> Tuesday, December 11, 2007, 6:57:43 PM, you wrote:
>
>>> Monday, December 10, 2007, 3:35:27 AM, you wrote:
>>>
>>> cyg> and it
>>>>> made them slower
>>> cyg> That's the second time you've claimed that, so you'll really at
>>> cyg> least have to describe *how* you measured this even if the
>>> cyg> detailed results of those measurements may be lost in the mists of
>>> cyg> time.
>>>
>>> cyg> So far you don't really have much of a position to defend at
>>> cyg> all: rather, you sound like a lot of the disgruntled TOPS users
>>> cyg> of that era. Not that they didn't have good reasons to feel
>>> cyg> disgruntled - but they frequently weren't very careful about aiming
>>> cyg> their ire accurately.
>>>
>>> cyg> Given that RMS really was *capable* of coming very close to the
>>> cyg> performance capabilities of the underlying hardware, your
>>> cyg> allegations just don't ring true. Not being able to jump into
>>>
>>> And where is your "proof" that it "was capable of coming very close to
>>> the..."?
>
> cyg> It's simple: I *know* it, because I worked *with*, and *on*, it
> cyg> - for many years. So when some bozo who worked with people with
> cyg> a major known chip on their shoulder over two decades ago comes
> cyg> along and knocks its capabilities, asking for specifics (not even
> cyg> hard evidence, just specific allegations which could be evaluated
> cyg> and if appropriate confronted) is hardly unreasonable.
>
> Bill, you openly criticize people (their work) who have worked on ZFS
> for years... not that there's anything wrong with that, just please
> realize that because you were working on it it doesn't mean it is/was
> perfect - just the same as with ZFS.

Of course it doesn't - and I never claimed that RMS was anything close to
'perfect' (I even gave specific examples of areas in which it was *far* from
perfect). Just as I've given specific examples of where ZFS is far from
perfect.
What I challenged was David's assertion that RMS was severely deficient in
its *capabilities* - and demanded not 'proof' of any kind but only specific
examples (comparable in specificity to the examples of ZFS's deficiencies
that *I* have provided) that could actually be discussed.

> I know, everyone loves their baby...

No, you don't know: you just assume that everyone is as biased as you and
others here seem to be.

> Nevertheless just because you were working on and with it, it's not a
> proof. The person you were replying to was also working with it (but
> not on it I guess). Not that I'm interested in such a proof. Just
> noticed that you're demanding some proof, while you also just
> write some statements on its performance without any actual proof.

You really ought to spend a lot more time understanding what you've read
before responding to it, Robert. I *never* asked for anything like 'proof':
I asked for *examples* specific enough to address - and repeated that
explicitly in responding to your previous demand for 'proof'. Perhaps I
should at that time have observed that your demand for 'proof' (your use of
quotes suggesting that it was something that *I* had demanded) was
ridiculous, but I thought my response made that obvious.

>>> Let me use your own words:
>>>
>>> "In other words, you've got nothing, but you'd like people to believe
>>> it's something.
>>>
>>> The phrase "Put up or shut up" comes to mind."
>>>
>>> Where are your proofs on some of your claims about ZFS?
> cyg> Well, aside from the fact that anyone with even half a clue
> cyg> knows what the effects of uncontrolled file fragmentation are on
> cyg> sequential access performance (and can even estimate those
> cyg> effects within moderately small error bounds if they know what
> cyg> the disk characteristics are and how bad the fragmentation is),
> cyg> if you're looking for additional evidence that even someone
> cyg> otherwise totally ignorant could appreciate there's the fact that
>
> I've never said there are not fragmentation problems with ZFS.

Not having made a study of your collected ZFS contributions here I didn't
know that. But some of ZFS's developers are on record stating that they
believe there is no need to defragment (unless they've changed their views
since and not bothered to make us aware of it), and in the entire discussion
in the recent 'ZFS + DB + "fragments"' thread there were only three
contributors (Roch, Anton, and I) who seemed willing to admit that any
problem existed. So since one of my 'claims' for which you requested
substantiation involved fragmentation problems, it seemed appropriate to
address them.

> Well, actually I've been hit by the issue in one environment.

But didn't feel any impulse to mention that during all the preceding
discussion, I guess.

> Also you haven't done your work home properly, as one of ZFS
> developers actually stated they are going to work on ZFS
> de-fragmentation and disk removal (pool shrinking).
> See http://www.opensolaris.org/jive/thread.jspa?messageID=139680

Hmmm - there were at least two Sun ZFS personnel participating in the
database thread, and they never mentioned this. I guess they didn't do
their 'work home' properly either (and unlike me they're paid to do it). As
for me, my commitment here is too limited for me to have even scanned the
entire thread list, let alone read discussions with names like "ZFS send
needs optimalization" that seem unlikely to be relevant to my particular
interests.
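Incidentally, the kind of estimate referred to above - predicting sequential
throughput from disk characteristics and fragment size - really is
back-of-envelope material. Here's a minimal sketch; every disk parameter in
it is an illustrative assumption, not a measurement of any real drive or of
ZFS itself:

```python
# Rough model of how fragmentation degrades sequential-read throughput.
# The disk parameters below are illustrative assumptions, not measurements.

AVG_SEEK_MS = 8.0          # assumed average seek time
ROT_LATENCY_MS = 4.17      # half a rotation at 7200 rpm
TRANSFER_MB_S = 60.0       # assumed sustained media transfer rate

def seq_read_mb_s(fragment_kb: float) -> float:
    """Effective throughput when a file is laid out in fragments of
    fragment_kb, each read costing a seek plus rotational delay."""
    transfer_ms = fragment_kb / 1024.0 / TRANSFER_MB_S * 1000.0
    total_ms = AVG_SEEK_MS + ROT_LATENCY_MS + transfer_ms
    return (fragment_kb / 1024.0) / (total_ms / 1000.0)

for frag in (64, 128, 1024, 16384):
    print(f"{frag:6d} KB fragments -> {seq_read_mb_s(frag):6.1f} MB/s")
```

With these assumed numbers, 64 KB fragments land well under a tenth of the
streaming rate - which is the 'over an order of magnitude' ballpark argued
elsewhere in this thread.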
> Lukasz happens to be my friend who is also working with the same
> environment.

That just might help explain why you happened to be aware of this obscure
little tidbit of information, then.

> The point is, and you as a long time developer (I guess) should know it,
> you can't have everything done at once (lack of resources, and it takes
> some time anyway) so you must prioritize.

The issues here are not issues of prioritization but issues of denial. Your
citation above is the first suggestion that I've seen (and by all
appearances the first that anyone else participating in these discussions
has seen) that the ZFS crew considers the fragmentation issue important
enough to merit active attention in the future. Do you by any chance have
any similar hint of recognition that RAID-Z might benefit from revamping as
well?

> ZFS is open source and if
> someone thinks that given feature is more important than the other
> he/she should try to fix it or at least voice it here so ZFS
> developers can possibly adjust their priorities if there's good enough
> and justified demand.

That just won't wash, Robert: as I noted above, the problem here has been
denial that these are flaws at all, not just a debate about how to
'prioritize' addressing them (though in the case of RAID-Z I recall seeing
some indication that at least one person was interested in - or perhaps
actually is - working on RAID-5-like support because they see problems with
RAID-Z).

> Now the important part - quite a lot of people are using ZFS, from
> desktop usage, their laptops, small to big production environments,
> clustered environments, SAN environments, JBODs, entry-level to high-end
> arrays, different applications, workloads, etc. And somehow you can't
> find many complaints about ZFS fragmentation.
The entire basis for that database thread (initiated by someone else, you
will note) was ZFS fragmentation, and a great deal of its content arose from
the resistance of many here to the idea that it might constitute a problem
*in that specific environment* (let alone more generally).

Most environments actually aren't all that performance-sensitive, so of
course they don't complain. Even if they run into problems, they just buy
more hardware - because that's what they're used to doing: the idea that
better software could eliminate the need to do so either doesn't cross their
minds at all or seems like too much of a pipe dream to take seriously.

Trouble is, ZFS and its fanboys tout it as offering *superior* - not merely
adequate - performance, whereas for some not-all-that-uncommon situations
its performance can be worse *by over an order of magnitude* due to the
fragmentation which is designed into its operation and for which no current
relief is available (nor was any relief apparently generally known to be
projected for the future, until now). The fact that many installations may
be able to laugh off an order-of-magnitude performance handicap is not the
point: the point is that if the claims for ZFS had been more balanced in
this area, I'd have far less to criticize - I'd just observe that there was
significant room for improvement and leave it at that.

...

> Then you find people like Pawel Jakub Dawidek (guy who ported ZFS to
> FreeBSD) who started experimenting with RAID-5 like implementation
> with ZFS - he provided even some numbers showing it might be worth
> looking at. That's what community is about.

Ah - that may be what I was recalling above. Strange, once again, that it
never popped up in the current discussions until now.

> I don't see any point complaining about ZFS all over again - have you
> actually run into the problem with ZFS yourself? I guess not.
I haven't been sent to Guantanamo and held for years without trial, either -
but that doesn't mean that I have no business criticizing the practice, and
in particular persisting if that criticism is met with denial that any
problem exists (even though indeed it's not *my* problem).

> You just
> assuming (correctly for some usage cases). I guess your message has
> been well heard.

But hardly well understood.

> Since you're not interested in anything more than
> bashing or complaining all the time about the same theoretical "issues"
> rather than contributing somehow (even by providing some test results
> which could be repeated)

I've told you what I'm doing, and why I'm doing it, and why it's beyond
stupid to complain about the lack of 'test results' in situations as
clear-cut as these are, and how to go about fixing them - and you still come
back with crap like this. Is it any wonder that my respect for so many of
you is close to zero?

> I wouldn't wait for any positive feedback if I were
> you - anyway, what kind of feedback are you waiting for?

I'm waiting for the idiots either to shut up or to shape up. And I remain
sufficiently (though now verging on perversely) curious about just how long
that will take to keep working on it.

> cyg> Last I knew, ZFS was still claiming that it needed nothing like
> cyg> defragmentation, while describing write allocation mechanisms
> cyg> that could allow disastrous degrees of fragmentation under
> cyg> conditions that I've described quite clearly.
>
> Well, I haven't talked to ZFS (yet) so I don't know what he claims :))

Perhaps you should do *your* 'work home' more properly, then: there are
several developers who have presumed to speak for ZFS over the years, and
their statements are well documented (you could start with the presentations
that they've made).

> If you are talking about ZFS developers then you can actually find
> some evidence that they do see that problem and want to work on it.
> Again see for example:
> http://www.opensolaris.org/jive/thread.jspa?messageID=139680
> Bill, at least look at the list archives first.

I believe that I covered that adequately above, Robert. But given your
demonstrated inability to absorb information even after several repetitions,
I'll suggest that you simply keep working on understanding it (and the rest
of this response) until you actually *do* understand it, before attempting
to reply to it.

> And again, "under conditions that I've described quite clearly." -
> that's exactly the problem. You've just described something while
> others do have actual and real problems which should be addressed
> first.

Once again, you are confusing the very real problem of stone-wall denial
here with a simple issue of prioritization.

> cyg> If ZFS made no
> cyg> efforts whatsoever in this respect the potential for unacceptable
> cyg> performance would probably already have been obvious even to its
> cyg> blindest supporters,
>
> Well, is it really so hard to understand that a lot of people use ZFS
> because it actually solves their problems?

Not at all: it's just far from obvious that it solves their problems any
(let alone significantly) better than other existing open source options.
And that would not be any issue if some people here weren't so zealous in
asserting ZFS's alleged stunning superiority - but if they continue to do
so, I'll continue to challenge them to *substantiate* that claim.

> No matter what case
> scenarios you will find to theoretically show some ZFS weaker points,
> at the end what matters is if it does solve customer problems. And for
> many users it does, definitely not for all of them.
> I would argue that no matter what file system you will test or even
> design, one can always find corner cases when it will behave less
> than optimal. For a general purpose file system what matters is that
> in most common cases it's good enough.
And if "it's good enough" were all that people were claiming about ZFS
there'd be very little to dispute (though no less room for improvement, of
course - and probably a great deal less resistance to suggestions of how to
go about it).

...

> cyg> Then there's RAID-Z, which smears individual blocks across
> cyg> multiple disks in a manner that makes small-to-medium random
> cyg> access throughput suck. Again, this is simple logic and physics:
> cyg> if you understand the layout and the disk characteristics, you
> cyg> can predict the effects on a heavily parallel workload with
> cyg> fairly decent accuracy (I think that Roch mentioned this casually
> cyg> at one point, so it's hardly controversial, and I remember
> cyg> reading a comment by Jeff Bonwick that he was pleased with the
> cyg> result of one benchmark - which made no effort to demonstrate the
> cyg> worst case - because the throughput penalty was 'only' a factor
> cyg> of 2 rather than the full factor of N).
>
> Yeah, nothing really new here. If you need a guy from Sun, then read
> Roch's post on RAID-Z performance. Nothing you've discovered here.

Hmmm. I took a quick look through Roch's posts here and didn't find a title
that suggested such a topic (though he does tend to get involved in
discussions that are also of interest to me, so the time wasn't completely
wasted). If you're referring to his mid-2006 blog post, had you read the
discussion that followed it you would have found that I participated
actively and in fact raised many of the same issues that I've raised again
here (points that he either hadn't covered or hadn't realized had
alternatives that did not suffer from comparable limitations, plus more
general observations on the fragmentation problem). Incidentally (since
comments to that post are now closed), his IOPS calculation at the end was
flawed: the formula he presented yielded not the number of disks to use in
each group but the number of groups to use.
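The 'factor of N' point above follows from simple counting, and can be
sketched as a toy model. The per-spindle IOPS figure is an illustrative
assumption, and the model deliberately ignores caching, scheduling, and
partial-stripe subtleties:

```python
# Toy model: random-read IOPS for RAID-Z vs. a RAID-5-style layout.
# Because RAID-Z stripes every block across the whole group, one random
# block read engages all data disks at once; in a RAID-5-style layout
# each block lives on a single disk, so the disks can service independent
# reads in parallel. DISK_IOPS is an assumed figure, not a measurement.

DISK_IOPS = 120  # assumed small random reads per second per spindle

def raidz_read_iops(n_disks: int) -> float:
    # All data disks in the group seek together for every block read,
    # so the group delivers roughly one disk's worth of random reads.
    return float(DISK_IOPS)

def raid5_style_read_iops(n_disks: int) -> float:
    # Each of the n-1 data disks serves random reads independently.
    return float(DISK_IOPS * (n_disks - 1))

for n in (5, 9):
    print(f"{n}-disk group: RAID-Z ~{raidz_read_iops(n):.0f} IOPS, "
          f"RAID-5-style ~{raid5_style_read_iops(n):.0f} IOPS")
```

For a heavily parallel small-read workload this is exactly the near-factor-
of-N penalty described in the quoted passage.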
> Nevertheless RAID-Z[2] is good enough for many people.
> I know that simple logic and physics states that relativity equations
> provide better accuracy than Newton's - nevertheless in most scenarios
> I'm dealing with it doesn't really matter from a practical point of
> view.

Given your expressed preference for 'real problems' above, it's worth noting
that in my quick scan through Roch's posts here I happened upon this
(referring to performance issues using RAID-Z):

"Now I have to find a way to justify myself with my head office that after
spending 100k+ in hw and migrating to "the most advanced OS" we are running
about 8 time slower :)"

Some people might consider such a problem to be 'real' (and somewhat
personal as well); he goes on to observe that "while that rsync process is
running, ZONEX is completely unusable because of the rsync I/O load" -
another 'real-world' indication of how excessive (and unnecessary) RAID-Z
disk loading compromises other aspects of system performance (though limited
scheduling intelligence may have contributed to this as well). Since I
stumbled upon that without even looking for it or scanning more than a
minute fraction of 1% of the posts here, there's an excellent possibility
that considerably more such are lurking elsewhere in this forum (want to do
some 'work home' and find out?).

> Then, in some environments RAID-Z2 (on JBOD) actually provides better
> performance than RAID-5 (and HW R5 for that matter). And, opposite
> to you, I'm not speculating but I've been working with such
> environment (lot of concurrent writes which are more critical than
> much less reads later).

Don't confuse apples with oranges. As long as it can accumulate enough
dirty data before it has to flush it to disk, COW with batch write-back can
make *any* write strategy work well.
So there's no need to accept the brain-damaged nature of RAID-Z's
performance with small-to-medium-sized random accesses in order to obtain
the good performance that you describe above: a good ZFS RAID-5-like
implementation could do just as well for those workloads *plus* beat both
conventional RAID-5 and RAID-Z at small-update workloads *plus* cremate
RAID-Z in terms of throughput on small-to-medium read workloads.

The main limitation of the straightforward way to implement this is that it
would only be easily applicable to multi-block files, because each stripe
could contain data from only one file (so as to avoid an additional level of
access indirection); of course, in principle you could stripe a file as
small as four disk sectors (2 KB) across 4 disks plus one for parity, so
this approach would be inapplicable only to *tiny* files - around the size
that one might start considering embedding in their disk inode, given a
design that allowed that flexibility. While small files may get a large
share of the access load in some environments, in most environments they
consume only a small proportion of the storage space, so just leaving them
to be mirrored would probably be an eminently viable strategy - and
exploring more interesting alternatives wouldn't be productive anyway until
you've managed to understand the basic one.

> So when you saying that RAID-Z is brain-damaging - well, it's
> mostly positive experience of a lot of people with RAID-Z vs. your
> statement without any real-world backing.

I just provided one example above from a participant in this forum (and it
seems unlikely that it's the only one). Does that mean that I get to accuse
you of not having "done your work home properly", because you were unaware
of it?

...
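To make the four-sectors-plus-parity example above concrete, here is a
minimal sketch of that per-file stripe: a 2 KB file split into four 512-byte
chunks on four data disks, with an XOR parity chunk on a fifth. This is
purely illustrative of the layout being described, not of any actual ZFS
code:

```python
# Sketch of the per-file striping described above: a 2 KB file split into
# four 512-byte sector-sized chunks on four data disks, plus one parity
# chunk (the XOR of the four) on a fifth disk. Purely illustrative.

def stripe_with_parity(data: bytes, n_data_disks: int = 4, chunk: int = 512):
    assert len(data) == n_data_disks * chunk
    chunks = [data[i * chunk:(i + 1) * chunk] for i in range(n_data_disks)]
    parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*chunks))
    return chunks, parity

def reconstruct(chunks, parity, lost: int) -> bytes:
    # Rebuild a lost data chunk by XORing parity with the survivors -
    # the same recovery any single-parity RAID layout performs.
    survivors = [c for i, c in enumerate(chunks) if i != lost] + [parity]
    rebuilt = survivors[0]
    for s in survivors[1:]:
        rebuilt = bytes(x ^ y for x, y in zip(rebuilt, s))
    return rebuilt

file_data = bytes(range(256)) * 8            # a 2048-byte 'file'
chunks, parity = stripe_with_parity(file_data)
assert reconstruct(chunks, parity, lost=2) == chunks[2]
```

The point of keeping each stripe within one file is visible here: the
mapping from file offset to (disk, chunk) needs no extra indirection.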
> cyg> And the way ZFS apparently dropped the ball on its alleged
> cyg> elimination of any kind of 'volume management' by requiring that
> cyg> users create explicit (and matched) aggregations of disks to
> cyg> support mirroring and RAID-Z.
>
> # mkfile 128m f1 ; mkfile 128m f2 ; mkfile 256m f3 ; mkfile 256m f4
> # zpool create bill mirror /var/tmp/f1 /var/tmp/f2 mirror /var/tmp/f3 /var/tmp/f4
> # zpool list
> NAME    SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
> bill    373M    90K   373M   0%  ONLINE  -
> #
> # mkfile 128m f11 ; mkfile 256m f44
> # zpool destroy bill
> # zpool create bill raidz /var/tmp/f11 /var/tmp/f1 /var/tmp/f2 raidz /var/tmp/f3 /var/tmp/f4 /var/tmp/f44
> # zfs list
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> bill   101K   715M  32.6K  /bill
> #
> (2*128 + 2*256 = 768) - looks fine.
>
> If you are talking about a solution which enables the user to mix
> different disk sizes in the same mirror or RAID-5 group and, while
> providing the given protection the whole time, allows you to utilize
> 100% of all disk capacities... well, what is that solution? Is it free?
> Open source? Available on a general purpose OS? Or commodity HW?
> Available at all? :P

I'm talking about what ZFS *could* have provided to make good on their claim
that they had eliminated (or at least effectively hidden) volume management:
a *real* 'storage pool' that just accepted whatever disks you gave it and
could be used transparently to provide whatever form of redundancy was
desired on a per-file basis, with the ability to add or remove individual
disks at will. No need to create separate pools for non-redundant data,
mirrors, parity RAID, etc.: it would 'just work', in the manner that some
people would like to claim ZFS already does (and to some degree perhaps it
actually does, but not when it comes to redundant storage).
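One way such a pool could place mirrored data on arbitrary, unmatched disks
is a simple greedy strategy: put each chunk's two copies on the two disks
with the most free space. This is my own toy sketch of that idea (the chunk
granularity and disk sizes are invented for illustration), not a description
of how ZFS or any shipping product works:

```python
# Toy 'accept any disks' placement: each mirrored chunk's two copies go
# on the two disks with the most remaining free space. With mixed sizes
# this reaches full utilization whenever no disk exceeds the combined
# size of all the others. Sizes are in arbitrary illustrative units.

import heapq

def place_mirrored_chunks(disk_sizes, chunk=1):
    free = [-s for s in disk_sizes]          # max-heap via negation
    heapq.heapify(free)
    placed = 0
    while len(free) >= 2:
        a = heapq.heappop(free)              # disk with most free space
        b = heapq.heappop(free)              # second-most free space
        if -a < chunk or -b < chunk:         # fewer than two disks with room
            heapq.heappush(free, a)
            heapq.heappush(free, b)
            break
        placed += chunk
        heapq.heappush(free, a + chunk)      # negated: free space shrinks
        heapq.heappush(free, b + chunk)
    return placed                            # mirrored capacity achieved

# Mixed sizes: 128 + 128 + 256 + 256 = 768 raw units -> 384 mirrored.
print(place_mirrored_chunks([128, 128, 256, 256]))
```

Note that the 128/128/256/256 mix from the zpool example above yields its
full 384 units of mirrored capacity without any vdev matching.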
And yes, across a very wide range of disk-size variations it's possible to
utilize 100% of the capacity of each individual disk in such a pool using
relatively simple distribution strategies - especially if you can perform
very minor rearrangements to cover corner cases (though ZFS-style snapshots
would hinder that, which is one of the reasons - defragmentation being
another, and rebalancing across multiple nodes being a third - that I favor
a different snapshot approach). I described this here well over a year ago,
and Bill Moore said they had actually considered it but had shelved it for
various reasons (none of which appeared insurmountable - but he may have
been making different assumptions about how it could be implemented).

> cyg> Now, if someone came up with any kind of credible rebuttal to
> cyg> these assertions we could at least discuss it on technical
> cyg> grounds. But (and again you should consider this significant) no
> cyg> one has: all we have is well-reasoned analysis on the one hand
> cyg> and some (often fairly obnoxious) fanboy babble on the other. If
> cyg> you step back, make the effort required to *understand* that
> cyg> analysis, and try to look at the situation objectively, which do
> cyg> you find more credible?
>
> Most credible to me is actual user experience than some theoretical
> burbling.

That's usually the case with amateurs who have difficulty understanding in
detail how the systems that they use work. But at least many of them have
the sense not to argue interminably with people who have actually designed
and built such systems and *do* understand them in (excruciating) detail.

...

> cyg> ZFS has other deficiencies, but they're more fundamental choices
> cyg> involving poor trade-offs and lack of vision than outright (and
> cyg> easily rectifiable) flaws, so they could more justifiably be
> cyg> termed 'judgment calls' and I haven't delved as deeply into them.
>
> And what are they?
Once again, you've failed to do your 'work home' - since I've mentioned them
here previously:

1. Implementation tied to a centralized server - scales only 'up', not
'out'.

2. Snapshot mechanism that makes reorganization expensive (including
reorganization across nodes - so it's a scaling impediment as well as a
performance trade-off).

3. Explicit pointer (indirect block) trees for large files rather than a
flatter mechanism that avoids deep tree look-ups (with high-level data
distribution handled algorithmically - which also helps avoid the need to
update pointers in bulk when inter-node rebalancing operations occur and
confines pointer updates on writes to the node that holds the data).

4. Trying to use block size to manage both access granularity and on-disk
contiguity for performance (though background reorganization could help the
latter and leave the former free to adjust just for access granularity - so
that design choice could be considered one of the flaws already discussed
above).

There were probably more, but as you likely wouldn't understand them any
better than you've understood anything else there's little point in dredging
them up again.

...

> cyg> But they're the main reason I have no interest in 'working on'
>
> Well, you're not using ZFS, you are not interested in working on it,
> all you are interested in is finding some potential corner cases bad for
> ZFS and bashing it. If you put at least 10% of the energy you're
> putting into your 'holy war' you would at least provide some benchmarks
> (filebench?) showing these corner cases in comparison to other
> mind-blowing solutions on the market which are much better than ZFS,
> so we can all reproduce them and try to address ZFS problems.

I really don't have much interest in meeting *your* criteria for being
convinced, Robert - at least in part because it's not clear that *anything*
would convince you.
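As an aside on point 3 in the list above: the tree-versus-algorithmic
distinction is easy to quantify. An indirect-block tree costs one pointer-
block traversal per level to map an offset, while an algorithmic layout
resolves the owning node with arithmetic alone. The sketch below is my own
illustration - the 128 KB blocks and 128-byte block pointers are assumed
figures, and the hash mapping is a stand-in, not any real ZFS mechanism:

```python
# Illustration of point 3: resolving a file offset through an indirect-
# block tree reads one pointer block per level, and the tree deepens with
# file size; an algorithmic layout needs no pointer-block reads at all.
# Block and pointer sizes are assumptions chosen for the arithmetic.

BLOCK = 128 * 1024                # assumed 128 KB data/indirect blocks
PTRS_PER_INDIRECT = BLOCK // 128  # assumed 128-byte block pointers

def tree_lookups(file_size: int) -> int:
    """Indirect levels traversed to map one offset in a file this large."""
    blocks, levels = file_size // BLOCK, 0
    while blocks > 1:
        blocks = -(-blocks // PTRS_PER_INDIRECT)  # ceiling division
        levels += 1
    return levels

def algorithmic_node(file_id: int, offset: int, n_nodes: int) -> int:
    """Zero pointer-block reads: the owning node is pure arithmetic."""
    return hash((file_id, offset // BLOCK)) % n_nodes

for size in (2**20, 2**30, 2**40):
    print(f"{size:>14} B file: {tree_lookups(size)} indirect level(s)")
```

The tree's depth grows with file size (three levels by the terabyte range
under these assumptions), and every one of those pointer blocks must be
rewritten when data moves - which is the bulk-update cost the list item
refers to.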
So it's more fun to see how completely committed people like you are to
keeping their heads firmly wedged up where the sun don't shine to avoid
actually facing up to the fact that ZFS just ain't quite what you thought it
was.

...

> cyg> You really haven't bothered to read much at all, have you. I've
> cyg> said, multiple times, that I came here initially in the hope of
> cyg> learning something interesting. More recently, I came here
> cyg> because I offered a more balanced assessment of ZFS's strengths
> cyg> and weaknesses in responding to the Yager article and wanted to
> cyg> be sure that I had not treated ZFS unfairly in some way - which
> cyg> started this extended interchange. After that, I explained that
> cyg> while the likelihood of learning anything technical here was
> cyg> looking pretty poor, I didn't particularly like some of the
> cyg> personal attacks that I'd been subject to and had decided to
> cyg> confront them.
>
> Well, every time I saw it was you 'attacking' other people first.

Then you obviously missed a great many posts, but given the readily-apparent
quality of your other research I don't find that surprising at all.

...

> If you are not contributing here, and you are not learning here - why
> are you here? I'm serious - why?

I explained that, in detail, in my previous post. Given the expressed
'seriousness' of your repeat question here I was going to ask whether you
are functionally illiterate, but your advice below brought up another
possibility.

...

> cyg> No, my attitude is that people too stupid and/or too lazy to
> cyg> understand what I *have* been delivering don't deserve much respect
> cyg> if they complain.
>
> Maybe you should think about that "stupid" part...

As usual, I thought about it *before* I said it.
However, I did inadvertently omit a third possibility - that people such as
you (who don't quite strike me as being abjectly stupid or drop-dead lazy)
are instead simply too intellectually dishonest (whether intentionally or so
habitually that it has become subconscious) to understand what I've been
'delivering'. So you're right: there's always room to refine one's
understanding, and another relevant quotation comes to mind ("There are none
so blind as those who will not see").

> Maybe, just maybe, it's possible that all people around you don't
> understand you, that the world is wrong and we're all so stupid. Well,
> maybe. Even if it is so, then perhaps it's time to stop being Don Quixote
> and move on?

No, but it might be getting close to it - I'll let you know.

- bill

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss