Re: [Jfs-discussion] benchmark results
> When things didn't match up that was a clue that either
> - the benchmark was broken
> - the code was broken
[...]

I would carry out an object-oriented dualism here:

[1] methods (kernel module)      [2] objects (formatted partition)
            ||                                ||
[3] benchmarks               -   [4] user-space utilities (fsck)

User-space utilities investigate "object corruptions", whereas benchmarks investigate "software corruptions" (including bugs in source code, broken design, etc.).

It is clear that "software" can be "corrupted" in many more ways than "objects" can. Indeed, it is known that the dual space V* (of all linear functions over V) is a much more complex object than V.

So a benchmark is a process which takes a set of methods (we consider only "software" benchmarks) and assigns them numerical values, where the set of values is extended with a special (the worst) value, CRASH.

There are three main categories of benchmark use:

1) Internal testing
An engineer optimizes a file system (e.g. for a customer) by choosing functions or plugins as winners in a set of internal (local) "nominations".

2) Business plans
A system administrator chooses a "winner" in some (global) "nomination" of file systems in accordance with internal business plans.

3) Flame and politics
Someone presents a "nomination" (usually with the "winner" among a restricted number of nominated members) to the public, although nobody asked him to do it.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Jfs-discussion] benchmark results
On Sun, Jan 10, 2010 at 08:03:04PM -0500, Casey Allen Shobe wrote:
> On Dec 25, 2009, at 11:22 AM, Larry McVoy wrote:
>> Dudes, sync() doesn't flush the fs cache, you have to unmount for
>> that.
>> Once upon a time Linux had an ioctl() to flush the fs buffers, I used
>> it in lmbench.
>
> You do not need to unmount - 2.6.16+ have a mechanism in /proc to flush
> caches. See http://linux-mm.org/Drop_Caches

Cool, but I tend to come at problems from a cross-platform point of view. AIX no hable /proc :)
--
---
Larry McVoy            lm at bitmover.com           http://www.bitkeeper.com
Re: [Jfs-discussion] benchmark results
On Dec 25, 2009, at 11:22 AM, Larry McVoy wrote:
> Dudes, sync() doesn't flush the fs cache, you have to unmount for that.
> Once upon a time Linux had an ioctl() to flush the fs buffers, I used
> it in lmbench.

You do not need to unmount - 2.6.16+ have a mechanism in /proc to flush caches. See http://linux-mm.org/Drop_Caches

Cheers,
--
Casey Allen Shobe
[email protected]
Re: [Jfs-discussion] benchmark results
Dave Chinner wrote:
> On Mon, Jan 04, 2010 at 11:27:48AM -0500, Chris Mason wrote:
>> On Fri, Dec 25, 2009 at 11:11:46AM -0500, [email protected] wrote:
>>> On Fri, Dec 25, 2009 at 02:46:31AM +0300, Evgeniy Polyakov wrote:
>>>>> [1] http://samba.org/ftp/tridge/dbench/README
>>>>
>>>> I could not resist writing a small note: no matter what benchmark is running, it _does_ show system behaviour under one condition or another. And when a system behaves rather badly, it is quite a common comment that the benchmark was useless. But it did show that the system has a problem, even if a rarely triggered one :)
>>>
>>> If people are using benchmarks to improve a file system, and a benchmark shows a problem, then trying to remedy the performance issue is a good thing to do, of course. Sometimes, though, the case which is demonstrated by a poor benchmark is an extremely rare corner case that doesn't accurately reflect common real-life workloads --- and if addressing it results in a tradeoff which degrades much more common real-life situations, then that would be a bad thing.
>>>
>>> In situations where benchmarks are used competitively, it's rare that it's actually a *problem*. Instead it's much more common that a developer is trying to prove that their file system is *better* to gullible users who think that a single one-dimensional number is enough for them to choose file system X over file system Y.
>>
>> [ Look at all this email from my vacation...sorry for the delay ]
>>
>> It's important that people take benchmarks from filesystem developers with a big grain of salt, which is one reason the boxacle.net results are so nice. Steve is more than willing to take patches and experiment to improve a given FS's results, but his business is a fair representation of performance, and it shows.
>
> Just looking at the results there, I notice that the RAID system XFS mailserver results dropped by an order of magnitude between 2.6.29-rc2 and 2.6.31. The single disk results are pretty much identical across the two kernels.
> IIRC, in 2.6.31 RAID0 started passing barriers through, so I suspect this is the issue. However, seeing as dmesg is not collected by the scripts after the run and the output of the mounttab does not show default options, I cannot tell if this is the case.

Well, the dmesg collection is done by the actual benchmark run, which occurs after the mount command is issued, so if you are looking for dmesg output related to mounting the XFS volume, it should be in the dmesg we did collect. If dmesg actually formatted timestamps, this would be easier to see. It seems that nothing from XFS is ending up in dmesg: we run XFS with the different thread counts in order without a reboot, so the dmesg for the 16-thread XFS run is collected right after the 1-thread XFS run, yet it shows ext3 as the last thing. So it's safe to say that no output from XFS is ending up in dmesg at all.

> This might be worth checking by running XFS with the "nobarrier" mount option

I could give that a try for you.

> FWIW, is it possible to get these benchmarks run on each filesystem for each kernel release so ext/xfs/btrfs all get some regular basic performance regression test coverage?

It's possible, yes. I just need to find the time to do the runs and, more importantly, to post-process the data in some meaningful way. I'll see what I can do.

Steve

> Cheers,
>
> Dave.
Re: [Jfs-discussion] benchmark results
On Mon, Jan 04, 2010 at 11:27:48AM -0500, Chris Mason wrote:
> On Fri, Dec 25, 2009 at 11:11:46AM -0500, [email protected] wrote:
> > On Fri, Dec 25, 2009 at 02:46:31AM +0300, Evgeniy Polyakov wrote:
> > > > [1] http://samba.org/ftp/tridge/dbench/README
> > >
> > > I could not resist writing a small note: no matter what benchmark is running, it _does_ show system behaviour under one condition or another. And when a system behaves rather badly, it is quite a common comment that the benchmark was useless. But it did show that the system has a problem, even if a rarely triggered one :)
> >
> > If people are using benchmarks to improve a file system, and a benchmark shows a problem, then trying to remedy the performance issue is a good thing to do, of course. Sometimes, though, the case which is demonstrated by a poor benchmark is an extremely rare corner case that doesn't accurately reflect common real-life workloads --- and if addressing it results in a tradeoff which degrades much more common real-life situations, then that would be a bad thing.
> >
> > In situations where benchmarks are used competitively, it's rare that it's actually a *problem*. Instead it's much more common that a developer is trying to prove that their file system is *better* to gullible users who think that a single one-dimensional number is enough for them to choose file system X over file system Y.
>
> [ Look at all this email from my vacation...sorry for the delay ]
>
> It's important that people take benchmarks from filesystem developers with a big grain of salt, which is one reason the boxacle.net results are so nice. Steve is more than willing to take patches and experiment to improve a given FS's results, but his business is a fair representation of performance, and it shows.
Just looking at the results there, I notice that the RAID system XFS mailserver results dropped by an order of magnitude between 2.6.29-rc2 and 2.6.31. The single disk results are pretty much identical across the two kernels.

IIRC, in 2.6.31 RAID0 started passing barriers through, so I suspect this is the issue. However, seeing as dmesg is not collected by the scripts after the run and the output of the mounttab does not show default options, I cannot tell if this is the case. This might be worth checking by running XFS with the "nobarrier" mount option.

FWIW, is it possible to get these benchmarks run on each filesystem for each kernel release so ext/xfs/btrfs all get some regular basic performance regression test coverage?

Cheers,

Dave.
--
Dave Chinner
[email protected]
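Dave's point is that the recorded mount table did not show which options (barriers or not) were actually in effect for a run. As a minimal sketch, assuming a Linux/glibc system, the helper below reads a file in fstab format (such as /proc/mounts) with the standard getmntent(3)/hasmntopt(3) calls and reports whether a given mount point carries an option such as "nobarrier". The name mount_has_option is hypothetical, and whether /proc/mounts lists every default option depends on the filesystem and kernel.

```c
#define _GNU_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up mount_dir in an fstab-format file
 * (e.g. "/proc/mounts") and check whether opt is in its option list.
 * Returns 1 if present, 0 if absent, -1 if the file cannot be opened
 * or the mount point is not listed. */
int mount_has_option(const char *mounts_file, const char *mount_dir,
                     const char *opt)
{
    FILE *f = setmntent(mounts_file, "r");
    if (f == NULL)
        return -1;

    int result = -1;
    struct mntent *ent;
    while ((ent = getmntent(f)) != NULL) {
        if (strcmp(ent->mnt_dir, mount_dir) == 0) {
            /* Keep scanning: if something was mounted over the same
             * directory, the last entry is the one in effect. */
            result = (hasmntopt(ent, opt) != NULL) ? 1 : 0;
        }
    }
    endmntent(f);
    return result;
}
```

A benchmark script could record, for example, `mount_has_option("/proc/mounts", "/mnt/xfs", "nobarrier")` (the mount point is made up) alongside each result, so that a barrier change between kernels shows up in the data rather than having to be inferred.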
Re: [Jfs-discussion] benchmark results
Google is currently in the middle of upgrading from ext2 to a more up-to-date file system. We ended up choosing ext4. This thread touches upon many of the issues we wrestled with, so I thought it would be interesting to share. We should be sending out more details soon.

The driving performance reason to upgrade is that while ext2 had been "good enough" for a very long time, the metadata arrangement on a stale file system was leading to what we call "read inflation". This is where we end up doing many seeks to read one block of data. In general, latency from poor block allocation was causing performance hiccups.

We spent a lot of time with standard Unix benchmarks (dbench, compilebench, et al.) on xfs, ext4 and jfs to try to see which one was going to perform the best. In the end we mostly ended up using the benchmarks to validate our assumptions and do functional testing. Larry is completely right IMHO. These benchmarks were instrumental in helping us understand how the file systems worked in controlled situations and in gaining our customers' confidence.

For our workloads we saw ext4 and xfs as "close enough" in performance in the areas we cared about. The fact that we had a much smoother upgrade path with ext4 clinched the deal. The only upgrade option we have is online. ext4 is already moving the bottleneck away from the storage stack for some of our most intensive applications.

It was not until we moved from benchmarks to customer workloads that we were able to make detailed performance comparisons and find bugs in our implementation.

"Iterate often" seems to be the winning strategy for SW dev. But when it involves rebooting a cloud of systems and making a one-way conversion of their data, it can get messy. That said, I see benchmarks as tools to build confidence before running traffic on redundant live systems.

mrubin

PS for some reason "dbench" holds mythical power over many folks I have met.
They just believe it's the most trusted and standard benchmark for file systems. In my experience it often acts as a random number generator. It has found some bugs in our code, as it exercises the VFS layer very well.
Re: [Jfs-discussion] benchmark results
On Fri, Dec 25, 2009 at 11:11:46AM -0500, [email protected] wrote:
> On Fri, Dec 25, 2009 at 02:46:31AM +0300, Evgeniy Polyakov wrote:
> > > [1] http://samba.org/ftp/tridge/dbench/README
> >
> > I could not resist writing a small note: no matter what benchmark is running, it _does_ show system behaviour under one condition or another. And when a system behaves rather badly, it is quite a common comment that the benchmark was useless. But it did show that the system has a problem, even if a rarely triggered one :)
>
> If people are using benchmarks to improve a file system, and a benchmark shows a problem, then trying to remedy the performance issue is a good thing to do, of course. Sometimes, though, the case which is demonstrated by a poor benchmark is an extremely rare corner case that doesn't accurately reflect common real-life workloads --- and if addressing it results in a tradeoff which degrades much more common real-life situations, then that would be a bad thing.
>
> In situations where benchmarks are used competitively, it's rare that it's actually a *problem*. Instead it's much more common that a developer is trying to prove that their file system is *better* to gullible users who think that a single one-dimensional number is enough for them to choose file system X over file system Y.

[ Look at all this email from my vacation...sorry for the delay ]

It's important that people take benchmarks from filesystem developers with a big grain of salt, which is one reason the boxacle.net results are so nice. Steve is more than willing to take patches and experiment to improve a given FS's results, but his business is a fair representation of performance, and it shows.
>
> For example, if I wanted to play that game and tell people that ext4 is better, I might pick this graph:
>
> http://btrfs.boxacle.net/repository/single-disk/2.6.29-rc2/2.6.29-rc2/2.6.29-rc2_Mail_server_simulation._num_threads=32.html
>
> On the other hand, this one shows ext4 as the worst compared to all other file systems:
>
> http://btrfs.boxacle.net/repository/single-disk/2.6.29-rc2/2.6.29-rc2/2.6.29-rc2_Large_file_random_writes_odirect._num_threads=8.html
>
> Benchmarking, like statistics, can be extremely deceptive, and if people do things like carefully order a tar file so the files are optimal for a file system, it's fair to ask whether that's a common thing for people to be doing (either unpacking tarballs or unpacking tarballs whose files have been carefully ordered for a particular file system).

I tend to use compilebench for testing the ability to create lots of small files, which puts the file names into FS native order (by unpacking and then readdiring the results) before it does any timings.

I'd agree with Larry that benchmarking is most useful to test a theory: here's a patch that is supposed to do xyz; is that actually true?

With that said, we should also be trying to write benchmarks that show the worst case... we know some of our design weaknesses and should be able to show numbers for how bad it really is (see the random write btrfs.boxacle.net tests for that one).

-chris
Re: [Jfs-discussion] benchmark results
> The bottom line is that it's very hard to do good comparisons that are
> useful in the general case.

It has always amazed me watching people go about benchmarking. I should have a blog called "you're doing it wrong" or something.

Personally, I use benchmarks to validate what I already believe to be true. So before I start I have a prediction as to what the answer should be, based on my understanding of the system being measured. Back when I was doing this a lot, I was always within a factor of 10 (not a big deal) and usually within a factor of 2 (quite a bit bigger deal). When things didn't match up that was a clue that either

- the benchmark was broken
- the code was broken
- the hardware was broken
- my understanding was broken

If you start a benchmark and you don't know what the answer should be, at the very least within a factor of 10 and ideally within a factor of 2, you shouldn't be running the benchmark. Well, maybe you should, they are fun. But you sure as heck shouldn't be publishing results unless you know they are correct.

This is why lmbench, to toot my own horn, measures what it does. If you go run that and memorize the results, you can tell yourself "well, this machine has sustained memory copy bandwidth of 3.2GB/sec, the disk I'm using can read at 60MB/sec and write at 52MB/sec (on the outer zone where I'm going to run my tests), it does small seeks in about 6 milliseconds, I'm doing sequential I/O, the bcopy is in the noise, the blocks are big enough that the seeks are hidden, so I'd like to see a steady 50MB/sec or so on a sustained copy test".

If you have a mental model for how the bits of the system work, you can decompose the benchmark into its parts, predict the result, run it, and compare. It'll match, or Lucy, you have some 'splainin to do.
--
---
Larry McVoy            lm at bitmover.com           http://www.bitkeeper.com
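Larry's "predict first" habit can be written down as arithmetic before anything runs. A minimal sketch, assuming the simplest possible model: a sustained sequential copy goes no faster than its slowest stage (memory copy, disk read, disk write), with seeks hidden by large blocks as he describes. The function names are hypothetical, and the figures in the note below are the ones from his example, not new measurements.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplest bottleneck model (an assumption): a sustained sequential copy
 * can go no faster than its slowest stage. All rates in MB/sec. */
double predict_copy_mb_per_sec(double memcpy_mb, double disk_read_mb,
                               double disk_write_mb)
{
    assert(memcpy_mb > 0 && disk_read_mb > 0 && disk_write_mb > 0);
    double slowest = memcpy_mb;
    if (disk_read_mb < slowest)
        slowest = disk_read_mb;
    if (disk_write_mb < slowest)
        slowest = disk_write_mb;
    return slowest;
}

/* True if measured is within a factor of `factor` of predicted in either
 * direction: factor 10 is the loose bound, factor 2 the tighter one. */
bool within_factor(double measured, double predicted, double factor)
{
    assert(measured > 0 && predicted > 0 && factor >= 1);
    double ratio = measured / predicted;
    return ratio <= factor && ratio >= 1.0 / factor;
}
```

With 3200MB/sec memory copy, 60MB/sec reads and 52MB/sec writes, the model predicts 52MB/sec, in line with the "steady 50MB/sec or so" above. A measured 5MB/sec would fail even the factor-of-10 check, which is the cue to work out which of the four things is broken.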
Re: [Jfs-discussion] benchmark results
On Sun, 27 Dec 2009 at 17:33, [email protected] wrote:
> Yes, but given many of the file systems have almost *exactly* the same

"Almost" indeed - but curiously enough some filesystems are *not* the same, although they should be. Again: we have 8GB RAM and I'm copying ~3GB of data, so why _are_ there differences? (Answer: because filesystems are different.) That's the only point of this test. Also note the disclaimer[0] I added to the results page a few days ago.

> measurement is 5 times the disk bandwidth as measured by hdparm, it
> makes me suspect that you are doing this:
> /bin/time /bin/cp -r /source/tree /filesystem-under-test
> sync

No, I'm not - see the test script[1] - I'm taking the time for cp/rm/tar *and* sync. But even if I only took the time for, say, "cp" and not the sync part, it would still be a valid comparison across filesystems (the same operation for every filesystem), although not a very realistic one, because in the real world I *want* to make sure my data is on the disk. But that's as far as I go in these tests; I'm not even messing around with disk caches or HBA caches - that's not the scope of these tests.

> You might notice it if you include the "sync" in the timing, i.e.:
> /bin/time /bin/sh -c "/bin/cp -r /source/tree
> /filesystem-under-test;/bin/sync"

Yes, that's exactly what the tests do.

> "/bin/cp" returns, then sure, do whatever you want. But if you want
> the tests to have meaning if, for example, you have 2GB of memory and
> you are copying 8GB of data,

For the bonnie++ tests I chose a file size (16GB) so that disk performance will matter there. As the generic tests shuffle around much smaller amounts of data, they measure filesystem performance rather than disk performance (and compare it to other filesystems), well aware of the fact that caches *are* being used. Why would I want to discard caches?
My daily usage patterns (opening web browsers, terminal windows, spreadcheats) deal with much smaller datasets, and I'm happy that Linux is so hungry for cache - yet some filesystems do not seem to utilize this opportunity as well as others do. That's the whole point of this particular test.

But rather than constantly explaining my point over and over again, I see what I have to do: I shall run the generic tests again with much bigger datasets, so that disk performance is also reflected, as people do seem to care about this (I don't - I can switch filesystems more easily than disks).

> The bottom line is that it's very hard to do good comparisons that are
> useful in the general case.

And it's difficult to find out what's a "useful comparison" for the general public :-)

Christian.

[0] http://nerdbynature.de/benchmarks/v40z/2009-12-22/
[1] http://nerdbynature.de/benchmarks/v40z/2009-12-22/env/fs-bench.sh.txt
--
BOFH excuse #292:

We ran out of dial tone and we're and waiting for the phone company to deliver another bottle.
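The disagreement in this exchange largely comes down to whether the data set fits in the page cache. A small sketch of that check, assuming a rule of thumb that the data must be some multiple of RAM (2x is assumed here) before disk behaviour rather than cache behaviour dominates; the function name and the factor are illustrative and are not taken from bonnie++ or from the test script.

```c
#include <assert.h>
#include <stdbool.h>

/* Rough check (an assumed rule of thumb, not a bonnie++ rule): is a data
 * set of data_gb at least `factor` times the ram_gb of memory, so that
 * the page cache cannot hide the disk? */
bool dataset_defeats_cache(double data_gb, double ram_gb, double factor)
{
    assert(data_gb >= 0 && ram_gb > 0 && factor > 0);
    return data_gb >= factor * ram_gb;
}
```

Applied to the numbers in this thread: the 16GB bonnie++ file on the 8GB machine passes, while the ~3GB generic copy does not, so the generic tests mostly measure cached filesystem behaviour, which is what Christian says he intends.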
Re: [Jfs-discussion] benchmark results
On Sun, Dec 27, 2009 at 01:55:26PM -0800, Christian Kujau wrote:
> On Sun, 27 Dec 2009 at 14:50, jim owens wrote:
> > And I don't even care about comparing 2 filesystems, I only care about
> > timing 2 versions of code in the single filesystem I am working on,
> > and forgetting about hardware cache effects has screwed me there.
>
> Not me, I'm comparing filesystems - and when the HBA or whatever plays tricks and "sync" doesn't flush all the data, it'll do so for every tested filesystem. Of course, filesystems could handle "sync" differently, and they probably do, hence the different times they take to complete. That's what my tests are about: timing comparison (does that still fall under the "benchmark" category?), not functional comparison. That's left as a task for the reader of these results: "hm, filesystem xy is so much faster when doing foo, why is that? And am I willing to sacrifice e.g. proper syncs to gain more speed?"

Yes, but given many of the file systems have almost *exactly* the same bandwidth measurement for the "cp" test, and said bandwidth measurement is 5 times the disk bandwidth as measured by hdparm, it makes me suspect that you are doing this:

/bin/time /bin/cp -r /source/tree /filesystem-under-test
sync
/bin/time /bin/rm -rf /filesystem-under-test/tree
sync
etc.

It is *a* measurement, but the question is whether it's a useful comparison. Consider two different file systems: one which does a very good job of making sure that file writes are done contiguously to disk, minimizing seek overhead, and another which is really crappy at disk allocation and writes the files to random locations all over the disk. If you are only measuring the "cp", then the fact that filesystem 'A' has a very good layout and is able to write things to disk very efficiently, while filesystem 'B' has files written in a really horrible way, won't be measured by your test.
This is especially true if, for example, you have 8GB of memory and you are copying 4GB worth of data. You might notice it if you include the "sync" in the timing, i.e.:

/bin/time /bin/sh -c "/bin/cp -r /source/tree /filesystem-under-test;/bin/sync"

> Again, I don't argue with "hardware caches will have effects", but that's
> not the point of these tests. Of course hardware is different, but
> filesystems are too and I'm testing filesystems (on the same hardware).

The question is whether your tests are doing the best job of measuring how good the filesystem really is. If your workload is one where you will only be copying file sets much smaller than your memory, and you don't care about when the data actually hits the disk, only when "/bin/cp" returns, then sure, do whatever you want. But if you want the tests to have meaning if, for example, you have 2GB of memory and you are copying 8GB of data, or if later on you will be continuously streaming data to the disk and sooner or later the need to write data to the disk will start slowing down your real-life workload, then not including the time to do the sync in the time to copy your file set may cause you to assume that filesystems 'A' and 'B' are identical in performance, and your filesystem comparison will end up misleading you.

The bottom line is that it's very hard to do good comparisons that are useful in the general case.

Best regards,

- Ted
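Ted's suggestion is to stop the clock only after sync has returned. A minimal C sketch of the same idea, assuming a POSIX/Linux system: time an operation with clock_gettime(CLOCK_MONOTONIC), optionally followed by sync(2) inside the timed region. The callback type and function names are hypothetical, and, as discussed elsewhere in this thread, sync(2) on Linux waits for writeback to the HBA, not necessarily to the platters.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback type: the operation under test, e.g. a copy. */
typedef void (*bench_op)(void *arg);

/* Baseline operation that does nothing, for measuring harness overhead. */
void bench_noop(void *arg) { (void)arg; }

static double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) +
           (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Run op(arg) and return wall-clock seconds. When include_sync is true,
 * sync() runs before the clock stops, so writeback of the dirty data is
 * charged to the operation (the "cp;sync" form above). */
double time_op(bench_op op, void *arg, bool include_sync)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    op(arg);
    if (include_sync)
        sync();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_sec(&start, &end);
}
```

Timing the same copy with include_sync false and then true gives the two numbers Ted contrasts: the first mostly measures the page cache, the second also charges the filesystem for getting the data out to the device.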
Re: [Jfs-discussion] benchmark results
On Sun, 27 Dec 2009 at 14:50, jim owens wrote:
> And I don't even care about comparing 2 filesystems, I only care about
> timing 2 versions of code in the single filesystem I am working on,
> and forgetting about hardware cache effects has screwed me there.

Not me, I'm comparing filesystems - and when the HBA or whatever plays tricks and "sync" doesn't flush all the data, it'll do so for every tested filesystem. Of course, filesystems could handle "sync" differently, and they probably do, hence the different times they take to complete. That's what my tests are about: timing comparison (does that still fall under the "benchmark" category?), not functional comparison. That's left as a task for the reader of these results: "hm, filesystem xy is so much faster when doing foo, why is that? And am I willing to sacrifice e.g. proper syncs to gain more speed?"

> So unless you are sure you have no hardware cache effects...
> "the comparison still stands" is *false*.

Again, I don't argue with "hardware caches will have effects", but that's not the point of these tests. Of course hardware is different, but filesystems are too and I'm testing filesystems (on the same hardware).

Christian.
--
BOFH excuse #278:

The Dilithium Crystals need to be rotated.
Re: [Jfs-discussion] benchmark results
Christian Kujau wrote:
> On 26.12.09 08:00, jim owens wrote:
>>> I was using "sync" to make sure that the data "should" be on the disks
>> Good, but not good enough for many tests... info sync
> [...]
>>        On Linux, sync is only guaranteed to schedule the dirty blocks for
>>        writing; it can actually take a short time before all the blocks are
>>        finally written.

OK, that was wrong per Ted's explanation:

> But for quite some time, under Linux the sync(2) system call will wait
> for the blocks to be flushed out to HBA, although we currently don't
> wait for the blocks to have been committed to the platters (at least
> not for all file systems).

But Christian Kujau wrote:

> Noted, many times already. That's why I wrote "should be" - but in this
> special scenario (filesystem speed tests) I don't care about file
> integrity: if I pull the plug after "sync" and some data didn't make it
> to the disks, I'll only check whether the test script got all the
> timestamps and move on to the next test. I'm not testing for "filesystem
> integrity after someone pulls the plug" here. And remember, I'm doing
> "sync" for all the filesystems tested, so the comparison still stands.

You did not understand my point. It was not about data integrity, it was about test timing validity.

And even with sync(2) behaving as Ted describes, *timing* may still tell you the wrong thing or not tell you something important.

I have a battery-backed HBA cache. Writes are HBA cached. Timing only shows "to HBA memory". So 1000 pages (4MB total) that are at 1000 places on the disk will time (almost) the same completion as 1000 pages that are in 20 extents of 50 pages each. Writing to disk, the time difference between these would be an obvious slap upside the head.

Hardware caches can trick you into thinking a filesystem performs much better than it really does for some operations. Or trick you about relative performance between 2 filesystems.
And I don't even care about comparing 2 filesystems, I only care about timing 2 versions of code in the single filesystem I am working on, and forgetting about hardware cache effects has screwed me there.

So unless you are sure you have no hardware cache effects... "the comparison still stands" is *false*.

jim
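jim's example can be put into numbers with a deliberately crude model in which each discontiguous extent costs one seek and the data then streams at the disk's sequential write rate. The 6ms seek and 52MB/sec write rate used below are borrowed from Larry's lmbench example elsewhere in this thread and are assumptions, as is the function name. The point is only that two layouts which look identical when timed into an HBA cache can differ by more than an order of magnitude on the platters.

```c
#include <assert.h>

/* Crude model (an assumption): every extent costs one seek, then the
 * bytes stream at the sequential write rate. 1 MB = 1,000,000 bytes. */
double estimate_disk_write_ms(int extents, double total_bytes,
                              double seek_ms, double write_mb_per_sec)
{
    assert(extents > 0 && total_bytes >= 0);
    assert(seek_ms >= 0 && write_mb_per_sec > 0);
    double transfer_ms = total_bytes / (write_mb_per_sec * 1e6) * 1000.0;
    return extents * seek_ms + transfer_ms;
}
```

For 1000 4KB pages (4,096,000 bytes), 1000 scattered extents come out at roughly 6079ms, while 20 extents of 50 pages come out at roughly 199ms, about 30 times faster, even though both may "complete" in about the same time when the writes only reach battery-backed HBA memory.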
Re: [Jfs-discussion] benchmark results
On Sat, Dec 26, 2009 at 11:00:59AM -0500, jim owens wrote:
> Christian Kujau wrote:
>
> > I was using "sync" to make sure that the data "should" be on the disks
>
> Good, but not good enough for many tests... info sync
>
> CONFORMING TO
>POSIX.2
>
> NOTES
>On Linux, sync is only guaranteed to schedule the dirty blocks for
>writing; it can actually take a short time before all the blocks are
>finally written.
>
> This is consistent with all the feels-like-unix OSes I have used.
Actually, Linux's sync does more than just schedule the writes; it has
for quite some time:
	static void sync_filesystems(int wait)
	{
		...
	}

	SYSCALL_DEFINE0(sync)
	{
		wakeup_flusher_threads(0);
		sync_filesystems(0);
		sync_filesystems(1);
		if (unlikely(laptop_mode))
			laptop_sync_completion();
		return 0;
	}
At least for ext3 and ext4, we will even do a device barrier operation
as a result of a call to sb->s_op->sync_fs() --- which is called by
__sync_filesystem, which is called in turn by sync_filesystems().
This isn't done for all file systems, though, as near as I can tell.
(Ext2 at least doesn't.)
But for quite some time, under Linux the sync(2) system call will wait
for the blocks to be flushed out to HBA, although we currently don't
wait for the blocks to have been committed to the platters (at least
not for all file systems).
Applications shouldn't depend on this, of course, since POSIX and
other legacy Unix systems don't guarantee this. But in terms of
knowing what Linux does, the man page is a bit out of date.
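Since applications shouldn't depend on Linux's sync(2) behaviour, a benchmark that needs one particular file on stable storage can ask for it directly. A minimal sketch, assuming POSIX: write a buffer and call fsync(2) on that file descriptor before closing it. The function name is hypothetical, and whether the drive's own write cache is flushed as well still depends on the filesystem and its barrier support, as Ted describes for ext3/ext4.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and fsync() the file
 * before closing it. Returns 0 on success, -1 on any error. */
int write_and_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Unlike sync(2), which covers every filesystem, fsync(2) waits for
     * this one file's data and metadata to be written out. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd) == 0 ? 0 : -1;
}
```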
Best regards,
- Ted
Re: [Jfs-discussion] benchmark results
On 26.12.09 08:00, jim owens wrote:
>> I was using "sync" to make sure that the data "should" be on the disks
>
> Good, but not good enough for many tests... info sync
[...]
>        On Linux, sync is only guaranteed to schedule the dirty blocks for
>        writing; it can actually take a short time before all the blocks are
>        finally written.

Noted, many times already. That's why I wrote "should be" - but in this special scenario (filesystem speed tests) I don't care about file integrity: if I pull the plug after "sync" and some data didn't make it to the disks, I'll only check whether the test script got all the timestamps and move on to the next test. I'm not testing for "filesystem integrity after someone pulls the plug" here. And remember, I'm doing "sync" for all the filesystems tested, so the comparison still stands.

Christian.
Re: [Jfs-discussion] benchmark results
Christian Kujau wrote:
> I was using "sync" to make sure that the data "should" be on the disks

Good, but not good enough for many tests... info sync

CONFORMING TO
       POSIX.2

NOTES
       On Linux, sync is only guaranteed to schedule the dirty blocks for
       writing; it can actually take a short time before all the blocks are
       finally written.

This is consistent with all the feels-like-unix OSes I have used.

And to make it even more random, the hardware (drive/controller) write cache state needs to be accounted for, and what the filesystem does, if anything, to try to ensure device-cache-to-media consistency.

That does not mean I'm saying the tests are invalid or not useful, only that people need to evaluate "what do they really tell me".

jim
Re: [Jfs-discussion] benchmark results
On Fri, 25 Dec 2009 at 10:56, Christian Kujau wrote:
> Thanks for the hint, I could find sys/vm/drop-caches documented in
                            --^

"not" was what I meant to say, but it's all there, as "drop_caches" in Documentation/sysctl/vm.txt

Christian.

> Documentation/ but it's good to know there's a way to flush all these
> caches via this knob. Maybe I should add this to those "generic" tests to
> be more comparable to the other benchmarks.
--
BOFH excuse #129:

The ring needs another token
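For completeness, here is a minimal C sketch of flushing the caches through this knob, assuming Linux 2.6.16 or later and root privileges. It calls sync(2) first, because drop_caches does not free dirty objects, and then writes "3" to /proc/sys/vm/drop_caches, which per Documentation/sysctl/vm.txt frees the page cache plus dentries and inodes. The function name is hypothetical, and, as Larry points out, this is Linux-specific.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write back dirty data, then ask the kernel to
 * drop the page cache, dentries and inodes ("3"). Needs root; returns
 * 0 on success and -1 on failure (e.g. permission denied, or no such
 * file on a non-Linux system). */
int drop_all_caches(void)
{
    sync(); /* drop_caches does not free dirty objects */

    FILE *f = fopen("/proc/sys/vm/drop_caches", "w");
    if (f == NULL)
        return -1;
    int ok = fputs("3\n", f) >= 0;
    /* The buffered write normally reaches the kernel only at fclose(),
     * so its result matters too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A benchmark harness would call this between runs when it wants cold-cache numbers, for example when measuring reads as well as writes.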
Re: [Jfs-discussion] benchmark results
On Fri, 25 Dec 2009 at 11:33, [email protected] wrote:
> caches, though; if you are going to measure reads as well as writes,
> then you'll probably want to do something like "echo 3 >
> /proc/sys/vm/drop_caches".

Thanks for the hint, I could find sys/vm/drop-caches documented in
Documentation/ but it's good to know there's a way to flush all these
caches via this knob. Maybe I should add this to those "generic" tests to be
more comparable to the other benchmarks.

Christian.
-- 
BOFH excuse #210:

We didn't pay the Internet bill and it's been cut off.
Re: [Jfs-discussion] benchmark results
On Fri, 25 Dec 2009 at 08:22, Larry McVoy wrote:
> Dudes, sync() doesn't flush the fs cache, you have to unmount for that.

Thanks Larry, that was exactly my point[0] too; I should add that to the
results page to avoid further confusion or misassumptions:

> Well, I do "sync" after each operation, so the data should be on
> disk, but that doesn't mean it'll clear the filesystem buffers
> - but this doesn't happen that often in the real world too.

I realize, however, that on the same results page the bonnie++ tests were
run with a filesize *specifically* set to not utilize the filesystem
buffers any more but to measure *disk* performance, while my "generic"
tests do something else - and thus cannot be compared to the bonnie++ or
hdparm results.

> No idea if that is still supported, but sync() is a joke for benchmarking.

I was using "sync" to make sure that the data "should" be on the disks
now; I did not want to flush the filesystem buffers during the "generic"
tests.

Thanks,
Christian.

[0] http://www.spinics.net/lists/linux-ext4/msg16878.html
-- 
BOFH excuse #210:

We didn't pay the Internet bill and it's been cut off.
Re: [Jfs-discussion] benchmark results
On Fri, 25 Dec 2009 at 11:14, [email protected] wrote:
> Did you include the "sync" in part of what you timed?

In my "generic" tests[0] I do "sync" after each of the cp/tar/rm operations.

> Peter was quite
> right --- the fact that the measured bandwidth in your "cp" test is
> five times faster than the disk bandwidth as measured by hdparm, and
> many file systems had exactly the same bandwidth, makes me very
> suspicious that what was being measured was primarily memory bandwidth

That's right, and that's what I replied to Peter on jfs-discussion[1]:

>> * In the "generic" test the 'tar' test bandwidth is exactly the
>>   same ("276.68 MB/s") for nearly all filesystems.

True, because I'm tarring up ~2.7GB of content while the box is equipped
with 8GB of RAM. So it *should* be the same for all filesystems, as Linux
could easily hold all this in its caches. Still, jfs and zfs manage to be
slower than the rest.

> --- and not very useful when trying to measure file system
> performance.

For the bonnie++ tests I chose an explicit filesize of 16GB, twice the
size of the machine's RAM, to make sure it will test the *disks'*
performance. And to be consistent across one benchmark run, I should have
copied/tarred/removed 16GB as well. However, I decided not to do that,
but to *use* the filesystem buffers instead of ignoring them. After all,
it's not about disk performance (that's what hdparm could be for) but
filesystem performance (or comparison, more exactly) - and I'm not
excited about the fact that almost all filesystems are copying at
~276MB/s, but I'm wondering why zfs is 13 times slower when copying data,
or why xfs takes 200 seconds longer than other filesystems while handling
the same amount of data as all the others.

So no, please don't compare the bonnie++ results against my "generic"
results within these results - as they're (obviously, I thought) taken
with different parameters/content sizes.

Christian.

[0] http://nerdbynature.de/benchmarks/v40z/2009-12-22/env/fs-bench.sh.txt
[1] http://tinyurl.com/yz6x2sj
-- 
BOFH excuse #85:

Windows 95 undocumented "feature"
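Christian's sizing rule (a bonnie++ file of 16GB, twice the 8GB of RAM) and his explanation of the flat tar result (~2.7GB of content fits in the page cache) can both be checked by the test script itself. A minimal C sketch, assuming Linux/glibc, where sysconf(_SC_PHYS_PAGES) is available as an extension rather than plain POSIX; all helper names are hypothetical:

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: physical RAM in bytes as reported by sysconf(3),
 * or 0 if unknown. _SC_PHYS_PAGES is a common extension (glibc, BSDs). */
uint64_t phys_ram_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0)
        return 0;
    return (uint64_t)pages * (uint64_t)page_size;
}

/* Hypothetical helper: nonzero if a data set of 'bytes' is smaller than
 * physical RAM, i.e. it could sit entirely in the page cache, so a run
 * over it mostly measures memory bandwidth unless caches are dropped. */
int fits_in_ram(uint64_t bytes)
{
    uint64_t ram = phys_ram_bytes();
    return ram != 0 && bytes < ram;
}

/* Hypothetical helper: a working-set size of twice physical RAM, the
 * rule of thumb used for the bonnie++ runs (16GB on an 8GB machine). */
uint64_t disk_bound_size(void)
{
    return 2 * phys_ram_bytes();
}
```

A script could call fits_in_ram() on the size of its cp/tar/rm content and print a warning next to any result that was likely served from memory.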
Re: [Jfs-discussion] benchmark results
On Fri, Dec 25, 2009 at 11:14:53AM -0500, [email protected] wrote:
> On Thu, Dec 24, 2009 at 05:52:34PM -0800, Christian Kujau wrote:
> >
> > Well, I do "sync" after each operation, so the data should be on disk, but
> > that doesn't mean it'll clear the filesystem buffers - but this doesn't
> > happen that often in the real world too. Also, all filesystems were tested
> > equally (I hope), yet some filesystems perform better than others - even
> > if all the content copied/tar'ed/removed would perfectly well fit into the
> > machine's RAM.
>
> Did you include the "sync" in part of what you timed?  Peter was quite
> right --- the fact that the measured bandwidth in your "cp" test is
> five times faster than the disk bandwidth as measured by hdparm, and
> many file systems had exactly the same bandwidth, makes me very
> suspicious that what was being measured was primarily memory bandwidth
> --- and not very useful when trying to measure file system
> performance.

Dudes, sync() doesn't flush the fs cache, you have to unmount for that.
Once upon a time Linux had an ioctl() to flush the fs buffers, I used
it in lmbench.

	ioctl(fd, BLKFLSBUF, 0);

No idea if that is still supported, but sync() is a joke for benchmarking.
-- 
---
Larry McVoy    lm at bitmover.com    http://www.bitkeeper.com
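Larry's one-liner, wrapped into a compilable C function as a sketch. It assumes Linux, where BLKFLSBUF is defined in <linux/fs.h> and the caller needs CAP_SYS_ADMIN; flush_block_device() and the example device path in the comment are hypothetical. Ted's reply to this message describes what the ioctl does and does not cover (buffer cache yes, page cache no). From the shell, util-linux's `blockdev --flushbufs` issues the same ioctl.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>     /* BLKFLSBUF */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to write out and invalidate the
 * buffers cached for a block device such as "/dev/sdb" (example path).
 * Linux-only; needs CAP_SYS_ADMIN. Returns 0 on success, -1 with errno
 * set on failure. */
int flush_block_device(const char *dev_path)
{
    int fd = open(dev_path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = ioctl(fd, BLKFLSBUF, 0);
    int saved = errno;          /* keep ioctl's errno across close() */
    close(fd);
    if (rc < 0) {
        errno = saved;
        return -1;
    }
    return 0;
}
```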
Re: [Jfs-discussion] benchmark results
On Fri, Dec 25, 2009 at 08:22:38AM -0800, Larry McVoy wrote:
>
> Dudes, sync() doesn't flush the fs cache, you have to unmount for that.
> Once upon a time Linux had an ioctl() to flush the fs buffers, I used
> it in lmbench.
>
> ioctl(fd, BLKFLSBUF, 0);
>
> No idea if that is still supported, but sync() is a joke for benchmarking.

Depends on what you are trying to do ("flush" has multiple meanings, so
using it can be ambiguous).  BLKFLSBUF will write out any dirty buffers,
*and* empty the buffer cache.  I use it when benchmarking e2fsck
optimization.  It doesn't do anything for the page cache.

If you are measuring the time to write a file, using fsync() or sync()
will include the time to actually write the data to disk.  It won't
empty caches, though; if you are going to measure reads as well as
writes, then you'll probably want to do something like "echo 3 >
/proc/sys/vm/drop_caches".

						- Ted
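Ted's "echo 3 > /proc/sys/vm/drop_caches" can also be issued from a C benchmark driver between a write phase and a read phase. A minimal sketch, assuming Linux 2.6.16 or later and root privileges; drop_caches() is a hypothetical helper. Per Documentation/sysctl/vm.txt, 1 frees the page cache, 2 frees dentries and inodes, and 3 frees both; only clean objects are dropped, which is why the sketch calls sync() first.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write dirty data back, then ask the kernel to
 * drop clean caches via /proc/sys/vm/drop_caches (root only).
 * mode: 1 = page cache, 2 = dentries and inodes, 3 = both.
 * Returns 0 on success, -1 with errno set. */
int drop_caches(int mode)
{
    if (mode < 1 || mode > 3) {
        errno = EINVAL;
        return -1;
    }

    /* drop_caches only discards clean objects, so write dirty ones first. */
    sync();

    int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
    if (fd < 0)
        return -1;

    char c = (char)('0' + mode);
    ssize_t n = write(fd, &c, 1);
    int saved = errno;
    close(fd);
    if (n != 1) {
        errno = (n < 0) ? saved : EIO;
        return -1;
    }
    return 0;
}
```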
Re: [Jfs-discussion] benchmark results
On Thu, Dec 24, 2009 at 05:52:34PM -0800, Christian Kujau wrote:
>
> Well, I do "sync" after each operation, so the data should be on disk, but
> that doesn't mean it'll clear the filesystem buffers - but this doesn't
> happen that often in the real world too. Also, all filesystems were tested
> equally (I hope), yet some filesystems perform better than others - even
> if all the content copied/tar'ed/removed would perfectly well fit into the
> machine's RAM.

Did you include the "sync" in part of what you timed?  Peter was quite
right --- the fact that the measured bandwidth in your "cp" test is
five times faster than the disk bandwidth as measured by hdparm, and
many file systems had exactly the same bandwidth, makes me very
suspicious that what was being measured was primarily memory bandwidth
--- and not very useful when trying to measure file system
performance.

						- Ted
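Ted's question is whether the flush was inside the timed region. The C sketch below can time the same write twice, once stopping the clock before any flush and once after fsync(2), so the two figures can be compared. It assumes Linux/POSIX clock_gettime() with CLOCK_MONOTONIC; timed_write() is a hypothetical helper and the 1 MiB chunk size is an arbitrary choice.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: write 'bytes' bytes to 'path' in 1 MiB chunks and
 * return the elapsed wall-clock seconds, or -1.0 on error. If
 * include_fsync is nonzero, fsync(2) runs inside the timed region, so
 * the result includes getting the data to the device rather than just
 * into the page cache. */
double timed_write(const char *path, size_t bytes, int include_fsync)
{
    enum { CHUNK = 1 << 20 };
    char *buf = malloc(CHUNK);
    if (buf == NULL)
        return -1.0;
    memset(buf, 'x', CHUNK);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    double start = now_seconds();
    size_t left = bytes;
    int ok = 1;
    while (left > 0 && ok) {
        size_t want = left < CHUNK ? left : CHUNK;
        ssize_t n = write(fd, buf, want);
        if (n <= 0)
            ok = 0;             /* treat errors (including EINTR) as failure */
        else
            left -= (size_t)n;
    }
    if (ok && include_fsync && fsync(fd) != 0)
        ok = 0;
    double elapsed = now_seconds() - start;

    close(fd);
    free(buf);
    return ok ? elapsed : -1.0;
}
```

Throughput is bytes divided by the elapsed time; only the include_fsync=1 figure says anything about the device, and even that leaves the data in the page cache for any read test that follows.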
Re: [Jfs-discussion] benchmark results
On Fri, Dec 25, 2009 at 02:46:31AM +0300, Evgeniy Polyakov wrote:
> > [1] http://samba.org/ftp/tridge/dbench/README
>
> I was not able to resist writing a small note: no matter what
> benchmark is running, it _does_ show system behaviour under one
> condition or another. And when a system behaves rather badly, it is
> quite a common comment that the benchmark was useless. But it did show
> that the system has a problem, even if a rarely triggered one :)

If people are using benchmarks to improve a file system, and a benchmark
shows a problem, then trying to remedy the performance issue is a good
thing to do, of course.  Sometimes, though, the case which is
demonstrated by a poor benchmark result is an extremely rare corner case
that doesn't accurately reflect common real-life workloads --- and if
addressing it results in a tradeoff which degrades much more common
real-life situations, then that would be a bad thing.

In situations where benchmarks are used competitively, it's rare that
what they show is actually a *problem*.  Instead it's much more common
that a developer is trying to prove that their file system is *better*
to gullible users who think that a single one-dimensional number is
enough for them to choose file system X over file system Y.

For example, if I wanted to play that game and tell people that ext4 is
better, I might pick this graph:

http://btrfs.boxacle.net/repository/single-disk/2.6.29-rc2/2.6.29-rc2/2.6.29-rc2_Mail_server_simulation._num_threads=32.html

On the other hand, this one shows ext4 as the worst compared to all
other file systems:

http://btrfs.boxacle.net/repository/single-disk/2.6.29-rc2/2.6.29-rc2/2.6.29-rc2_Large_file_random_writes_odirect._num_threads=8.html

Benchmarking, like statistics, can be extremely deceptive, and if people
do things like carefully order a tar file so the files are optimal for a
file system, it's fair to ask whether that's a common thing for people
to be doing (either unpacking tarballs or unpacking tarballs whose files
have been carefully ordered for a particular file system).  When it's
the only number used by a file system developer when trying to convince
users they should use their file system, at least in my humble opinion
it becomes murderously dishonest.

						- Ted
Re: [Jfs-discussion] benchmark results
I'm a file system testing newbie, so I have a question/doubt; please let
me know if I'm wrong.

Do you think a tool which uses the output of the "hdparm" command to get
the hard drive's maximum performance, and compares a specific file system
against it (for example, "ext4 provides xx throughput against max. device
throughput yy"), would be more meaningful?

Would using hdparm (or other device-throughput tools) for benchmarking
be useful?

Thanks.
-- 
Cheers,
Lakshmipathi.G
www.giis.co.in
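One way to act on that idea is to express each file system result as a multiple of the raw device rate. A hedged C sketch; the helper names and the 10% slack are assumptions, and `hdparm -t` only measures sequential reads from the device, so it is a rough sanity bound rather than a true maximum for every workload. With the figures quoted in this thread, 276.68 MB/s against 66.23 MB/s gives a ratio of about 4.18, which cannot have come from the disk alone.

```c
/* Hypothetical helper: ratio of a measured file system throughput to
 * the raw device throughput (e.g. from "hdparm -t"), same units.
 * Returns a negative value if the device figure is not positive. */
double device_ratio(double fs_mb_s, double device_mb_s)
{
    if (device_mb_s <= 0.0)
        return -1.0;
    return fs_mb_s / device_mb_s;
}

/* Hypothetical helper: a result well above the device's raw rate most
 * likely reflects caching. 'slack' allows for measurement noise
 * (e.g. 1.1 = up to 10% above the device rate is still accepted). */
int probably_cached(double fs_mb_s, double device_mb_s, double slack)
{
    return device_ratio(fs_mb_s, device_mb_s) > slack;
}
```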
Re: [Jfs-discussion] benchmark results
On Thu, 24 Dec 2009 at 16:27, [email protected] wrote:
> If you don't do a "sync" after the tar, then in most cases you will be
> measuring the memory bandwidth, because data won't have been written

Well, I do "sync" after each operation, so the data should be on disk, but
that doesn't mean it'll clear the filesystem buffers - but this doesn't
happen that often in the real world too. Also, all filesystems were tested
equally (I hope), yet some filesystems perform better than others - even
if all the content copied/tar'ed/removed would perfectly well fit into the
machine's RAM.

> Another good example of well done file system benchmarks can be found
> at http://btrfs.boxacle.net

Thanks, I'll have a look at it and perhaps even integrate it in the
wrapper script.

> benchmarks for a living. Note that JFS and XFS come off much better
> on a number of the tests

Indeed, I was surprised to see JFS perform that well, and XFS of course is
one of the best too - I just wanted to point out that both of them are
strangely slow at times (removing or creating many files) - not what I
expected.

> --- and that there is a *large* amount
> of variation when you look at different simulated workloads and with a
> varying number of threads writing to the file system at the same time.

True, the TODO list in the script ("different benchmark options") is in
there for a reason :-)

Christian.
-- 
BOFH excuse #291:

Due to the CDA, we no longer have a root account.
Re: [Jfs-discussion] benchmark results
Hi Ted.

On Thu, Dec 24, 2009 at 04:27:56PM -0500, [email protected] ([email protected]) wrote:
> > Unfortunately there seems to be an overproduction of rather
> > meaningless file system "benchmarks"...
>
> One of the problems is that very few people are interested in writing
> or maintaining file system benchmarks, except for file system
> developers --- but many of them are more interested in developing (and
> unfortunately, in some cases, promoting) their file systems than they
> are in doing a good job maintaining a good set of benchmarks.  Sad but
> true...

I suppose there should be a link to such a set here? :)
No link? Then I suppose benchmark results are pretty much in sync with
what they are supposed to show.

> > * In the "generic" test the 'tar' test bandwidth is exactly the
> >   same ("276.68 MB/s") for nearly all filesystems.
> >
> > * There are read transfer rates higher than the one reported by
> >   'hdparm' which is "66.23 MB/sec" (comically enough *all* the
> >   read transfer rates your "benchmarks" report are higher).
>
> If you don't do a "sync" after the tar, then in most cases you will be
> measuring the memory bandwidth, because data won't have been written
> to disk.  Worse yet, it tends to skew the results of what happens
> afterwards (*especially* if you aren't running the steps of the
> benchmark in a script).

It depends on the size of the untarred object: for a Linux kernel tarball
and the common several gigs of RAM, it is perfectly valid not to run a
sync after the tar, since writeback will take care of it.

> > BTW the use of Bonnie++ is also usually a symptom of a poor
> > misunderstanding of file system benchmarking.
>
> Dbench is also a really nasty benchmark.  If it's tuned correctly, you
> are measuring memory bandwidth and the hard drive light will never go
> on.  :-)  The main reason why it was interesting was that it and tbench
> were used to model a really bad industry benchmark, netbench, which at
> one point a number of years ago I/T managers used to decide which CIFS
> server they would buy[1].  So it was useful for Samba developers who were
> trying to do competitive benchmarks, but it's not a very accurate
> benchmark for measuring real-life file system workloads.
>
> [1] http://samba.org/ftp/tridge/dbench/README

I was not able to resist writing a small note: no matter what
benchmark is running, it _does_ show system behaviour under one
condition or another. And when a system behaves rather badly, it is
quite a common comment that the benchmark was useless. But it did show
that the system has a problem, even if a rarely triggered one :)

Not an ext4 nitpick of course.

-- 
	Evgeniy Polyakov
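Whether "writeback will take care of it" has actually happened by the time the next step starts can be observed through the Dirty and Writeback fields of /proc/meminfo (both reported in kB). A minimal C sketch, assuming Linux; the helper names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value (in kB) of a field such as
 * "Dirty" or "Writeback" from /proc/meminfo-formatted text, or -1 if
 * the field is absent. The name must match exactly up to the ':'. */
long meminfo_field_kb(const char *text, const char *field)
{
    size_t flen = strlen(field);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            long kb;
            if (sscanf(line + flen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');   /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Hypothetical helper: read /proc/meminfo (Linux) and return the named
 * field in kB, or -1 on error. */
long read_meminfo_kb(const char *field)
{
    static char buf[16384];
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_field_kb(buf, field);
}
```

Sampling read_meminfo_kb("Dirty") right after the tar, and again after a sync, shows how much of the tar's output was still only in memory when the timer stopped.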
Re: [Jfs-discussion] benchmark results
On Thu, Dec 24, 2009 at 01:05:39PM +, Peter Grandi wrote:
> > I've had the chance to use a testsystem here and couldn't
> > resist
>
> Unfortunately there seems to be an overproduction of rather
> meaningless file system "benchmarks"...
One of the problems is that very few people are interested in writing
or maintaining file system benchmarks, except for file system
developers --- but many of them are more interested in developing (and
unfortunately, in some cases, promoting) their file systems than they
are in doing a good job maintaining a good set of benchmarks. Sad but
true...
> * In the "generic" test the 'tar' test bandwidth is exactly the
> same ("276.68 MB/s") for nearly all filesystems.
>
> * There are read transfer rates higher than the one reported by
> 'hdparm' which is "66.23 MB/sec" (comically enough *all* the
> read transfer rates your "benchmarks" report are higher).
If you don't do a "sync" after the tar, then in most cases you will be
measuring the memory bandwidth, because data won't have been written
to disk. Worse yet, it tends to skew the results of what happens
afterwards (*especially* if you aren't running the steps of the
benchmark in a script).
> BTW the use of Bonnie++ is also usually a symptom of a poor
> misunderstanding of file system benchmarking.
Dbench is also a really nasty benchmark. If it's tuned correctly, you
are measuring memory bandwidth and the hard drive light will never go
on. :-) The main reason why it was interesting was that it and tbench
were used to model a really bad industry benchmark, netbench, which at
one point a number of years ago I/T managers used to decide which CIFS
server they would buy[1]. So it was useful for Samba developers who were
trying to do competitive benchmarks, but it's not a very accurate
benchmark for measuring real-life file system workloads.
[1] http://samba.org/ftp/tridge/dbench/README
> On the plus side, test setup context is provided in the "env"
> directory, which is rare enough to be commendable.
Absolutely. :-)
Another good example of well done file system benchmarks can be found
at http://btrfs.boxacle.net; it's done by someone who does performance
benchmarks for a living. Note that JFS and XFS come off much better
on a number of the tests --- and that there is a *large* amount
of variation when you look at different simulated workloads and with a
varying number of threads writing to the file system at the same time.
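Given that kind of variation, a single figure per file system hides a lot. As a hedged illustration, the C sketch below summarizes repeated runs of one test by minimum, maximum, mean and relative spread; the struct and function names are hypothetical, and reporting the range instead of a standard deviation is just one simple choice.

```c
#include <stddef.h>

/* Hypothetical summary of repeated runs of a single test. */
struct run_summary {
    double min, max, mean;
    double spread;   /* (max - min) / mean */
};

/* Hypothetical helper: summarize n >= 1 measurements of the same test.
 * Returns 0 on success, -1 if n == 0 or the mean is not positive. */
int summarize_runs(const double *x, size_t n, struct run_summary *out)
{
    if (n == 0)
        return -1;

    double lo = x[0], hi = x[0], sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
        sum += x[i];
    }

    double mean = sum / (double)n;
    if (mean <= 0.0)
        return -1;

    out->min = lo;
    out->max = hi;
    out->mean = mean;
    out->spread = (hi - lo) / mean;
    return 0;
}
```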
Regards,
- Ted
Re: [Jfs-discussion] benchmark results
> I've had the chance to use a testsystem here and couldn't
> resist
Unfortunately there seems to be an overproduction of rather
meaningless file system "benchmarks"...
> running a few benchmark programs on them: bonnie++, tiobench,
> dbench and a few generic ones (cp/rm/tar/etc...) on ext{234},
> btrfs, jfs, ufs, xfs, zfs. All with standard mkfs/mount options
> and +noatime for all of them.
> Here are the results, no graphs - sorry: [ ... ]
After having a glance, I suspect that your tests could be
enormously improved, and doing so would reduce the pointlessness of
the results.
A couple of hints:
* In the "generic" test the 'tar' test bandwidth is exactly the
same ("276.68 MB/s") for nearly all filesystems.
* There are read transfer rates higher than the one reported by
'hdparm' which is "66.23 MB/sec" (comically enough *all* the
read transfer rates your "benchmarks" report are higher).
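Peter's first hint, many file systems reporting exactly the same figure, can be turned into an automatic check. This is a hypothetical heuristic, not something from the benchmark scripts in this thread: count how many results sit within a small tolerance of the best one. If most file systems tie at the top, the limit is probably something other than the file system, such as memory bandwidth.

```c
#include <stddef.h>

/* Hypothetical helper: count how many results lie within rel_tol
 * (e.g. 0.01 = 1%) of the largest result. Returns 0 for empty input. */
size_t count_at_ceiling(const double *mb_s, size_t n, double rel_tol)
{
    if (n == 0)
        return 0;

    double top = mb_s[0];
    for (size_t i = 1; i < n; i++)
        if (mb_s[i] > top)
            top = mb_s[i];

    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (top - mb_s[i] <= rel_tol * top)
            count++;
    return count;
}
```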
BTW the use of Bonnie++ is also usually a symptom of a poor
misunderstanding of file system benchmarking.
On the plus side, test setup context is provided in the "env"
directory, which is rare enough to be commendable.
> Short summary, AFAICT:
> - btrfs, ext4 are the overall winners
> - xfs too, but creating/deleting many files was *very* slow
Maybe, and these conclusions are sort of plausible (but I prefer
JFS and XFS for different reasons); however, they are not supported
by your results, which seem to me to lack much meaning: what is
being measured is far from clear, and in particular it does not
seem to be file system performance, or at any rate an aspect of
file system performance that might relate to common usage.
I think that it is rather better to run a few simple operations
(like the "generic" test) properly (unlike the "generic" test), to
give a feel for how well the basic operations of the file system
design are implemented.
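A sketch of one such simple operation, with the flush inside the timed region: create and then unlink many empty files, the case Christian found very slow on xfs. It assumes Linux (where sync() waits for writeback to finish) and an existing, writable test directory; time_create_unlink() and the bench-N file names are hypothetical. Note that sync() flushes every mounted file system, not only the one under test.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double elapsed_since(const struct timespec *t0)
{
    struct timespec t1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0->tv_sec)
         + (double)(t1.tv_nsec - t0->tv_nsec) / 1e9;
}

/* Hypothetical helper: create 'count' empty files bench-0, bench-1, ...
 * in the existing directory 'dir', then unlink them. Each phase ends
 * with sync(), which on Linux waits for writeback, so the metadata
 * updates are inside the timed region. Stores seconds per phase;
 * returns 0 on success, -1 on the first failure. */
int time_create_unlink(const char *dir, int count,
                       double *create_s, double *unlink_s)
{
    char path[4096];
    struct timespec t0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        snprintf(path, sizeof path, "%s/bench-%d", dir, i);
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        close(fd);
    }
    sync();
    *create_s = elapsed_since(&t0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        snprintf(path, sizeof path, "%s/bench-%d", dir, i);
        if (unlink(path) != 0)
            return -1;
    }
    sync();
    *unlink_s = elapsed_since(&t0);
    return 0;
}
```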
Profiling a file system performance with a meaningful full scale
benchmark is a rather difficult task requiring great intellectual
fortitude and lots of time.
> - if you need only fast but no cool features or
> journaling, ext2 is still a good choice :)
That is however a generally valid conclusion, but with a very,
very important qualification: for freshly loaded filesystems.
Also with several other important qualifications, but "freshly
loaded" is a pet peeve of mine :-).
