Re: Benchmarking btrfs on HW Raid ... BAD

2009-09-30 Thread Ric Wheeler

On 09/28/2009 05:39 AM, Tobias Oetiker wrote:

Hi Daniel,

Today Daniel J Blueman wrote:

   

On Mon, Sep 28, 2009 at 9:17 AM, Florian Weimer <fwei...@bfk.de> wrote:
 

* Tobias Oetiker:

   

Running this on a single disk, I get quite acceptable results.
When running on top of an Areca HW RAID6 (LVM-partitioned),
both read and write performance go down by at least two
orders of magnitude.
 

Does the HW RAID use write caching (preferably battery-backed)?
   

I believe Areca controllers have an option for writeback or
writethrough caching, so it's worth checking this and that you're
running the current firmware, in case of errata. Ironically, disabling
writeback will give the OS tighter control of request latency, but
throughput may drop a lot. I still can't help thinking that this is
down to the behaviour of the controller, due to the 1-disk case
working well.
 

it certainly is down to the behaviour of the controller, or the
results would be the same as with a single SATA disk :-) It would
be interesting to see what results others get on HW RAID
controllers ...

   

One way would be to configure the array as 6 or 7 separate devices and allow
BTRFS/DM to manage the array, then see if performance under write load
is better, with and without writeback caching...
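
Purely as an illustration (device names below are placeholders, and this
assumes the controller can export its member disks individually, e.g. as
pass-through/JBOD volumes), that setup would look roughly like:

  # build the filesystem directly on the exported disks and let btrfs
  # handle striping/mirroring itself
  mkfs.btrfs -m raid1 -d raid0 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
  # a device scan (e.g. btrfsctl -a) may be needed so the kernel sees all
  # members; mounting any one of them then mounts the whole filesystem
  mount /dev/sdb /mnt/test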
 

I can imagine that this would help, but since btrfs aims to be
multi-purpose, this does not really help all that much, since it
fundamentally alters the 'conditions': at the moment the RAID
contains different filesystems and is partitioned using LVM ...

cheers
tobi

the results for ext3 fs look like this ...

   


I would be more suspicious of the barrier/flushes being issued. If your 
write cache is non-volatile, we really do not want to send them down to 
this type of device. Flushing this type of cache could certainly be 
very, very expensive and slow.


Try mount -o nobarrier and see if your performance (write cache still 
enabled on the controller) is back to expected levels.
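
For example (the device path and mount point below are only placeholders
for whatever your LVM volume and test directory actually are):

  mount -t btrfs -o nobarrier /dev/vg0/lvtest /mnt/btrfs-test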


Ric



Re: Benchmarking btrfs on HW Raid ... BAD

2009-09-30 Thread Tobias Oetiker
Hi Ric,

Today Ric Wheeler wrote:

 I would be more suspicious of the barrier/flushes being issued. If your write
 cache is non-volatile, we really do not want to send them down to this type of
 device. Flushing this type of cache could certainly be very, very expensive
 and slow.

 Try mount -o nobarrier and see if your performance (write cache still
 enabled on the controller) is back to expected levels.

wow, indeed ...

without special mount options I get the following from my RAID6 with
non-volatile cache:
##

1 readers (30s)
--
A read dir     cnt 78845   min 0.001 ms   max  29.713 ms   mean 0.027 ms   stdev   0.421
B lstat file   cnt 73600   min 0.006 ms   max  21.639 ms   mean 0.038 ms   stdev   0.273
C open file    cnt 57862   min 0.013 ms   max   0.100 ms   mean 0.017 ms   stdev   0.003
D rd 1st byte  cnt 57861   min 0.014 ms   max  70.214 ms   mean 0.209 ms   stdev   0.919
E read rate    185.464 MB/s (data)   63.842 MB/s (readdir + open + 1st byte + data)

3 readers (30s)
--
A read dir     cnt 41222   min 0.001 ms   max 169.195 ms   mean 0.056 ms   stdev   1.113
B lstat file   cnt 38447   min 0.006 ms   max  79.977 ms   mean 0.064 ms   stdev   0.746
C open file    cnt 30122   min 0.013 ms   max   0.042 ms   mean 0.018 ms   stdev   0.003
D rd 1st byte  cnt 30122   min 0.014 ms   max 597.264 ms   mean 0.535 ms   stdev   6.646
E read rate    124.144 MB/s (data)   31.197 MB/s (readdir + open + 1st byte + data)

3 readers, 3 writers (30s)
--
F write open   cnt   107   min 0.063 ms   max   70.593 ms   mean  0.760 ms   stdev   6.784
G wr 1st byte  cnt   107   min 0.006 ms   max    0.014 ms   mean  0.007 ms   stdev   0.002
H write close  cnt   107   min 0.017 ms   max 1784.192 ms   mean 20.830 ms   stdev 176.474
I mkdir        cnt     9   min 0.049 ms   max    9.184 ms   mean  1.079 ms   stdev   2.865
J write rate   0.200 MB/s (data)   0.199 MB/s (open + 1st byte + data)

A read dir     cnt  1215   min 0.001 ms   max 2661.328 ms   mean  4.008 ms   stdev  81.513
B lstat file   cnt  1144   min 0.007 ms   max  377.476 ms   mean  1.827 ms   stdev  18.844
C open file    cnt   928   min 0.014 ms   max    1.596 ms   mean  0.021 ms   stdev   0.056
D rd 1st byte  cnt   928   min 0.015 ms   max 1936.262 ms   mean 25.187 ms   stdev 123.755
E read rate    9.199 MB/s (data)   0.792 MB/s (readdir + open + 1st byte + data)


mounting with -o nobarrier I get ...
##

1 readers (30s)
--
A read dir     cnt 78876   min 0.001 ms   max  19.803 ms   mean 0.013 ms   stdev   0.228
B lstat file   cnt 73624   min 0.006 ms   max  18.032 ms   mean 0.034 ms   stdev   0.210
C open file    cnt 57868   min 0.014 ms   max   0.041 ms   mean 0.017 ms   stdev   0.003
D rd 1st byte  cnt 57869   min 0.019 ms   max 417.725 ms   mean 0.225 ms   stdev   2.459
E read rate    177.779 MB/s (data)   63.375 MB/s (readdir + open + 1st byte + data)

3 readers (30s)
--
A read dir     cnt 38209   min 0.001 ms   max   26.745 ms   mean 0.025 ms   stdev   0.472
B lstat file   cnt 35624   min 0.006 ms   max   26.019 ms   mean 0.048 ms   stdev   0.410
C open file    cnt 27874   min 0.014 ms   max    1.257 ms   mean 0.017 ms   stdev   0.008
D rd 1st byte  cnt 27874   min 0.020 ms   max 3197.520 ms   mean 0.626 ms   stdev  20.279
E read rate    98.242 MB/s (data)   27.763 MB/s (readdir + open + 1st byte + data)

3 readers, 3 writers (30s)
--
F write open   cnt  5957   min 0.061 ms   max  591.787 ms   mean 0.457 ms   stdev   9.956
G wr 1st byte  cnt  5956   min 0.006 ms   max    0.136 ms   mean 0.007 ms   stdev   0.002
H write close  cnt  5957   min 0.017 ms   max 1340.145 ms   mean 0.818 ms   stdev  22.442
I mkdir        cnt   574   min 0.034 ms   max   11.094 ms   mean 0.083 ms   stdev   0.543
J write rate   9.766 MB/s (data)   8.705 MB/s (open + 1st byte + data)

A read dir     cnt 15183   min 0.001 ms   max  439.260 ms   mean 0.130 ms   stdev   4.150
B lstat file   cnt 14199   min 0.006 ms   max  200.212 ms   mean 0.152 ms   stdev   3.420
C open file   cnt  

[PATCH 2/2] btrfs: remove duplicates of filemap_ helpers

2009-09-30 Thread Christoph Hellwig
Use filemap_fdatawrite_range and filemap_fdatawait_range instead of
local copies of the functions.  For filemap_fdatawait_range that
also means replacing the awkward old wait_on_page_writeback_range
calling convention with the regular filemap byte offsets.

Signed-off-by: Christoph Hellwig <h...@lst.de>

Index: linux-2.6/fs/btrfs/disk-io.c
===
--- linux-2.6.orig/fs/btrfs/disk-io.c   2009-09-30 13:55:25.396005824 -0300
+++ linux-2.6/fs/btrfs/disk-io.c    2009-09-30 13:57:49.917005980 -0300
@@ -822,16 +822,14 @@ struct extent_buffer *btrfs_find_create_
 
 int btrfs_write_tree_block(struct extent_buffer *buf)
 {
-   return btrfs_fdatawrite_range(buf->first_page->mapping, buf->start,
-                                 buf->start + buf->len - 1, WB_SYNC_ALL);
+   return filemap_fdatawrite_range(buf->first_page->mapping, buf->start,
+                                   buf->start + buf->len - 1);
 }
 
 int btrfs_wait_tree_block_writeback(struct extent_buffer *buf)
 {
-   return btrfs_wait_on_page_writeback_range(buf->first_page->mapping,
-                                 buf->start >> PAGE_CACHE_SHIFT,
-                                 (buf->start + buf->len - 1) >>
-                                  PAGE_CACHE_SHIFT);
+   return filemap_fdatawait_range(buf->first_page->mapping,
+                                  buf->start, buf->start + buf->len - 1);
 }
 
 struct extent_buffer *read_tree_block(struct btrfs_root *root, u64 bytenr,
Index: linux-2.6/fs/btrfs/ordered-data.c
===
--- linux-2.6.orig/fs/btrfs/ordered-data.c  2009-09-30 13:44:52.424274060 -0300
+++ linux-2.6/fs/btrfs/ordered-data.c   2009-09-30 13:56:56.751254722 -0300
@@ -458,7 +458,7 @@ void btrfs_start_ordered_extent(struct i
 * start IO on any dirty ones so the wait doesn't stall waiting
 * for pdflush to find them
 */
-   btrfs_fdatawrite_range(inode->i_mapping, start, end, WB_SYNC_ALL);
+   filemap_fdatawrite_range(inode->i_mapping, start, end);
    if (wait) {
        wait_event(entry->wait, test_bit(BTRFS_ORDERED_COMPLETE,
                                         &entry->flags));
@@ -488,17 +488,15 @@ again:
/* start IO across the range first to instantiate any delalloc
 * extents
 */
-   btrfs_fdatawrite_range(inode->i_mapping, start, orig_end, WB_SYNC_ALL);
+   filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
 
/* The compression code will leave pages locked but return from
 * writepage without setting the page writeback.  Starting again
 * with WB_SYNC_ALL will end up waiting for the IO to actually start.
 */
-   btrfs_fdatawrite_range(inode->i_mapping, start, orig_end, WB_SYNC_ALL);
+   filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
 
-   btrfs_wait_on_page_writeback_range(inode->i_mapping,
-                                      start >> PAGE_CACHE_SHIFT,
-                                      orig_end >> PAGE_CACHE_SHIFT);
+   filemap_fdatawait_range(inode->i_mapping, start, orig_end);
 
end = orig_end;
found = 0;
@@ -716,89 +714,6 @@ out:
 }
 
 
-/**
- * taken from mm/filemap.c because it isn't exported
- *
- * __filemap_fdatawrite_range - start writeback on mapping dirty pages in range
- * @mapping:   address space structure to write
- * @start: offset in bytes where the range starts
- * @end:   offset in bytes where the range ends (inclusive)
- * @sync_mode: enable synchronous operation
- *
- * Start writeback against all of a mapping's dirty pages that lie
- * within the byte offsets <start>, <end> inclusive.
- *
- * If sync_mode is WB_SYNC_ALL then this is a data integrity operation, as
- * opposed to a regular memory cleansing writeback.  The difference between
- * these two operations is that if a dirty page/buffer is encountered, it must
- * be waited upon, and not just skipped over.
- */
-int btrfs_fdatawrite_range(struct address_space *mapping, loff_t start,
-  loff_t end, int sync_mode)
-{
-   struct writeback_control wbc = {
-   .sync_mode = sync_mode,
-   .nr_to_write = mapping->nrpages * 2,
-   .range_start = start,
-   .range_end = end,
-   };
-   return btrfs_writepages(mapping, &wbc);
-}
-
-/**
- * taken from mm/filemap.c because it isn't exported
- *
- * wait_on_page_writeback_range - wait for writeback to complete
- * @mapping:   target address_space
- * @start: beginning page index
- * @end:   ending page index
- *
- * Wait for writeback to complete against pages indexed by start-end
- * inclusive
- */
-int btrfs_wait_on_page_writeback_range(struct address_space *mapping,
-  pgoff_t start, pgoff_t end)
-{
-   struct pagevec pvec;
-   int nr_pages;
-   int ret = 0;
-   pgoff_t