Re: How git affects kernel.org performance

2007-01-10 Thread Fengguang Wu
On Wed, Jan 10, 2007 at 02:20:49PM +1100, Nigel Cunningham wrote:
 Hi.
 
 On Wed, 2007-01-10 at 09:57 +0800, Fengguang Wu wrote:
  On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:
  
  
   On Tue, 9 Jan 2007, Fengguang Wu wrote:

 The fastest and probably most important thing to add is some readahead
 smarts to directories --- both to the htree and non-htree cases.  If
   
Here's a quick hack to try out the directory readahead idea.
Comments are welcome, it's a freshman's work :)
  
   Well, I'd probably have done it differently, but more important is whether
   this actually makes a difference performance-wise. Have you benchmarked it
   at all?
  
  Yes, a trivial test shows a marginal improvement, on a minimal debian 
  system:
  
  # find / | wc -l
  13641
  
  # time find / > /dev/null
  
  real    0m10.000s
  user    0m0.210s
  sys     0m4.370s
  
  # time find / > /dev/null
  
  real    0m9.890s
  user    0m0.160s
  sys     0m3.270s
  
   Doing an
  
 echo 3 > /proc/sys/vm/drop_caches
  
   is your friend for testing things like this, to force cold-cache
   behaviour..
  
  Thanks, I'll work out numbers on large/concurrent dir accesses soon.
 
 I gave it a try, and I'm afraid the results weren't pretty.
 
 I did:
 
 time find /usr/src | wc -l
 
 on current git with (3 times) and without (5 times) the patch, and got
 
 with:
 real   54.306, 54.327, 53.742s
 usr    0.324, 0.284, 0.234s
 sys    2.432, 2.484, 2.592s
 
 without:
 real   24.413, 24.616, 24.080s
 usr    0.208, 0.316, 0.312s
 sys    2.496, 2.440, 2.540s
 
 Subsequent runs without dropping caches gave a significant
 improvement in both cases (1.821/0.188/1.632 is one result I recorded
 with the patch applied).

Thanks, Nigel.
I'm sorry to say the calculation in the patch was wrong.

Would you give this new patch a run?

It produced pretty numbers here:

#!/bin/zsh

ROOT=/mnt/mnt
TIMEFMT="%E clock  %S kernel  %U user  %w+%c cs  %J"

echo 3 > /proc/sys/vm/drop_caches

# 49: enable dir readahead
# 50: disable
echo ${1:-50} > /proc/sys/vm/readahead_ratio

# time find $ROOT/a > /dev/null

time find /etch > /dev/null

# time find $ROOT/a > /dev/null
# time grep -r asdf $ROOT/b > /dev/null
# time cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null

exit 0

# collected results on a SATA disk:
# ./test-parallel-dir-reada.sh 49
4.18s clock  0.08s kernel  0.04s user  418+0 cs  find $ROOT/a > /dev/null
4.09s clock  0.10s kernel  0.02s user  410+1 cs  find $ROOT/a > /dev/null

# ./test-parallel-dir-reada.sh 50
12.18s clock  0.15s kernel  0.07s user  1520+4 cs  find $ROOT/a > /dev/null
11.99s clock  0.13s kernel  0.04s user  1558+6 cs  find $ROOT/a > /dev/null


# ./test-parallel-dir-reada.sh 49
4.01s clock  0.06s kernel  0.01s user  1567+2 cs  find /etch > /dev/null
4.08s clock  0.07s kernel  0.00s user  1568+0 cs  find /etch > /dev/null

# ./test-parallel-dir-reada.sh 50
4.10s clock  0.09s kernel  0.01s user  1578+1 cs  find /etch > /dev/null
4.19s clock  0.08s kernel  0.03s user  1578+0 cs  find /etch > /dev/null


# ./test-parallel-dir-reada.sh 49
7.73s clock  0.11s kernel  0.06s user  438+2 cs  find $ROOT/a > /dev/null
18.92s clock  0.43s kernel  0.02s user  1246+13 cs  cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
32.91s clock  4.20s kernel  1.55s user  103564+51 cs  grep -r asdf $ROOT/b > /dev/null

8.47s clock  0.10s kernel  0.02s user  442+4 cs  find $ROOT/a > /dev/null
19.24s clock  0.53s kernel  0.03s user  1250+23 cs  cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
29.93s clock  4.18s kernel  1.61s user  100425+47 cs  grep -r asdf $ROOT/b > /dev/null

# ./test-parallel-dir-reada.sh 50
17.87s clock  0.57s kernel  0.02s user  1244+21 cs  cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
21.30s clock  0.08s kernel  0.05s user  1517+5 cs  find $ROOT/a > /dev/null
49.68s clock  3.94s kernel  1.67s user  101520+57 cs  grep -r asdf $ROOT/b > /dev/null

15.66s clock  0.51s kernel  0.00s user  1248+25 cs  cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
22.15s clock  0.15s kernel  0.04s user  1520+5 cs  find $ROOT/a > /dev/null
46.14s clock  4.08s kernel  1.68s user  101517+63 cs  grep -r asdf $ROOT/b > /dev/null

Thanks,
Wu
---

Subject: ext3 readdir readahead

Do readahead for ext3_readdir().

Reasons to be aggressive:
- readdir() users are likely to traverse the whole directory,
  so readahead miss is not a concern.
- most dirs are small, so slow start is not good
- the htree indexing introduces some randomness,
  which can be helped by the aggressiveness.

So we do 128K-sized readaheads, at twice the speed of reads.

The following actual readahead pages are collected for a dir with
11 entries:
32 31 30 31 28 29 29 28 27 25 29 22 25 30 24 15 19
That means a readahead hit ratio of
454/541 = 84%

The performance is marginally better for a minimal debian system:
command:    find /
baseline:   4.10s   4.19s
patched:    4.01s   4.08s

And 

Re: How git affects kernel.org performance

2007-01-09 Thread Fengguang Wu
On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote:
 On Mon, Jan 08, 2007 at 08:35:55AM +0530, Suparna Bhattacharya wrote:
   Yeah, slowly-growing directories will get splattered all over the disk.
   
   Possible short-term fixes would be to just allocate up to (say) eight
   blocks when we grow a directory by one block.  Or teach the
   directory-growth code to use ext3 reservations.
   
   Longer-term, people are talking about things like on-disk reservations.
   But I expect directories are being forgotten about in all of that.
  
  By on-disk reservations, do you mean persistent file preallocation ? (that
  is explicit preallocation of blocks to a given file) If so, you are
  right, we haven't really given any thought to the possibility of directories
  needing that feature.
 
 The fastest and probably most important thing to add is some readahead
 smarts to directories --- both to the htree and non-htree cases.  If

Here's a quick hack to try out the directory readahead idea.
Comments are welcome, it's a freshman's work :)

Regards,
Wu
---
 fs/ext3/dir.c   |   22 ++
 fs/ext3/inode.c |2 +-
 2 files changed, 23 insertions(+), 1 deletion(-)

--- linux.orig/fs/ext3/dir.c
+++ linux/fs/ext3/dir.c
@@ -94,6 +94,25 @@ int ext3_check_dir_entry (const char * f
 	return error_msg == NULL ? 1 : 0;
 }
 
+int ext3_get_block(struct inode *inode, sector_t iblock,
+		   struct buffer_head *bh_result, int create);
+
+static void ext3_dir_readahead(struct file * filp)
+{
+	struct inode *inode = filp->f_path.dentry->d_inode;
+	struct address_space *mapping = inode->i_sb->s_bdev->bd_inode->i_mapping;
+	unsigned long sector;
+	unsigned long blk;
+	pgoff_t offset;
+
+	for (blk = 0; blk < inode->i_blocks; blk++) {
+		sector = blk << (inode->i_blkbits - 9);
+		sector = generic_block_bmap(inode->i_mapping, sector, ext3_get_block);
+		offset = sector >> (PAGE_CACHE_SHIFT - 9);
+		do_page_cache_readahead(mapping, filp, offset, 1);
+	}
+}
+
 static int ext3_readdir(struct file * filp,
 			void * dirent, filldir_t filldir)
 {
@@ -108,6 +127,9 @@ static int ext3_readdir(struct file * fi
 
 	sb = inode->i_sb;
 
+	if (!filp->f_pos)
+		ext3_dir_readahead(filp);
+
 #ifdef CONFIG_EXT3_INDEX
 	if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
 		    EXT3_FEATURE_COMPAT_DIR_INDEX) &&
--- linux.orig/fs/ext3/inode.c
+++ linux/fs/ext3/inode.c
@@ -945,7 +945,7 @@ out:
 
 #define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)
 
-static int ext3_get_block(struct inode *inode, sector_t iblock,
+int ext3_get_block(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create)
 {
 	handle_t *handle = journal_current_handle();
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How git affects kernel.org performance

2007-01-09 Thread Fengguang Wu
On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:


 On Tue, 9 Jan 2007, Fengguang Wu wrote:
  
   The fastest and probably most important thing to add is some readahead
   smarts to directories --- both to the htree and non-htree cases.  If
 
  Here's a quick hack to try out the directory readahead idea.
  Comments are welcome, it's a freshman's work :)

 Well, I'd probably have done it differently, but more important is whether
 this actually makes a difference performance-wise. Have you benchmarked it
 at all?

Yes, a trivial test shows a marginal improvement on a minimal debian system:

# find / | wc -l
13641

# time find / > /dev/null

real    0m10.000s
user    0m0.210s
sys     0m4.370s

# time find / > /dev/null

real    0m9.890s
user    0m0.160s
sys     0m3.270s

 Doing an

   echo 3 > /proc/sys/vm/drop_caches

 is your friend for testing things like this, to force cold-cache
 behaviour..

Thanks, I'll work out numbers on large/concurrent dir accesses soon.

Regards,
Wu


Re: How git affects kernel.org performance

2007-01-08 Thread Jeff Garzik

Theodore Tso wrote:

The fastest and probably most important thing to add is some readahead
smarts to directories --- both to the htree and non-htree cases.  If
you're using some kind of b-tree structure, such as XFS does for
directories, preallocation doesn't help you much.  Delayed allocation
can save you if your delayed allocator knows how to structure disk
blocks so that a btree-traversal is efficient, but I'm guessing the
biggest reason why we are losing is because we don't have sufficient
readahead.  This also has the advantage that it will help without
needing to do a backup/restore to improve layout.



Something I just thought of: ATA and SCSI hard disks do their own 
read-ahead.  Seeking all over the place to pick up bits of a directory 
will hurt even more, with the disk reading and then throwing away data 
(albeit in its internal elevator and cache).


Jeff




Re: How git affects kernel.org performance

2007-01-08 Thread Theodore Tso
On Mon, Jan 08, 2007 at 02:41:47PM +0100, Johannes Stezenbach wrote:
 
 Would e2fsck -D help? What kind of optimization
 does it perform?

It will help a little; e2fsck -D compresses the logical view of the
directory, but it doesn't optimize the physical layout on disk at all,
and of course, it won't help with the lack of readahead logic.  It's
possible to improve how e2fsck -D works; at the moment, it's not
trying to make the directory be contiguous on disk.  What it should
probably do is to pull a list of all of the blocks used by the
directory, sort them, and then try to see if it can improve on the
list by allocating some new blocks that would make the directory more
contiguous on disk.  I suspect any improvements that would be seen by
doing this would be second order effects at most, though.

- Ted


Re: How git affects kernel.org performance

2007-01-08 Thread Pavel Machek
Hi!

  Would e2fsck -D help? What kind of optimization
  does it perform?
 
 It will help a little; e2fsck -D compresses the logical view of the
 directory, but it doesn't optimize the physical layout on disk at all,
 and of course, it won't help with the lack of readahead logic.  It's
 possible to improve how e2fsck -D works; at the moment, it's not
 trying to make the directory be contiguous on disk.  What it should
 probably do is to pull a list of all of the blocks used by the
 directory, sort them, and then try to see if it can improve on the
 list by allocating some new blocks that would make the directory more
 contiguous on disk.  I suspect any improvements that would be seen by
 doing this would be second order effects at most, though.

...sounds like a job for e2defrag, not e2fsck...
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: How git affects kernel.org performance

2007-01-08 Thread Johannes Stezenbach
On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote:
 
 The fastest and probably most important thing to add is some readahead
 smarts to directories --- both to the htree and non-htree cases.  If
 you're using some kind of b-tree structure, such as XFS does for
 directories, preallocation doesn't help you much.  Delayed allocation
 can save you if your delayed allocator knows how to structure disk
 blocks so that a btree-traversal is efficient, but I'm guessing the
 biggest reason why we are losing is because we don't have sufficient
 readahead.  This also has the advantage that it will help without
 needing to do a backup/restore to improve layout.

Would e2fsck -D help? What kind of optimization
does it perform?


Thanks,
Johannes


Re: How git affects kernel.org performance

2007-01-08 Thread Theodore Tso
On Mon, Jan 08, 2007 at 02:59:52PM +0100, Pavel Machek wrote:
 Hi!
 
   Would e2fsck -D help? What kind of optimization
   does it perform?
  
  It will help a little; e2fsck -D compresses the logical view of the
  directory, but it doesn't optimize the physical layout on disk at all,
  and of course, it won't help with the lack of readahead logic.  It's
  possible to improve how e2fsck -D works; at the moment, it's not
  trying to make the directory be contiguous on disk.  What it should
  probably do is to pull a list of all of the blocks used by the
  directory, sort them, and then try to see if it can improve on the
  list by allocating some new blocks that would make the directory more
  contiguous on disk.  I suspect any improvements that would be seen by
  doing this would be second order effects at most, though.
 
 ...sounds like a job for e2defrag, not e2fsck...

I wasn't proposing to move other data blocks around in order to make
the directory contiguous, but just a quick and dirty try to make
things better.  But yes, in order to really fix layout issues you
would have to do a full defrag, and it's probably more important that
we try to fix things so that defragmentation runs aren't necessary in
the first place.

- Ted



Re: How git affects kernel.org performance

2007-01-08 Thread Jeremy Higdon
On Mon, Jan 08, 2007 at 05:09:34PM -0800, Paul Jackson wrote:
 Jeff wrote:
  Something I just thought of:  ATA and SCSI hard disks do their own
  read-ahead.
 
 Probably this is wishful thinking on my part, but I would have hoped
 that most of the read-ahead they did was for stuff that happened to be
 on the cylinder they were reading anyway.  So long as their read-ahead
 doesn't cause much extra or delayed disk head motion, what does it
 matter?


And they usually won't readahead if there is another command to
process, though they can be set up to read unrequested data in
spite of outstanding commands.

When they are reading ahead, they'll only fetch LBAs beyond the last
request until a buffer fills or the readahead gets interrupted.

jeremy


Re: How git affects kernel.org performance

2007-01-07 Thread Andrew Morton
On Sun, 7 Jan 2007 09:55:26 +0100
Willy Tarreau [EMAIL PROTECTED] wrote:

 On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:
  
  
  On Sat, 6 Jan 2007, H. Peter Anvin wrote:
   
   During extremely high load, it appears that what slows kernel.org down
   more than anything else is the time that each individual getdents()
   call takes.  When I've looked at this I've observed times from 200 ms
   to almost 2 seconds!  Since an unpacked *OR* unpruned git tree adds
   256 directories to a cleanly packed tree, you can do the math yourself.
  
  getdents() is totally serialized by the inode semaphore. It's one of the 
  most expensive system calls in Linux, partly because of that, and partly 
  because it has to call all the way down into the filesystem in a way that 
  almost no other common system call has to (99% of all filesystem calls can 
  be handled basically at the VFS layer with generic caches - but not 
  getdents()).
  
  So if there are concurrent readdirs on the same directory, they get 
  serialized. If there is any file creation/deletion activity in the 
  directory, it serializes getdents(). 
  
  To make matters worse, I don't think it has any read-ahead at all when you 
  use hashed directory entries. So if you have cold-cache case, you'll read 
  every single block totally individually, and serialized. One block at a 
  time (I think the non-hashed case is likely also suspect, but that's a 
  separate issue)
  
  In other words, I'm not at all surprised it hits on filldir time. 
  Especially on ext3.
 
 At work, we had the same problem on a file server with ext3. We use rsync
 to make backups to a local IDE disk, and we noticed that getdents() took
 about the same time as Peter reports (0.2 to 2 seconds), especially in
 maildir directories. We tried many things to fix it with no result,
 including enabling dirindexes. Finally, we made a full backup, and switched
 over to XFS and the problem totally disappeared. So it seems that the
 filesystem matters a lot here when there are lots of entries in a
 directory, and that ext3 is not suitable for usages with thousands
 of entries in directories with millions of files on disk. I'm not
 certain it would be that easy to try other filesystems on kernel.org
 though :-/
 

Yeah, slowly-growing directories will get splattered all over the disk.

Possible short-term fixes would be to just allocate up to (say) eight
blocks when we grow a directory by one block.  Or teach the
directory-growth code to use ext3 reservations.

Longer-term, people are talking about things like on-disk reservations. 
But I expect directories are being forgotten about in all of that.



Re: How git affects kernel.org performance

2007-01-07 Thread Suparna Bhattacharya
On Sun, Jan 07, 2007 at 01:15:42AM -0800, Andrew Morton wrote:
 On Sun, 7 Jan 2007 09:55:26 +0100
 Willy Tarreau [EMAIL PROTECTED] wrote:
 
  On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:
  
  
   On Sat, 6 Jan 2007, H. Peter Anvin wrote:
   
    During extremely high load, it appears that what slows kernel.org
    down more than anything else is the time that each individual
    getdents() call takes.  When I've looked at this I've observed times
    from 200 ms to almost 2 seconds!  Since an unpacked *OR* unpruned git
    tree adds 256 directories to a cleanly packed tree, you can do the
    math yourself.
  
   getdents() is totally serialized by the inode semaphore. It's one of the
   most expensive system calls in Linux, partly because of that, and partly
   because it has to call all the way down into the filesystem in a way that
   almost no other common system call has to (99% of all filesystem calls can
   be handled basically at the VFS layer with generic caches - but not
   getdents()).
  
   So if there are concurrent readdirs on the same directory, they get
   serialized. If there is any file creation/deletion activity in the
   directory, it serializes getdents().
  
   To make matters worse, I don't think it has any read-ahead at all when you
   use hashed directory entries. So if you have cold-cache case, you'll read
   every single block totally individually, and serialized. One block at a
   time (I think the non-hashed case is likely also suspect, but that's a
   separate issue)
  
   In other words, I'm not at all surprised it hits on filldir time.
   Especially on ext3.
 
  At work, we had the same problem on a file server with ext3. We use rsync
  to make backups to a local IDE disk, and we noticed that getdents() took
  about the same time as Peter reports (0.2 to 2 seconds), especially in
  maildir directories. We tried many things to fix it with no result,
  including enabling dirindexes. Finally, we made a full backup, and switched
  over to XFS and the problem totally disappeared. So it seems that the
  filesystem matters a lot here when there are lots of entries in a
  directory, and that ext3 is not suitable for usages with thousands
  of entries in directories with millions of files on disk. I'm not
  certain it would be that easy to try other filesystems on kernel.org
  though :-/
 
 
 Yeah, slowly-growing directories will get splattered all over the disk.
 
 Possible short-term fixes would be to just allocate up to (say) eight
 blocks when we grow a directory by one block.  Or teach the
 directory-growth code to use ext3 reservations.
 
 Longer-term, people are talking about things like on-disk reservations.
 But I expect directories are being forgotten about in all of that.

By on-disk reservations, do you mean persistent file preallocation ? (that
is explicit preallocation of blocks to a given file) If so, you are
right, we haven't really given any thought to the possibility of directories
needing that feature.

Regards
Suparna

 

-- 
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India
