Re: How git affects kernel.org performance
On Wed, Jan 10, 2007 at 02:20:49PM +1100, Nigel Cunningham wrote:

Hi.

On Wed, 2007-01-10 at 09:57 +0800, Fengguang Wu wrote:

On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:

On Tue, 9 Jan 2007, Fengguang Wu wrote:

The fastest and probably most important thing to add is some readahead smarts to directories --- both to the htree and non-htree cases.

Here's a quick hack to practice the directory readahead idea. Comments are welcome, it's a freshman's work :)

Well, I'd probably have done it differently, but more important is whether this actually makes a difference performance-wise. Have you benchmarked it at all?

Yes, a trivial test shows a marginal improvement on a minimal debian system:

# find / | wc -l
13641
# time find / > /dev/null
real    0m10.000s
user    0m0.210s
sys     0m4.370s
# time find / > /dev/null
real    0m9.890s
user    0m0.160s
sys     0m3.270s

Doing an "echo 3 > /proc/sys/vm/drop_caches" is your friend for testing things like this, to force cold-cache behaviour.

Thanks, I'll work out numbers on large/concurrent dir accesses soon.

I gave it a try, and I'm afraid the results weren't pretty. I did:

time find /usr/src | wc -l

on current git with (3 times) and without (5 times) the patch, and got:

with:    real 54.306s, 54.327s, 53.742s
         user 0.324s, 0.284s, 0.234s
         sys  2.432s, 2.484s, 2.592s

without: real 24.413s, 24.616s, 24.080s
         user 0.208s, 0.316s, 0.312s
         sys  2.496s, 2.440s, 2.540s

Subsequent runs without dropping caches did give a significant improvement in both cases (1.821/0.188/1.632 is one result I wrote down with the patch applied).

Thanks, Nigel.

But I'm very sorry that the calculation in the patch was wrong. Would you give this new patch a run?
It produced pretty numbers here:

#!/bin/zsh
ROOT=/mnt/mnt
TIMEFMT="%E clock %S kernel %U user %w+%c cs %J"

echo 3 > /proc/sys/vm/drop_caches

# 49: enable dir readahead
# 50: disable
echo ${1:-50} > /proc/sys/vm/readahead_ratio

# time find $ROOT/a > /dev/null
time find /etch > /dev/null
# time find $ROOT/a > /dev/null
# time grep -r asdf $ROOT/b > /dev/null
# time cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null

exit 0

# collected results on a SATA disk:

# ./test-parallel-dir-reada.sh 49
4.18s clock 0.08s kernel 0.04s user 418+0 cs find $ROOT/a > /dev/null
4.09s clock 0.10s kernel 0.02s user 410+1 cs find $ROOT/a > /dev/null

# ./test-parallel-dir-reada.sh 50
12.18s clock 0.15s kernel 0.07s user 1520+4 cs find $ROOT/a > /dev/null
11.99s clock 0.13s kernel 0.04s user 1558+6 cs find $ROOT/a > /dev/null

# ./test-parallel-dir-reada.sh 49
4.01s clock 0.06s kernel 0.01s user 1567+2 cs find /etch > /dev/null
4.08s clock 0.07s kernel 0.00s user 1568+0 cs find /etch > /dev/null

# ./test-parallel-dir-reada.sh 50
4.10s clock 0.09s kernel 0.01s user 1578+1 cs find /etch > /dev/null
4.19s clock 0.08s kernel 0.03s user 1578+0 cs find /etch > /dev/null

# ./test-parallel-dir-reada.sh 49
7.73s clock 0.11s kernel 0.06s user 438+2 cs find $ROOT/a > /dev/null
18.92s clock 0.43s kernel 0.02s user 1246+13 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
32.91s clock 4.20s kernel 1.55s user 103564+51 cs grep -r asdf $ROOT/b > /dev/null
8.47s clock 0.10s kernel 0.02s user 442+4 cs find $ROOT/a > /dev/null
19.24s clock 0.53s kernel 0.03s user 1250+23 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
29.93s clock 4.18s kernel 1.61s user 100425+47 cs grep -r asdf $ROOT/b > /dev/null

# ./test-parallel-dir-reada.sh 50
17.87s clock 0.57s kernel 0.02s user 1244+21 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
21.30s clock 0.08s kernel 0.05s user 1517+5 cs find $ROOT/a > /dev/null
49.68s clock 3.94s kernel 1.67s user 101520+57 cs grep -r asdf $ROOT/b > /dev/null
15.66s clock 0.51s kernel 0.00s user 1248+25 cs cp /etch/KNOPPIX_V5.0.1CD-2006-06-01-EN.iso /dev/null
22.15s clock 0.15s kernel 0.04s user 1520+5 cs find $ROOT/a > /dev/null
46.14s clock 4.08s kernel 1.68s user 101517+63 cs grep -r asdf $ROOT/b > /dev/null

Thanks,
Wu

---
Subject: ext3 readdir readahead

Do readahead for ext3_readdir().

Reasons to be aggressive:
- readdir() users are likely to traverse the whole directory, so readahead miss is not a concern.
- most dirs are small, so slow start is not good.
- the htree indexing introduces some randomness, which can be helped by the aggressiveness.

So we do 128K sized readaheads, at twice the speed of reads.

The following actual readahead pages are collected for a dir with 11 entries:
32 31 30 31 28 29 29 28 27 25 29 22 25 30 24 15 19
That means a readahead hit ratio of 454/541 = 84%.

The performance is marginally better for a minimal debian system:

command:  find /
baseline: 4.10s 4.19s
patched:  4.01s 4.08s

And
Re: How git affects kernel.org performance
On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote:

On Mon, Jan 08, 2007 at 08:35:55AM +0530, Suparna Bhattacharya wrote:

Yeah, slowly-growing directories will get splattered all over the disk. Possible short-term fixes would be to just allocate up to (say) eight blocks when we grow a directory by one block. Or teach the directory-growth code to use ext3 reservations. Longer-term, people are talking about things like on-disk reservations. But I expect directories are being forgotten about in all of that.

By on-disk reservations, do you mean persistent file preallocation? (that is, explicit preallocation of blocks to a given file) If so, you are right, we haven't really given any thought to the possibility of directories needing that feature.

The fastest and probably most important thing to add is some readahead smarts to directories --- both to the htree and non-htree cases.

Here's a quick hack to practice the directory readahead idea. Comments are welcome, it's a freshman's work :)

Regards,
Wu

---
 fs/ext3/dir.c   |   22 ++++++++++++++++++++++
 fs/ext3/inode.c |    2 +-
 2 files changed, 23 insertions(+), 1 deletion(-)

--- linux.orig/fs/ext3/dir.c
+++ linux/fs/ext3/dir.c
@@ -94,6 +94,25 @@ int ext3_check_dir_entry (const char * f
 	return error_msg == NULL ?
		1 : 0;
 }
 
+int ext3_get_block(struct inode *inode, sector_t iblock,
+		struct buffer_head *bh_result, int create);
+
+static void ext3_dir_readahead(struct file * filp)
+{
+	struct inode *inode = filp->f_path.dentry->d_inode;
+	struct address_space *mapping = inode->i_sb->s_bdev->bd_inode->i_mapping;
+	unsigned long sector;
+	unsigned long blk;
+	pgoff_t offset;
+
+	for (blk = 0; blk < inode->i_blocks; blk++) {
+		sector = blk << (inode->i_blkbits - 9);
+		sector = generic_block_bmap(inode->i_mapping, sector, ext3_get_block);
+		offset = sector >> (PAGE_CACHE_SHIFT - 9);
+		do_page_cache_readahead(mapping, filp, offset, 1);
+	}
+}
+
 static int ext3_readdir(struct file * filp,
 			void * dirent, filldir_t filldir)
 {
@@ -108,6 +127,9 @@ static int ext3_readdir(struct file * fi
 	sb = inode->i_sb;
 
+	if (!filp->f_pos)
+		ext3_dir_readahead(filp);
+
 #ifdef CONFIG_EXT3_INDEX
 	if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
 				    EXT3_FEATURE_COMPAT_DIR_INDEX))
--- linux.orig/fs/ext3/inode.c
+++ linux/fs/ext3/inode.c
@@ -945,7 +945,7 @@ out:
 
 #define DIO_CREDITS (EXT3_RESERVE_TRANS_BLOCKS + 32)
 
-static int ext3_get_block(struct inode *inode, sector_t iblock,
+int ext3_get_block(struct inode *inode, sector_t iblock,
 		struct buffer_head *bh_result, int create)
 {
 	handle_t *handle = journal_current_handle();

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
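[Editor's sketch: the kernel patch above primes the page cache for a directory's blocks before readdir touches them. The same idea has a rough user-space analog in posix_fadvise() with POSIX_FADV_WILLNEED, which asks the kernel to start readahead on a file range. This is only an illustration of the concept under that assumption, not part of the patch; the helper name is made up.]

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

/*
 * Ask the kernel to start reading the whole file into the page cache
 * ahead of time, so a later sequential scan hits the cache instead of
 * issuing small synchronous reads one block at a time.
 * Returns 0 on success, -1 on error.
 */
int prefetch_file(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* offset 0, len 0 means "from the start to the end of the file" */
	int ret = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
	close(fd);
	return ret == 0 ? 0 : -1;
}
```

On a cold cache this typically replaces many seek-bound reads with a few large asynchronous ones, which is exactly the effect the patch is after for directory blocks.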
Re: How git affects kernel.org performance
On Tue, Jan 09, 2007 at 08:23:32AM -0800, Linus Torvalds wrote:

On Tue, 9 Jan 2007, Fengguang Wu wrote:

The fastest and probably most important thing to add is some readahead smarts to directories --- both to the htree and non-htree cases.

Here's a quick hack to practice the directory readahead idea. Comments are welcome, it's a freshman's work :)

Well, I'd probably have done it differently, but more important is whether this actually makes a difference performance-wise. Have you benchmarked it at all?

Yes, a trivial test shows a marginal improvement on a minimal debian system:

# find / | wc -l
13641
# time find / > /dev/null
real    0m10.000s
user    0m0.210s
sys     0m4.370s
# time find / > /dev/null
real    0m9.890s
user    0m0.160s
sys     0m3.270s

Doing an "echo 3 > /proc/sys/vm/drop_caches" is your friend for testing things like this, to force cold-cache behaviour.

Thanks, I'll work out numbers on large/concurrent dir accesses soon.

Regards,
Wu
Re: How git affects kernel.org performance
Theodore Tso wrote:

The fastest and probably most important thing to add is some readahead smarts to directories --- both to the htree and non-htree cases. If you're using some kind of b-tree structure, such as XFS does for directories, preallocation doesn't help you much. Delayed allocation can save you if your delayed allocator knows how to structure disk blocks so that a btree-traversal is efficient, but I'm guessing the biggest reason why we are losing is because we don't have sufficient readahead. This also has the advantage that it will help without needing to do a backup/restore to improve layout.

Something I just thought of: ATA and SCSI hard disks do their own read-ahead. Seeking all over the place to pick up bits of directory will hurt even more with the disk reading and throwing away data (albeit in its internal elevator and cache).

Jeff
Re: How git affects kernel.org performance
On Mon, Jan 08, 2007 at 02:41:47PM +0100, Johannes Stezenbach wrote:

Would e2fsck -D help? What kind of optimization does it perform?

It will help a little; e2fsck -D compresses the logical view of the directory, but it doesn't optimize the physical layout on disk at all, and of course, it won't help with the lack of readahead logic.

It's possible to improve how e2fsck -D works; at the moment, it's not trying to make the directory contiguous on disk. What it should probably do is pull a list of all of the blocks used by the directory, sort them, and then see if it can improve on the list by allocating some new blocks that would make the directory more contiguous on disk. I suspect any improvements that would be seen by doing this would be second-order effects at most, though.

- Ted
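[Editor's sketch: the "pull a list of blocks, sort them, and see if the layout can be improved" step Ted describes can be illustrated with a toy fragmentation check. The helper below is hypothetical, not e2fsck code: it sorts a directory's block numbers and counts the contiguous runs they form; the fewer the runs, the fewer the seeks a cold-cache readdir needs.]

```c
#include <stdlib.h>

/* qsort comparator for unsigned long block numbers */
static int cmp_blk(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;
	return (x > y) - (x < y);
}

/*
 * Sort a directory's block list in place and count how many contiguous
 * runs ("fragments") it splits into.  A perfectly laid out directory is
 * a single run; every extra run costs roughly one seek on a cold read.
 */
unsigned count_fragments(unsigned long *blocks, size_t n)
{
	if (n == 0)
		return 0;
	qsort(blocks, n, sizeof(*blocks), cmp_blk);
	unsigned runs = 1;
	for (size_t i = 1; i < n; i++)
		if (blocks[i] != blocks[i - 1] + 1)
			runs++;
	return runs;
}
```

An improved e2fsck -D could then try reallocating blocks until this count approaches one, which is the "second-order" gain Ted is estimating.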
Re: How git affects kernel.org performance
Hi!

Would e2fsck -D help? What kind of optimization does it perform?

It will help a little; e2fsck -D compresses the logical view of the directory, but it doesn't optimize the physical layout on disk at all, and of course, it won't help with the lack of readahead logic. It's possible to improve how e2fsck -D works; at the moment, it's not trying to make the directory contiguous on disk. What it should probably do is pull a list of all of the blocks used by the directory, sort them, and then see if it can improve on the list by allocating some new blocks that would make the directory more contiguous on disk. I suspect any improvements that would be seen by doing this would be second-order effects at most, though.

...sounds like a job for e2defrag, not e2fsck...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: How git affects kernel.org performance
On Mon, Jan 08, 2007 at 07:58:19AM -0500, Theodore Tso wrote:

The fastest and probably most important thing to add is some readahead smarts to directories --- both to the htree and non-htree cases. If you're using some kind of b-tree structure, such as XFS does for directories, preallocation doesn't help you much. Delayed allocation can save you if your delayed allocator knows how to structure disk blocks so that a btree-traversal is efficient, but I'm guessing the biggest reason why we are losing is because we don't have sufficient readahead. This also has the advantage that it will help without needing to do a backup/restore to improve layout.

Would e2fsck -D help? What kind of optimization does it perform?

Thanks,
Johannes
Re: How git affects kernel.org performance
On Mon, Jan 08, 2007 at 02:59:52PM +0100, Pavel Machek wrote:

Hi!

Would e2fsck -D help? What kind of optimization does it perform?

It will help a little; e2fsck -D compresses the logical view of the directory, but it doesn't optimize the physical layout on disk at all, and of course, it won't help with the lack of readahead logic. It's possible to improve how e2fsck -D works; at the moment, it's not trying to make the directory contiguous on disk. What it should probably do is pull a list of all of the blocks used by the directory, sort them, and then see if it can improve on the list by allocating some new blocks that would make the directory more contiguous on disk. I suspect any improvements that would be seen by doing this would be second-order effects at most, though.

...sounds like a job for e2defrag, not e2fsck...

I wasn't proposing to move other data blocks around in order to make the directory contiguous, just a quick and dirty attempt to make things better. But yes, in order to really fix layout issues you would have to do a full defrag, and it's probably more important that we try to fix things so that defragmentation runs aren't necessary in the first place.

- Ted
Re: How git affects kernel.org performance
On Mon, Jan 08, 2007 at 05:09:34PM -0800, Paul Jackson wrote:

Jeff wrote:

Something I just thought of: ATA and SCSI hard disks do their own read-ahead.

Probably this is wishful thinking on my part, but I would have hoped that most of the read-ahead they did was for stuff that happened to be on the cylinder they were reading anyway. So long as their read-ahead doesn't cause much extra or delayed disk head motion, what does it matter?

And they usually won't readahead if there is another command to process, though they can be set up to read unrequested data in spite of outstanding commands. When they are reading ahead, they'll only fetch LBAs beyond the last request until a buffer fills or the readahead gets interrupted.

jeremy
Re: How git affects kernel.org performance
On Sun, 7 Jan 2007 09:55:26 +0100 Willy Tarreau [EMAIL PROTECTED] wrote:

On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:

On Sat, 6 Jan 2007, H. Peter Anvin wrote:

During extremely high load, it appears that what slows kernel.org down more than anything else is the time that each individual getdents() call takes. When I've looked at this I've observed times from 200 ms to almost 2 seconds! Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly packed tree, you can do the math yourself.

getdents() is totally serialized by the inode semaphore. It's one of the most expensive system calls in Linux, partly because of that, and partly because it has to call all the way down into the filesystem in a way that almost no other common system call has to (99% of all filesystem calls can be handled basically at the VFS layer with generic caches - but not getdents()).

So if there are concurrent readdirs on the same directory, they get serialized. If there is any file creation/deletion activity in the directory, it serializes getdents().

To make matters worse, I don't think it has any read-ahead at all when you use hashed directory entries. So in the cold-cache case, you'll read every single block totally individually, and serialized. One block at a time. (I think the non-hashed case is likely also suspect, but that's a separate issue.)

In other words, I'm not at all surprised it hits on filldir time. Especially on ext3.

At work, we had the same problem on a file server with ext3. We use rsync to make backups to a local IDE disk, and we noticed that getdents() took about the same time as Peter reports (0.2 to 2 seconds), especially in maildir directories. We tried many things to fix it with no result, including enabling dir indexes. Finally, we made a full backup and switched over to XFS, and the problem totally disappeared.

So it seems that the filesystem matters a lot here when there are lots of entries in a directory, and that ext3 is not suitable for usages with thousands of entries in directories with millions of files on disk. I'm not certain it would be that easy to try other filesystems on kernel.org though :-/

Yeah, slowly-growing directories will get splattered all over the disk. Possible short-term fixes would be to just allocate up to (say) eight blocks when we grow a directory by one block. Or teach the directory-growth code to use ext3 reservations. Longer-term, people are talking about things like on-disk reservations. But I expect directories are being forgotten about in all of that.
Re: How git affects kernel.org performance
On Sun, Jan 07, 2007 at 01:15:42AM -0800, Andrew Morton wrote:

On Sun, 7 Jan 2007 09:55:26 +0100 Willy Tarreau [EMAIL PROTECTED] wrote:

On Sat, Jan 06, 2007 at 09:39:42PM -0800, Linus Torvalds wrote:

On Sat, 6 Jan 2007, H. Peter Anvin wrote:

During extremely high load, it appears that what slows kernel.org down more than anything else is the time that each individual getdents() call takes. When I've looked at this I've observed times from 200 ms to almost 2 seconds! Since an unpacked *OR* unpruned git tree adds 256 directories to a cleanly packed tree, you can do the math yourself.

getdents() is totally serialized by the inode semaphore. It's one of the most expensive system calls in Linux, partly because of that, and partly because it has to call all the way down into the filesystem in a way that almost no other common system call has to (99% of all filesystem calls can be handled basically at the VFS layer with generic caches - but not getdents()).

So if there are concurrent readdirs on the same directory, they get serialized. If there is any file creation/deletion activity in the directory, it serializes getdents().

To make matters worse, I don't think it has any read-ahead at all when you use hashed directory entries. So in the cold-cache case, you'll read every single block totally individually, and serialized. One block at a time. (I think the non-hashed case is likely also suspect, but that's a separate issue.)

In other words, I'm not at all surprised it hits on filldir time. Especially on ext3.

At work, we had the same problem on a file server with ext3. We use rsync to make backups to a local IDE disk, and we noticed that getdents() took about the same time as Peter reports (0.2 to 2 seconds), especially in maildir directories. We tried many things to fix it with no result, including enabling dir indexes. Finally, we made a full backup and switched over to XFS, and the problem totally disappeared.

So it seems that the filesystem matters a lot here when there are lots of entries in a directory, and that ext3 is not suitable for usages with thousands of entries in directories with millions of files on disk. I'm not certain it would be that easy to try other filesystems on kernel.org though :-/

Yeah, slowly-growing directories will get splattered all over the disk. Possible short-term fixes would be to just allocate up to (say) eight blocks when we grow a directory by one block. Or teach the directory-growth code to use ext3 reservations. Longer-term, people are talking about things like on-disk reservations. But I expect directories are being forgotten about in all of that.

By on-disk reservations, do you mean persistent file preallocation? (that is, explicit preallocation of blocks to a given file) If so, you are right, we haven't really given any thought to the possibility of directories needing that feature.

Regards
Suparna

--
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India