Re: ReiserFS v3 choking when free space falls below 10% - FIXED
Hello,

I hope I'm doing this right here. The question is: has the patch which helped Mike Benoit been integrated into the kernel source, and if so, since when?

Greetings,
bernd_b
Re: ReiserFS v3 choking when free space falls below 10% - FIXED
Interesting that you sent this today. Just last night I really started to notice a slowdown on my MythTV box. It had been going on for a while, but last night I finally got fed up enough to look into it, and what I found was this:

 0  1    324   4904  10184  67328    0    0  1408  1060 1033  1393 17  5  5 73
 0  1    324   5204  10248  66924    0    0  1284  3372  872  1324 16  5 23 57
 0  1    324   5288  10264  66748    0    0  1280  1696  856  1361 13  2 23 62
 1  1    324   5076  10228  67016    0    0  1032  2264  872  1399 16  2 46 37
 0  3    324   5160  10212  67136    0    0  1428  2064 1007  1398 14  4 21 61
 0  1    324   4896  10272  67168    0    0  1124  1712  867  1358 15  5 21 59
 0  1    324   5024  10256  67160    0    0  1412  2328 1013  1379 16  4 21 59
 0  1    324   5100  10252  67012    0    0  1024  1704  848  1378 14  4 40 43
 0  1    324   5584  10196  66552    0    0  1284  1672  856  1360 13  4 28 55
 0  1    324   5608  10200  66816    0    0  1168  2344  940  1489 17  4 27 52
 0  0    324   5880  10280  66400    0    0  1288  3192 1092  1447 20  4 31 45
 1  0    324   5824  10160  66564    0    0  1152  1996  997  1366 14  3 27 56
 1  0    324   5716  10144  66680    0    0  1280  1616  855  1364 16  3 12 70
 0  0    324   6220  10084  66140    0    0  1152  1960  991  1351 14  4 19 63
 0  0    324   6120  10076  66356    0    0  1416  2184 1122  1556 19  4 48 29

Why is IO wait so high when it's only reading and writing a combined total of less than 4mb/sec?

Before the patch there was a cut-off point where the box just died, but now it seems like the box is just always slow. Could the patch be causing even more fragmentation than before, so that while the corner case of virtually bringing the box to its knees is fixed, it is just always slow now?

BTW: The vmstat output was with 25gb free, I've seen it happen with as much as 40gb free too. :(

I really hope Reiser4 gets into the kernel soon, hopefully allocate on flush greatly reduces the fragmentation caused by Myth.

On Fri, 2006-08-18 at 16:41 +0200, Bernd Butscheidt wrote:
> Hello,
>
> I hope I'm doing this right here. The question is: has the patch which helped Mike Benoit been integrated into the kernel source, and if so, since when?
>
> Greetings,
> bernd_b

--
Mike Benoit [EMAIL PROTECTED]
Re: ReiserFS v3 choking when free space falls below 10% - FIXED
Sorry, here is the vmstat output with column headers.

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0    324   5976  10196  66356    0    0  1722  1831  759  1496 89  4  1  7
 0  1    324   4904  10184  67328    0    0  1408  1060 1033  1393 17  5  5 73
 0  1    324   5204  10248  66924    0    0  1284  3372  872  1324 16  5 23 57
 0  1    324   5288  10264  66748    0    0  1280  1696  856  1361 13  2 23 62
 1  1    324   5076  10228  67016    0    0  1032  2264  872  1399 16  2 46 37
 0  3    324   5160  10212  67136    0    0  1428  2064 1007  1398 14  4 21 61
 0  1    324   4896  10272  67168    0    0  1124  1712  867  1358 15  5 21 59
 0  1    324   5024  10256  67160    0    0  1412  2328 1013  1379 16  4 21 59
 0  1    324   5100  10252  67012    0    0  1024  1704  848  1378 14  4 40 43
 0  1    324   5584  10196  66552    0    0  1284  1672  856  1360 13  4 28 55
 0  1    324   5608  10200  66816    0    0  1168  2344  940  1489 17  4 27 52
 0  0    324   5880  10280  66400    0    0  1288  3192 1092  1447 20  4 31 45
 1  0    324   5824  10160  66564    0    0  1152  1996  997  1366 14  3 27 56
 1  0    324   5716  10144  66680    0    0  1280  1616  855  1364 16  3 12 70
 0  0    324   6220  10084  66140    0    0  1152  1960  991  1351 14  4 19 63
 0  0    324   6120  10076  66356    0    0  1416  2184 1122  1556 19  4 48 29
 0  1    324   5388  10172  66856    0    0  1404  2260 1153  1418 14  6 21 59
 0  1    324   6288  10180  66008    0    0  1412  2040 1012  1377 14  3  0 83

On Fri, 2006-08-18 at 16:41 +0200, Bernd Butscheidt wrote:
> Hello, I'm hope I'm doing this right here, The question is: Is or since when is the the patch which helped Mike Benoit integrated to the kernel source?
>
> Greetings
> bernd_b

--
Mike Benoit [EMAIL PROTECTED]
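As a rough cross-check of the "less than 4mb/sec" remark, the Python sketch below averages the I/O columns of the first four samples in the vmstat output above. It is only an illustration: the rows are re-spaced on the assumption that the archive dropped runs of spaces, vmstat's bi/bo columns are assumed to count 1 KiB blocks per second, and the names `parse` and `summarize` are made up for this example.

```python
# Column names as printed in the vmstat header line.
HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()

# First four samples from the message above, with column spacing restored
# (assumption: the archive removed runs of spaces between fields).
ROWS = """\
1 0 324 5976 10196 66356 0 0 1722 1831 759 1496 89 4 1 7
0 1 324 4904 10184 67328 0 0 1408 1060 1033 1393 17 5 5 73
0 1 324 5204 10248 66924 0 0 1284 3372 872 1324 16 5 23 57
0 1 324 5288 10264 66748 0 0 1280 1696 856 1361 13 2 23 62
"""


def parse(rows):
    """Turn whitespace-separated vmstat rows into a list of dicts."""
    return [
        dict(zip(HEADER, map(int, line.split())))
        for line in rows.splitlines()
        if line.strip()
    ]


def summarize(samples):
    """Average combined throughput and I/O wait over the samples."""
    n = len(samples)
    # Assumption: bi/bo are blocks per second with 1 KiB blocks.
    kib_per_s = sum(s["bi"] + s["bo"] for s in samples) / n
    return {
        "mib_per_s": kib_per_s / 1024,
        "avg_wa": sum(s["wa"] for s in samples) / n,
    }
```

Calling `summarize(parse(ROWS))` gives the combined read/write rate in MiB/s next to the average `wa` percentage, which makes the mismatch Mike describes (low throughput, high I/O wait) easy to see for any subset of rows.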
Re: ReiserFS v3 choking when free space falls below 10% - FIXED
David Masover wrote:
> As a future MythTV user a bit late to this discussion, I'm curious -- was this Reiser3 or 4? Are there any known MythTV issues with v4?
>
> I say this because the box with my capture card is running on a Reiser4 root right now...

I think you get to be the one to tell us.
Re: ReiserFS v3 choking when free space falls below 10% - FIXED
Mike Benoit wrote:
> Thanks for all your hard work, I'm sure many other MythTV users will appreciate it.

As a future MythTV user a bit late to this discussion, I'm curious -- was this Reiser3 or 4? Are there any known MythTV issues with v4?

I say this because the box with my capture card is running on a Reiser4 root right now...
Re: ReiserFS v3 choking when free space falls below 10% - FIXED
On Tue, 2006-07-25 at 19:10 -0500, David Masover wrote:
> Mike Benoit wrote:
>> Thanks for all your hard work, I'm sure many other MythTV users will appreciate it.
>
> As a future MythTV user a bit late to this discussion, I'm curious -- was this Reiser3 or 4? Are there any known MythTV issues with v4?
>
> I say this because the box with my capture card is running on a Reiser4 root right now...

It was Reiser3. I personally don't know of any issues with Reiser4 and MythTV, but if Reiser4 has the pauses or hangs during flush that I have heard so much about, I could see that posing a problem for MythTV.

--
Mike Benoit [EMAIL PROTECTED]
Re: ReiserFS v3 choking when free space falls below 10% - FIXED
I applied the attached patch that Jeff supplied me and so far it is working flawlessly. I currently have less than 4% free space on my drive and the CPU usage is less than 3% with two recordings going. I'll let it run until about 2% free space just to test further.

It also _appears_ that overall CPU usage is down slightly, based on the vmstat output from when we were trying to diagnose the problem before compared to now. The SYS CPU time was hovering between 3-10% before, and now it seems to be between 0-2%. I haven't done any actual performance tests though.

Jeff, what drawbacks does this patch have?

Thanks for all your hard work, I'm sure many other MythTV users will appreciate it.

On Thu, 2006-06-29 at 10:41 -0700, Mike Benoit wrote:
> My MythTV box recently started showing odd behavior during recordings: at certain times the load of the box would spike to 10+ and recordings would start losing frames and become unwatchable. TOP would show mythbackend using 90+% SYS CPU, whereas under normal circumstances it uses about 5% USR.
>
> So I finally got around to profiling mythbackend when the load starts to spike. To my surprise it appears that once I have less than 10% (30GB) free on the drive, reiserfs can't keep up; even just writing at 1mb/sec is too much for it.
>
> Is there something that can be done to fix this? 30gb seems like a lot of wasted space.
>
> #opreport
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
>           TIMER:0|
>   samples|      %|
> ------------------
>     77863 78.7856 reiserfs
>     18183 18.3984 vmlinux
>       695  0.7032 mysqld
>       452  0.4574 libc-2.4.so
>       360  0.3643 libmythtv-0.19.so.0.19.0
>       324  0.3278 ivtv
>       323  0.3268 nvidia
>       242  0.2449 libqt-mt.so.3.3.6
>       110  0.1113 libpthread-2.4.so
>        53  0.0536 libstdc++.so.6.0.8
>        35  0.0354 ld-2.4.so
>        23  0.0233 libperl.so
>        22  0.0223 libz.so.1.2.3
> snip
>
> #opreport -l /usr/src/linux/vmlinux
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        symbol name
> 9607     52.8351  default_idle
> 7694     42.3142  find_next_zero_bit
> 183       1.0064  __copy_from_user_ll
> 57        0.3135  handle_IRQ_event
> 37        0.2035  __copy_to_user_ll
> 34        0.1870  ide_outb
> 30        0.1650  ide_end_request
> 22        0.1210  ioread8
> 22        0.1210  schedule
> 21        0.1155  get_page_from_freelist
> 17        0.0935  mmx_clear_page
> snip
>
> System Details:
> ---------------
> Kernel v2.6.16.21 (custom compiled). This issue also happened with 2.6.14 though.
>
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/hda1             280G  269G   12G  97% /
>
> [EMAIL PROTECTED] cat /proc/mounts
> rootfs / rootfs rw 0 0
> /dev /dev tmpfs rw 0 0
> /dev/root / reiserfs rw,noatime,nodiratime 0 0
>
> [EMAIL PROTECTED] cat /proc/cpuinfo
> processor       : 0
> vendor_id       : AuthenticAMD
> cpu family      : 6
> model           : 6
> model name      : AMD Athlon(tm) XP 2100+
> stepping        : 2
> cpu MHz         : 1759.680
> cache size      : 256 KB
>
> [EMAIL PROTECTED] free
>              total       used       free     shared    buffers     cached
> Mem:        515992     496256      19736          0      36256     271728
> -/+ buffers/cache:     188272     327720
> Swap:       262136        408     261728
>
> [EMAIL PROTECTED] ~]# hdparm -i /dev/hda
>
> /dev/hda:
>
>  Model=ST3300622A, FwRev=3.AND, SerialNo=3NF1GAGW
>  Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
>  RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
>  BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=16
>  CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=268435455
>  IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
>  PIO modes:  pio0 pio1 pio2 pio3 pio4
>  DMA modes:  mdma0 mdma1 mdma2
>  UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
>  AdvancedPM=no WriteCache=enabled
>  Drive conforms to: Unspecified: ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7
>
>  * signifies the current active mode
>
> [EMAIL PROTECTED] ~]# hdparm -tT /dev/hda
>
> /dev/hda:
>  Timing cached reads:   1296 MB in 2.00 seconds = 646.99 MB/sec
>  Timing buffered disk reads:  166 MB in 3.02 seconds = 55.05 MB/sec
>
> vmstat 1 output:
> ----------------
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  8  0    408   5800  29308 248604    0    0     0  1036  406   132  2 98  0  0
>  4  0    408   5644  29396 248608    0    0     0  1128  437   184  2 92  0  6
>  7  0    408   6316  29428 248020    0    0     0  1316  539   287  0 86  0 14
>  5  0    408   6104  29480 248180    0    0     0   588  415   187  0 99  0  1
>  4  0    408   5764  29536 248364    0    0     0  1092  421
Re: ReiserFS v3 choking when free space falls below 10% - FIXED
Mike Benoit wrote:
> I applied the attached patch that Jeff supplied me and so far it is working flawlessly. I currently have less than 4% free space on my drive and the CPU usage is less than 3% with two recordings going. I'll let it run until about 2% free space just to test further.
>
> It also _appears_ that overall CPU usage is down slightly, based on the vmstat output from when we were trying to diagnose the problem before compared to now. The SYS CPU time was hovering between 3-10% before, and now it seems to be between 0-2%. I haven't done any actual performance tests though.
>
> Jeff, what drawbacks does this patch have?
>
> Thanks for all your hard work, I'm sure many other MythTV users will appreciate it.

Hi Mike -

There really shouldn't be any. I suspect that the window searching was actually causing more problems than it was solving. The original goal would have been to try to keep chunks of blocks contiguous for better access patterns, but if those chunks end up getting spread out all over the disk, that's hardly the outcome we were looking for.

So, what will now happen is that the allocator will allocate the next n blocks it can find, regardless of the window size. If there happens to be a window of the size we needed, it will automatically find it through the normal process of allocating one block at a time.

-Jeff

--
Jeff Mahoney
SUSE Labs
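The behaviour Jeff describes for the patched allocator ("allocate the next n blocks it can find, regardless of the window size") can be pictured with a small userspace sketch. This is not the kernel code: the bitmap is modelled as a Python list of booleans (True means in use), and `allocate_next_free` is a hypothetical name chosen for this illustration.

```python
def allocate_next_free(bitmap, start, count):
    """Claim up to `count` free blocks at or after `start`, one at a time.

    `bitmap` is a list of booleans (True = block in use) and is updated
    in place. Returns the claimed block numbers; the list is shorter
    than `count` if the bitmap runs out of free blocks.
    """
    claimed = []
    for blk in range(start, len(bitmap)):
        if len(claimed) == count:
            break
        if not bitmap[blk]:
            bitmap[blk] = True      # mark the block as allocated
            claimed.append(blk)
    return claimed
```

On a fragmented bitmap such as `[True, False, True, False, False, False, False, True]`, a request for four blocks simply takes blocks 1, 3, 4 and 5. When a contiguous run of four happens to sit at the hint, the same loop returns that run, which is the "it will automatically find it" case from the message.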
Re: ReiserFS v3 choking when free space falls below 10%?
Jeffrey Mahoney wrote:
> Hans Reiser wrote:
>> You make this way too complicated because you are trying to be way too perfect. If you scan 3 bitmap blocks and find nothing, stop trying to size match.
>
> Agreed on the trying too hard.

What about the actual algorithm suggested?

> I think we can find a better, less perfect solution.
>
> I wrote that email on Friday on my notebook and it couldn't connect. It managed to do so this evening. I spent the weekend experimenting with the idea, and while I came up with something that worked, it wasn't really usable. The memory footprint was much too large to be worthwhile. For some fragmentation patterns, it would work. The worst case scenario was totally intolerable.
>
> -Jeff
>
> --
> Jeff Mahoney
> SUSE Labs
Re: ReiserFS v3 choking when free space falls below 10%?
Jeff Mahoney wrote:
> Hans Reiser wrote:
>> Guys, if you run the kernel under a debugger, and get it to where you see the excessive CPU usage, and then start stepping through the bitmap code, I am sure it will be very obvious what the error is. Can anyone do that for us? Jeff?
>
> Apologies to everyone CC'd who've already seen this message. It was bounced from the namesys servers and I wanted to preserve the CC list.
>
> ***
>
> Mike sent me a copy of the metadata and I am now able to reproduce locally.
>
> My profiling looks like this:
>
> samples  %        image name   app name  symbol name
> 148596   17.8573  reiserfs.ko  reiserfs  reiserfs_in_journal
> 58194     6.9934  reiserfs.ko  reiserfs  search_by_key
> 38937     4.6792  vmlinux      vmlinux   memmove
> 38783     4.6607  reiserfs.ko  reiserfs  scan_bitmap_block
> 38466     4.6226  jbd          jbd       (no symbols)
> 23249     2.7939  vmlinux      vmlinux   __find_get_block
> 18196     2.1867  vmlinux      vmlinux   tty_write
> 17734     2.1312  vmlinux      vmlinux   do_ioctl
> 17293     2.0782  loop         loop      (no symbols)
> 15400     1.8507  vmlinux      vmlinux   cond_resched_lock
> 14836     1.7829  vmlinux      vmlinux   copy_user_generic_c
> 14143     1.6996  reiserfs.ko  reiserfs  do_journal_end
> 13638     1.6389  vmlinux      vmlinux   find_next_zero_bit
> 13236     1.5906  vmlinux      vmlinux   default_llseek
> 12925     1.5532  vmlinux      vmlinux   bit_waitqueue
> 8921      1.0721  vmlinux      vmlinux   __delay
>
> Hans - My speculation about the bitmaps being fragmented was right on. I wrote a quick little script to parse the output of debugreiserfs -m and report on the frequency of different window sizes. Windows of 1-31 blocks are extremely common, accounting for 99.8% of all free windows. The problem is that in my testing, where I made the allocator report the size of allocation requests, the most common request was for a window of 32 blocks.
>
> What's happening is that we keep finding windows that are too small, which results in a lot of wasted effort. The cycle goes like this:
>
>     if (unfm && is_block_in_journal(s, bmap_n, *beg, beg))
>         continue;
>     /* first zero bit found; we check next bits */
>     for (end = *beg + 1;; end++) {
>         if (end >= *beg + max || end >= boundary
>             || reiserfs_test_le_bit(end, bi->bh->b_data)) {
>             next = end;
>             break;
>         }
>         /* finding the other end of zero bit window requires
>          * looking into journal structures (in
>          * case of searching for free blocks for unformatted nodes) */
>         if (unfm && is_block_in_journal(s, bmap_n, end, &next))
>             break;
>     }
>
> If the window is too small, we end up looping up to the top and try to find another one. Since the overwhelming majority of the windows are too small, we go through just about all the bitmaps without backing off the window size.

To be clear, eventually the allocations are honored, but only after *all* of the bitmaps are searched. On the third pass, we drop the window to a single block and restart the scan, eventually building a 32-block set that is probably quite fragmented. This occurs on every write, hence the huge performance hit.

It appears as though ext3 doesn't have this problem because they don't batch writes the way reiserfs does. They'll start a search at a decent hint the same way we do, but the window is always one block.

So, we're stuck between a rock and a hard place. We can have the better allocation performance at lower usage and sacrifice performance later, or we can have stable allocation performance at an overall reduction in performance.

I have an idea that may get around both problems, but I'm not sure how well it would be received. We currently do some very basic caching of bitmap metadata such as the first zero bit and how many free blocks there are. What if we constructed an extent map of the free windows in each bitmap when we cache the metadata and adjust the map when we

There's a third option, but I'm not sure how well it would be received

Right now, the allocator keeps track of things like how full a bitmap is and where the first zero bit is. It would also be possible to cache a list of windows in each bitmap to accelerate performance. This would have to be a shrinkable cache, since the pathological case could mean occupying

-Jeff

--
Jeff Mahoney
SUSE Labs
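Jeff's script for counting free-window sizes is not included in the thread. The Python sketch below shows the general idea under stated assumptions: it does not parse `debugreiserfs -m` output (that format is not reproduced here), it works on a plain list of booleans (True means in use), and the names `free_windows`, `window_histogram` and `share_smaller_than` are hypothetical. The list of `(start, length)` pairs it produces is also the kind of per-bitmap window list the message proposes caching.

```python
from collections import Counter


def free_windows(bitmap):
    """Yield (start, length) for every run of free blocks in `bitmap`."""
    start = None
    for i, used in enumerate(bitmap):
        if not used and start is None:
            start = i                      # a free window opens here
        elif used and start is not None:
            yield (start, i - start)       # window closed by a used block
            start = None
    if start is not None:                  # window runs to the end
        yield (start, len(bitmap) - start)


def window_histogram(bitmap):
    """Count how many free windows exist for each window length."""
    return Counter(length for _, length in free_windows(bitmap))


def share_smaller_than(bitmap, want):
    """Fraction of free windows that are too small for a `want`-block request."""
    sizes = [length for _, length in free_windows(bitmap)]
    if not sizes:
        return 0.0
    return sum(1 for n in sizes if n < want) / len(sizes)
```

With `want=32`, `share_smaller_than` is the statistic quoted in the message (99.8% of windows being smaller than the most common request size); a value that close to 1 means nearly every candidate window the scan finds gets rejected.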
Re: ReiserFS v3 choking when free space falls below 10%?
You make this way too complicated because you are trying to be way too perfect. If you scan 3 bitmap blocks and find nothing, stop trying to size match.

Hans
Re: ReiserFS v3 choking when free space falls below 10%?
Hans Reiser wrote:
> You make this way too complicated because you are trying to be way too perfect. If you scan 3 bitmap blocks and find nothing, stop trying to size match.

Agreed on the trying too hard. I think we can find a better, less perfect solution.

I wrote that email on Friday on my notebook and it couldn't connect. It managed to do so this evening. I spent the weekend experimenting with the idea, and while I came up with something that worked, it wasn't really usable. The memory footprint was much too large to be worthwhile. For some fragmentation patterns, it would work. The worst case scenario was totally intolerable.

-Jeff

--
Jeff Mahoney
SUSE Labs
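The thread gives no numbers for the memory footprint, so the following is only back-of-the-envelope arithmetic under explicit assumptions: 4096-byte filesystem blocks, one bitmap block tracking 8 x 4096 blocks, 8 bytes per cached (start, length) extent, and a worst case in which used and free blocks alternate. The function names are hypothetical.

```python
def blocks_per_bitmap(block_size=4096):
    """One bitmap block holds one bit per filesystem block."""
    return block_size * 8


def worst_case_extent_bytes(block_size=4096, bytes_per_extent=8):
    """Extent-list size for one bitmap when every other block is free.

    Alternating used/free blocks give the maximum number of windows:
    half of the bits, each a window of length 1. `bytes_per_extent`
    is an assumed size for one cached (start, length) pair.
    """
    return (blocks_per_bitmap(block_size) // 2) * bytes_per_extent


def bitmap_count(total_blocks, block_size=4096):
    """Number of bitmap blocks needed to cover `total_blocks` (rounded up)."""
    per = blocks_per_bitmap(block_size)
    return (total_blocks + per - 1) // per
```

Under these assumptions a fully fragmented bitmap needs 128 KiB of extents against 4 KiB for the bitmap itself. For a filesystem of about 73 million 4 KiB blocks (roughly the size of the 280G disk discussed in this thread), `bitmap_count(73256163) * worst_case_extent_bytes()` comes to roughly 280 MiB, which would be consistent with the "totally intolerable" worst case on a machine with 512 MB of RAM.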
Re: ReiserFS v3 choking when free space falls below 10%?
Hans Reiser wrote:
> Guys, if you run the kernel under a debugger, and get it to where you see the excessive CPU usage, and then start stepping through the bitmap code, I am sure it will be very obvious what the error is. Can anyone do that for us? Jeff?

Apologies to everyone CC'd who've already seen this message. It was bounced from the namesys servers and I wanted to preserve the CC list.

***

Mike sent me a copy of the metadata and I am now able to reproduce locally.

My profiling looks like this:

samples  %        image name   app name  symbol name
148596   17.8573  reiserfs.ko  reiserfs  reiserfs_in_journal
58194     6.9934  reiserfs.ko  reiserfs  search_by_key
38937     4.6792  vmlinux      vmlinux   memmove
38783     4.6607  reiserfs.ko  reiserfs  scan_bitmap_block
38466     4.6226  jbd          jbd       (no symbols)
23249     2.7939  vmlinux      vmlinux   __find_get_block
18196     2.1867  vmlinux      vmlinux   tty_write
17734     2.1312  vmlinux      vmlinux   do_ioctl
17293     2.0782  loop         loop      (no symbols)
15400     1.8507  vmlinux      vmlinux   cond_resched_lock
14836     1.7829  vmlinux      vmlinux   copy_user_generic_c
14143     1.6996  reiserfs.ko  reiserfs  do_journal_end
13638     1.6389  vmlinux      vmlinux   find_next_zero_bit
13236     1.5906  vmlinux      vmlinux   default_llseek
12925     1.5532  vmlinux      vmlinux   bit_waitqueue
8921      1.0721  vmlinux      vmlinux   __delay

Hans - My speculation about the bitmaps being fragmented was right on. I wrote a quick little script to parse the output of debugreiserfs -m and report on the frequency of different window sizes. Windows of 1-31 blocks are extremely common, accounting for 99.8% of all free windows. The problem is that in my testing, where I made the allocator report the size of allocation requests, the most common request was for a window of 32 blocks.

What's happening is that we keep finding windows that are too small, which results in a lot of wasted effort. The cycle goes like this:

    if (unfm && is_block_in_journal(s, bmap_n, *beg, beg))
        continue;
    /* first zero bit found; we check next bits */
    for (end = *beg + 1;; end++) {
        if (end >= *beg + max || end >= boundary
            || reiserfs_test_le_bit(end, bi->bh->b_data)) {
            next = end;
            break;
        }
        /* finding the other end of zero bit window requires
         * looking into journal structures (in
         * case of searching for free blocks for unformatted nodes) */
        if (unfm && is_block_in_journal(s, bmap_n, end, &next))
            break;
    }

If the window is too small, we end up looping up to the top and try to find another one. Since the overwhelming majority of the windows are too small, we go through just about all the bitmaps without backing off the window size.

-Jeff

--
Jeff Mahoney
SUSE Labs
Re: ReiserFS v3 choking when free space falls below 10%?
By 8 iterations, I mean 8 bitmaps scanned.
Re: ReiserFS v3 choking when free space falls below 10%?
So limit the number of iterations of rejecting windows that are too small. Say, 8.

Hans

Jeff Mahoney wrote:
> What's happening is that we keep finding windows that are too small, which results in a lot of wasted effort. The cycle goes like this:
>
>     if (unfm && is_block_in_journal(s, bmap_n, *beg, beg))
>         continue;
>     /* first zero bit found; we check next bits */
>     for (end = *beg + 1;; end++) {
>         if (end >= *beg + max || end >= boundary
>             || reiserfs_test_le_bit(end, bi->bh->b_data)) {
>             next = end;
>             break;
>         }
>         /* finding the other end of zero bit window requires
>          * looking into journal structures (in
>          * case of searching for free blocks for unformatted nodes) */
>         if (unfm && is_block_in_journal(s, bmap_n, end, &next))
>             break;
>     }
>
> If the window is too small, we end up looping up to the top and try to find another one. Since the overwhelming majority of the windows are too small, we go through just about all the bitmaps without backing off the window size.
>
> -Jeff
>
> --
> Jeff Mahoney
> SUSE Labs
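Hans's suggestion (give up on size matching after a fixed number of bitmaps, eight in his example) can be sketched in userspace as follows. This is an illustration only, not the patch that was eventually applied: bitmaps are lists of booleans (True means in use), and `find_window` and `pick_window` are hypothetical names.

```python
def find_window(bitmap, start, want):
    """Return the start of the first free run of at least `want` blocks
    at or after `start`, or None if this bitmap has no such run."""
    run_start, run_len = None, 0
    for i in range(start, len(bitmap)):
        if bitmap[i]:
            run_start, run_len = None, 0   # a used block ends the run
        else:
            if run_start is None:
                run_start = i
            run_len += 1
            if run_len >= want:
                return run_start
    return None


def pick_window(bitmaps, want, max_bitmaps=8):
    """Look for a `want`-block window in at most `max_bitmaps` bitmaps,
    then stop size matching and settle for the first free block.

    Returns (bitmap_index, block_offset, window_size) or None if the
    filesystem is completely full.
    """
    for n, bm in enumerate(bitmaps[:max_bitmaps]):
        pos = find_window(bm, 0, want)
        if pos is not None:
            return (n, pos, want)
    # Budget exhausted: fall back to a single-block window.
    for n, bm in enumerate(bitmaps):
        pos = find_window(bm, 0, 1)
        if pos is not None:
            return (n, pos, 1)
    return None
```

The point of the bound is that the cost of a failed size match no longer grows with the number of bitmaps on the disk: a badly fragmented filesystem pays for at most `max_bitmaps` scans before falling back, instead of walking nearly all of them on every write.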
Re: ReiserFS v3 choking when free space falls below 10%?
Hi Jeff,

Like clockwork, the problem showed up pretty much exactly when I expected. This time, however, I discovered a few other interesting tidbits along the way.

I tried to re-create the problem much faster by writing a little script that would append data to a file at about 5mb/s. I ran two instances of this script simultaneously, so the two instances were writing to two separate files at the same time. After the files reached 2gb (close to the size of a recording) both scripts would start writing to new files. Two recordings were also going on at the same time as all of this. I filled the drive up until about 10gb (5%) was free, and although the write speed dropped off significantly as the drive filled up, the SYS CPU time never increased. So this method obviously failed to re-create the problem, so I deleted all the files the script created (~60gb worth) and let MythTV do its thing until the drive filled up that way. The pattern in which MythTV writes data out must have something to do with this?

The other interesting thing that happened was that at about 11pm tonight I noticed the problem starting to occur (SYS CPU was high), but MythTV was transcoding MPEG2 recordings to MPEG4, which was using USR CPU time, so I tried to stop this process so the oprofile data wouldn't be so cluttered. Well, when I stopped the transcoding, MythTV deleted the temp file it had been creating and this caused the problem to stop almost immediately! I didn't realize at first that MythTV had deleted this temp file, so I was disappointed and started the transcoding again in hopes of re-creating the problem. Within about 45mins it started happening again, but this time I renamed the temp file MythTV was transcoding to (so it wouldn't get deleted) before I killed that process, and was able to keep the problem happening long enough to get a clean oprofile and better vmstat data.

The really interesting thing is how much free space was available each time the problem hit:

Time #1:
Thu Jul 6 22:53:26 PDT 2006
/dev/hda1            293024652 269089456  23935196  92% /

Time #2 (45mins later):
Thu Jul 6 23:35:38 PDT 2006
/dev/hda1            293024652 269227580  23797072  92% /

It seems like once the free space hits a very specific point, the problem is triggered. As you will notice in the vmstat logs, within about 30 seconds the SYS CPU time goes from 4% to 75+% and hovers there.

Attached are the vmstat logs and oprofile report, and I'm sending you the output of debugreiserfs -p /dev/hda1 when the drive is 92% full privately (when it finishes, could be morning).

Just so you know, I had vmstat set to output 6 samples at 10-second intervals, then I ran date/df, rinse and repeat. This is the script I used to collect the data:

    {
    while [ 1 != 0 ] ; do
        date
        df
        vmstat 10 6
    done
    } 2>/tmp/monitor.log 1>/tmp/monitor.log

vmstat_1.txt is the first time the problem occurred.
vmstat_2.txt is the second time the problem occurred.

Hopefully this helps you track down the issue. If not, let me know if you want me to collect more data. I'll try to keep the drive as full as possible so I can re-create the problem much faster. Also, if you need access to the box, that can be arranged.

Thanks.

PS. I'm running kernel v2.6.16.21-rfsfix; rfsfix is the following patch you sent me. I experienced the problem on kernels as old as 2.6.14.

diff -ruNpX ../dontdiff linux-2.6.15.orig.staging1/fs/reiserfs/bitmap.c linux-2.6.15.orig.staging2/fs/reiserfs/bitmap.c
--- linux-2.6.15.orig.staging1/fs/reiserfs/bitmap.c 2006-01-16 16:53:35.663319136 -0500
+++ linux-2.6.15.orig.staging2/fs/reiserfs/bitmap.c 2006-01-16 16:53:35.673317616 -0500
@@ -187,7 +187,10 @@ static int scan_bitmap_block(struct reis
 			return 0;	// No free blocks in this bitmap
 		}
 
-		/* search for a first zero bit -- beggining of a window */
+		if (*beg < bi->first_zero_hint)
+			*beg = bi->first_zero_hint;
+
+		/* search for a first zero bit -- beginning of a window */
 		*beg = reiserfs_find_next_zero_le_bit
 		    ((unsigned long *)(bh->b_data), boundary, *beg);

On Thu, 2006-07-06 at 14:39 -0400, Jeff Mahoney wrote:
> Mike Benoit wrote:
> On Thu, 2006-07-06 at 14:02 -0400, Jeff Mahoney wrote:
> Mike Benoit wrote:
> My desktop machine (v2.6.16, same as my MythTV box) is running with 9% free space right now and it is not experiencing any slowdown. I think the problem is caused by the usage pattern of MythTV and how it simultaneously streams one or more large files to the HD in relatively small chunks over a long period of time.
>
> Ok, if you run into the problem again, can you dump the metadata before freeing the space? The code itself looks sound, and I'm wondering if you've managed to create pathological fragmentation that's mucking things
Re: ReiserFS v3 choking when free space falls below 10%?
Hi,

just one note: I've looked at the code in scan_bitmap() in bitmap.c. There is:

    /* When the bitmap is more than 10% free, anyone can allocate.
     * When it's less than 10% free, only files that already use the
     * bitmap are allowed. Once we pass 80% full, this restriction
     * is lifted.
     *
     * We do this so that files that grow later still have space close to
     * their original allocation. This improves locality, and presumably
     * performance as a result.
     *
     * This is only an allocation policy and does not make up for getting a
     * bad hint. Decent hinting must be implemented for this to work well.
     */
    if (TEST_OPTION(skip_busy, s)
        && SB_FREE_BLOCKS(s) > SB_BLOCK_COUNT(s) / 20) {

So the comment suggests we should lift the restriction when we are 80% full, but if you look at the condition, it checks whether we are 95% full! I guess that is really asking for trouble and could explain the behaviour... Mike, could you try changing that 20 in the test to 5? IMHO that could fix your problem.

								Honza
--
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
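The arithmetic behind Jan's observation can be checked with a few lines of Python. The sketch mirrors only the comparison in the quoted condition (free blocks greater than total blocks divided by a constant); the function names are hypothetical, and the df figures from elsewhere in this thread are used as 1K-block counts, which does not change the ratio.

```python
def skip_busy_applies(free_blocks, total_blocks, divisor=20):
    """Same comparison as the quoted test: free > total / divisor."""
    return free_blocks > total_blocks // divisor


def percent_full_cutoff(divisor):
    """Fullness (in percent) at which the restriction stops applying."""
    return 100 - 100 // divisor


# Figures from Mike's df output at the moment the problem hit (92% full).
TOTAL_1K_BLOCKS = 293024652
FREE_1K_BLOCKS = 23935196
```

`percent_full_cutoff(20)` is 95 and `percent_full_cutoff(5)` is 80, matching the mismatch between code and comment that Jan points out. With Mike's numbers, `skip_busy_applies(FREE_1K_BLOCKS, TOTAL_1K_BLOCKS, 20)` is true while the same call with a divisor of 5 is false, so the policy was still in force on his 92%-full disk and would not have been with the suggested change.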
Re: ReiserFS v3 choking when free space falls below 10%?
Jan Kara wrote:
> Hi,
>
> just one note: I've looked at the code in scan_bitmap() in bitmap.c. There is:
>
>     /* When the bitmap is more than 10% free, anyone can allocate.
>      * When it's less than 10% free, only files that already use the
>      * bitmap are allowed. Once we pass 80% full, this restriction
>      * is lifted.
>      *
>      * We do this so that files that grow later still have space close to
>      * their original allocation. This improves locality, and presumably
>      * performance as a result.
>      *
>      * This is only an allocation policy and does not make up for getting a
>      * bad hint. Decent hinting must be implemented for this to work well.
>      */
>     if (TEST_OPTION(skip_busy, s)
>         && SB_FREE_BLOCKS(s) > SB_BLOCK_COUNT(s) / 20) {
>
> So the comment suggests we should lift the restriction when we are 80% full, but if you look at the condition, it checks whether we are 95% full! I guess that is really asking for trouble and could explain the behaviour... Mike, could you try changing that 20 in the test to 5? IMHO that could fix your problem.

Shoot. I guess I never sent that mail out last night. I had discovered the same thing.

The thing is, I don't think it will cause the kind of performance problem we're seeing here. Once it sees the 90% check it will bail out. Minor slowdown, not anything like we're seeing.

-Jeff

--
Jeff Mahoney
SUSE Labs
Re: ReiserFS v3 choking when free space falls below 10%?
Jan Kara wrote:
> > [snip]
> > Shoot. I guess I never sent that mail out last night. I had discovered the same thing.
> > The thing is, I don't think it will cause the kind of performance problem we're seeing here. Once it sees the 90% check it will bail out. Minor slowdown, not anything like we're seeing.
> Hmm, right. You'll only scan that one bitmap the file is in, won't you? That can still take some time so maybe it's worth trying this fix anyway.

Oh, I agree that it's a bug that needs to be fixed. I just don't think it's causing 90% CPU usage. :)

-Jeff

-- 
Jeff Mahoney
SUSE Labs
Re: ReiserFS v3 choking when free space falls below 10%?
Jan Kara wrote:
> > > [snip]
> > >     if (TEST_OPTION(skip_busy, s)
> > >         && SB_FREE_BLOCKS(s) > SB_BLOCK_COUNT(s) / 20) {

How about eliminating this feature entirely? It seems rather dubious.

> > > So the comment suggests we should lift the restriction when we are 80% full, but if you look at the condition, it checks whether we are 95% full!
> > > [snip]
> > Shoot. I guess I never sent that mail out last night. I had discovered the same thing.
> > The thing is, I don't think it will cause the kind of performance problem we're seeing here. Once it sees the 90% check it will bail out. Minor slowdown, not anything like we're seeing.
> Hmm, right. You'll only scan that one bitmap the file is in, won't you?

I don't understand your remark. These files are in many, many bitmaps. Can you quote more of the code?

> That can still take some time so maybe it's worth trying this fix anyway.
> Honza
Re: ReiserFS v3 choking when free space falls below 10%?
> Jan Kara wrote:
> > > > [snip]
> > > >     if (TEST_OPTION(skip_busy, s)
> > > >         && SB_FREE_BLOCKS(s) > SB_BLOCK_COUNT(s) / 20) {
> How about eliminating this feature entirely? It seems rather dubious.

Yes, but it may help reduce fragmentation, as it leaves some free space in bitmaps for the files already ending in those bitmaps. I'm not sure if it really helps, though...

> > > > [snip]
> > > Once it sees the 90% check it will bail out. Minor slowdown, not anything like we're seeing.
> > Hmm, right. You'll only scan that one bitmap the file is in, won't you?
> I don't understand your remark. These files are in many, many bitmaps. Can you quote more of the code?

The condition really is:

    if ((off && (!unfm || (file_block != 0)))
        || SB_AP_BITMAP(s)[bm].free_count > (s->s_blocksize << 3) / 10)

and we reset 'off' after the first test, so the first part of the || can be true only once (when we are scanning the bitmap containing the last file block).

Honza
-- 
Jan Kara [EMAIL PROTECTED]
SuSE CR Labs
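As a rough illustration of the condition Honza describes, the sketch below restates it in user-space C. It assumes the operators lost in the archive are `&&` and `>`, and the function and parameter names are hypothetical, not the kernel's.

```c
#include <stdbool.h>

/* A bitmap block tracks one bit per filesystem block, so it covers
 * blocksize * 8 blocks. */
unsigned long bits_per_bitmap(unsigned long blocksize)
{
    return blocksize << 3;
}

/* Hypothetical restatement of the quoted per-bitmap test.
 * off:        nonzero only for the first bitmap visited (reset to 0 after)
 * unfm:       nonzero when allocating an unformatted (file data) block
 * file_block: logical block number within the file
 * free_count: free blocks recorded for this bitmap */
bool bitmap_is_scanned(unsigned long off, int unfm, unsigned long file_block,
                       unsigned long free_count, unsigned long blocksize)
{
    bool continues_here = off && (!unfm || file_block != 0);
    bool more_than_tenth_free = free_count > bits_per_bitmap(blocksize) / 10;
    return continues_here || more_than_tenth_free;
}
```

Assuming a 4096-byte block size, each bitmap covers 32768 blocks, so a bitmap that is not the one being continued is scanned only if it has more than 3276 free blocks.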
Re: ReiserFS v3 choking when free space falls below 10%?
Jan Kara wrote:
> > [snip]
> > How about eliminating this feature entirely? It seems rather dubious.
> Yes, but it may help reduce fragmentation, as it leaves some free space in bitmaps for the files already ending in those bitmaps. I'm not sure if it really helps, though...

I think I was wrong, and retract my remark.
Re: ReiserFS v3 choking when free space falls below 10%?
On Fri, 2006-07-07 at 19:49 +0200, Jan Kara wrote:
> [snip]
> So the comment suggests we should lift the restriction when we are 80% full, but if you look at the condition, it checks whether we are 95% full! I guess that is really asking for trouble and could explain the behaviour... Mike, could you try changing that 20 in the test to 5? IMHO that could fix your problem.

I've recompiled my kernel with this suggested change, and so far I have surpassed (just barely) the free space trigger point that occurred twice yesterday. I'll keep the recordings going so I can give you guys more conclusive results in a couple of hours.

-- 
Mike Benoit [EMAIL PROTECTED]
Re: ReiserFS v3 choking when free space falls below 10%?
On Fri, 2006-07-07 at 19:49 +0200, Jan Kara wrote:
> [snip]
> Mike, could you try changing that 20 in the test to 5? IMHO that could fix your problem.

It looks like it lasted a little longer, but probably not long enough to determine whether this change made the difference.

/dev/hda1            293024652 271457512  21567140  93% /

Attached is the vmstat output of the problem occurring.

[EMAIL PROTECTED] tmp]# opreport -l /usr/src/linux/vmlinux | head -n20
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        symbol name
3945     53.9082  default_idle
3031     41.4184  find_next_zero_bit
50        0.6832  __copy_from_user_ll
30        0.4099  handle_IRQ_event
16        0.2186  ide_outb
16        0.2186  ioread8
10        0.1366  ide_end_request
10        0.1366  mmx_clear_page
10        0.1366  number
9         0.1230  __copy_to_user_ll
7         0.0957  get_page_from_freelist
5         0.0683  __find_get_block
5         0.0683  __link_path_walk
5         0.0683  kmem_cache_alloc
5         0.0683  mmx_copy_page
5         0.0683  sysenter_past_esp
4         0.0547  __make_request

-- 
Mike Benoit [EMAIL PROTECTED]

Fri Jul 7 13:48:49 PDT 2006
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda1            293024652 271457512  21567140  93% /
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0    572   6416  13472 108712    0    0   836  2086  520   966 57  3 31  9
 0  0    572   6112  13444 108996    0    0     1  1830  461   468  0  5 60 35
 0  0    572   6788  13404 108368    0    0     2  1916  477   484  1  7 61 31
 0  0    572   5760  13424 109288    0    0     1  1884  461   470  0  5 58 37
 0  0    572   6104  13536 108884    0    0     1  1950  462   462  0 10 55 34
 0  1    572   6652  13468 108460    0    0     2  1901  478   483  1  9 57 34

Fri Jul 7 13:49:39 PDT 2006
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda1            293024652 271546292  21478360  93% /
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0    572   6636  13480 108492    0    0   831  2085  520   962 57  3 31  9
 0  1    572   6012  13692 108912    0    0     8  1900  470   476  0  5 60 35
 0  0    572   6148  13584 108792    0    0     1  1844  464   468  0  8 58 33
 0  1    572   5740  13832 108980    0    0     2  1949  479   506  1  1 71 27
 0  1    572   6140  13632 108712    0    0     1  1828  467   481  0  2 60 38
 0  0    572   5984  13532 109072    0    0     2  2012  466   481  0  5 62 34

Fri Jul 7 13:50:29 PDT 2006
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda1            293024652 271636228  21388424  93% /
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  2    572   6172  13576 108736    0    0   825  2084  520   959 56  3 31  9
 3  1    572   6288  13664 108584    0    0     2  1844  484   481  1  9 56 34
 3  0    572   6352  13548 108580    0    0     2  1568  435   372  0 32 46 21
 6  0    572   6092  13808 108600    0    0     0   560  360   161  2 94  0  3
 2  2    572   5648  13968 108812    0    0     2   626  399   149  1 88  2  9
 6  1    572   6344  14096 108092    0    0     1   464  343    62  0 99  0  0

Fri Jul 7 13:51:20 PDT 2006
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda1            293024652 271682836  21341816  93% /
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 7  0    572   6328  14096 108100    0    0   820  2077
Re: ReiserFS v3 choking when free space falls below 10%?
Guys, if you run the kernel under a debugger, and get it to where you see the excessive CPU usage, and then start stepping through the bitmap code, I am sure it will be very obvious what the error is. Can anyone do that for us? Jeff?
Re: ReiserFS v3 choking when free space falls below 10%?
On Tue, 04 Jul 2006 19:37:34 -0700
Hans Reiser [EMAIL PROTECTED] wrote:
> Mike Benoit wrote:
> > Hi Jeff,
> > I just tried the patch you suggested and it didn't make a difference. The load still spikes as soon as the free space falls below ~10%.
> Jeff, please audit your code for what happens when all the bitmap blocks reach 90% full. Could you discuss your design and code in that regard for our benefit?
> Mike, thanks so much for going to this much effort. It is rather likely this is a problem affecting many users.

I run my busy mailservers with 0.5-2% free space (that's still a couple of gigabytes) and have no problems. It's true that I haven't touched the kernel reiserfs there (2.4.21), so it does not have any additions to the reiserfs v3 code since then. It just works, so I don't have any desire to fix it :)

-- 
Jure Pečar
http://jure.pecar.org
Re: ReiserFS v3 choking when free space falls below 10%?
On Thu, 2006-07-06 at 12:58 +0200, Jure Pečar wrote:
> [snip]
> I run my busy mailservers with 0.5-2% free space (that's still a couple of gigabytes) and have no problems. It's true that I haven't touched the kernel reiserfs there (2.4.21), so it does not have any additions to the reiserfs v3 code since then. It just works, so I don't have any desire to fix it :)

My desktop machine (v2.6.16, same as my MythTV box) is running with 9% free space right now and it is not experiencing any slowdown. I think the problem is caused by the usage pattern of MythTV and how it simultaneously streams one or more large files to the HD in relatively small chunks over a long period of time.

-- 
Mike Benoit [EMAIL PROTECTED]
Re: ReiserFS v3 choking when free space falls below 10%?
On Thu, 2006-07-06 at 08:43 -0700, Mike Benoit wrote:
> [snip]
> My desktop machine (v2.6.16, same as my MythTV box) is running with 9% free space right now and it is not experiencing any slowdown. I think the problem is caused by the usage pattern of MythTV and how it simultaneously streams one or more large files to the HD in relatively small chunks over a long period of time.

Hasn't someone patched MythTV to pre-allocate (zero-write) the video files to the expected sizes? I was sure I'd read about that somewhere...

-- 
Jonathan Briggs [EMAIL PROTECTED]
eSoft, Inc.
Re: ReiserFS v3 choking when free space falls below 10%?
On 6-Jul-06, at 11:43 AM, Mike Benoit wrote:
> [snip]
> My desktop machine (v2.6.16, same as my MythTV box) is running with 9% free space right now and it is not experiencing any slowdown. I think the problem is caused by the usage pattern of MythTV and how it simultaneously streams one or more large files to the HD in relatively small chunks over a long period of time.

...And then has a hard timing requirement when reusing the free space, which a desktop/server doesn't have, exposing the issue.

--T

> -- 
> Mike Benoit [EMAIL PROTECTED]
Re: ReiserFS v3 choking when free space falls below 10%?
Mike Benoit wrote:
> My desktop machine (v2.6.16, same as my MythTV box) is running with 9% free space right now and it is not experiencing any slowdown. I think the problem is caused by the usage pattern of MythTV and how it simultaneously streams one or more large files to the HD in relatively small chunks over a long period of time.

Ok, if you run into the problem again, can you dump the metadata before freeing the space? The code itself looks sound, and I'm wondering if you've managed to create pathological fragmentation that's mucking things up. Being able to see the fs metadata would confirm or disprove that theory, and help in fixing it.

-Jeff

-- 
Jeff Mahoney
SUSE Labs
Re: ReiserFS v3 choking when free space falls below 10%?
Jeff Mahoney wrote:
> Ok, if you run into the problem again, can you dump the metadata before freeing the space? The code itself looks sound, and I'm wondering if you've managed to create pathological fragmentation that's mucking things up.

There should be no possible fragmentation that would increase CPU usage like that. With the current algorithms, in which you check one field in the bitmap to see if it has any free blocks, it should not be possible for scanning bitmaps to take so much time. There must be a bug in there.
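The "check one field" idea can be sketched as follows. This is only an illustration under the assumption that each bitmap carries a cached count of free blocks; the struct and function names are hypothetical and do not mirror the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical per-bitmap summary: a cached count of free blocks. */
struct bitmap_summary {
    unsigned long free_count;
};

/* Return the index of the first bitmap at or after `start` whose cached
 * free count can satisfy a request for `needed` blocks, or `nbitmaps`
 * if none can. Each bitmap is rejected or accepted by reading a single
 * field, without touching its bits. */
size_t first_candidate_bitmap(const struct bitmap_summary *bitmaps,
                              size_t nbitmaps, size_t start,
                              unsigned long needed)
{
    for (size_t i = start; i < nbitmaps; i++)
        if (bitmaps[i].free_count >= needed)
            return i;
    return nbitmaps;
}
```

A free count says nothing about how the free blocks are laid out inside an accepted bitmap, so the bit-level scan of that bitmap can still be slow when its free space is scattered.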
Re: ReiserFS v3 choking when free space falls below 10%?
Hans Reiser wrote:
> There should be no possible fragmentation that would increase CPU usage like that. With the current algorithms, in which you check one field in the bitmap to see if it has any free blocks, it should not be possible for scanning bitmaps to take so much time. There must be a bug in there.

I'm sure there is, but it's a bug that others don't seem to be seeing, including myself, and Mike reported he's not experiencing the problem anymore. Given my current workload, unless I can readily reproduce it locally, this takes a low priority for me.

-Jeff

-- 
Jeff Mahoney
SUSE Labs
Re: ReiserFS v3 choking when free space falls below 10%?
On Thu, 2006-07-06 at 14:02 -0400, Jeff Mahoney wrote:
> [snip]
> Ok, if you run into the problem again, can you dump the metadata before freeing the space? The code itself looks sound, and I'm wondering if you've managed to create pathological fragmentation that's mucking things up. Being able to see the fs metadata would confirm or disprove that theory, and help in fixing it.

Will do. I've started a bunch of recordings, so I should start seeing the problem again by tonight or tomorrow morning.

Is there any other data you would like me to collect? Additional oprofile reports, vmstat information before the problem occurs and/or after? Let me know.

Thanks.

-- 
Mike Benoit [EMAIL PROTECTED]
Re: ReiserFS v3 choking when free space falls below 10%?
Mike Benoit wrote:
> [snip]
> Will do. I've started a bunch of recordings, so I should start seeing the problem again by tonight or tomorrow morning.
> Is there any other data you would like me to collect? Additional oprofile reports, vmstat information before the problem occurs and/or after?

Great. Any information you can provide would help quite a bit. oprofile would be useful, as would vmstat information.

-Jeff

-- 
Jeff Mahoney
SUSE Labs
Re: ReiserFS v3 choking when free space falls below 10%?
On Thu, 2006-07-06 at 14:19 -0400, Jeff Mahoney wrote:
> [snip]
> I'm sure there is, but it's a bug that others don't seem to be seeing, including myself, and Mike reported he's not experiencing the problem anymore. Given my current workload, unless I can readily reproduce it locally, this takes a low priority for me.

Jeff, I'm sure there are at least 5 other people on the MythTV mailing list seeing the same or similar symptoms:
http://www.gossamer-threads.com/lists/mythtv/users/208573?do=post_view_threaded

The common factors are high load and ReiserFS.

So far I can re-create the problem at will; I just need to record enough programs that my free space falls below 10%. However, since the box that is experiencing the issue does recordings for other people, I have to clear off enough space to make the problem go away between attempts to track down the bug.

Unfortunately I got a little delete-happy this last round and deleted 70gb worth of data. So I've set up 40hrs of recording to be done in 20hrs (two tuners), which should trigger the problem again by tonight or tomorrow morning at the latest.

Once that happens I will take a metadata snapshot and email it off to you. I'll be sure not to free up so much space from now on, so I can re-create the problem in just a couple of hours in case you need me to collect additional data.

If I get a chance I'll attempt to make a script that re-creates the problem too.

-- 
Mike Benoit [EMAIL PROTECTED]
Re: ReiserFS v3 choking when free space falls below 10%?
Jeff, I am suspicious, because I know that 90% is a magic number in your code.
Re: ReiserFS v3 choking when free space falls below 10%?
On Tue, Jul 04, 2006 at 07:37:34PM -0700, Hans Reiser wrote: Mike, thanks so much for going to this much effort. It is rather likely this is a problem affecting many users. Last weekend, i accidentally filled my /. I noticed when i heard the drives (it's a 2 drive raid 0) thrashing. I didn't watch the cpu load, which may've been high, but it seemed to be io bound. -- Tom Vier [EMAIL PROTECTED] DSA Key ID 0x15741ECE
Re: ReiserFS v3 choking when free space falls below 10%?
Hans Reiser wrote:
> Mike Benoit wrote:
> > Hi Jeff,
> > I just tried the patch you suggested and it didn't make a difference. The load still spikes as soon as the free space falls below ~10%.
> Jeff, please audit your code for what happens when all the bitmap blocks reach 90% full. Could you discuss your design and code in that regard for our benefit?
> Mike, thanks so much for going to this much effort. It is rather likely this is a problem affecting many users.

Mike -

Can you post a copy of debugreiserfs -p <dev> | gzip -c > somefile.gz somewhere? I can't reproduce that behavior locally and it would help quite a bit if I had a test case where I could.

Thanks.

-Jeff

-- 
Jeff Mahoney
SUSE Labs
Re: ReiserFS v3 choking when free space falls below 10%?
Hi Jeff,

I just tried the patch you suggested and it didn't make a difference. The load still spikes as soon as the free space falls below ~10%.

On Fri, 2006-06-30 at 12:47 -0400, Jeff Mahoney wrote:
> Hans Reiser wrote:
> > Mike Benoit wrote:
> > > This seems strange, because to me this type of workload would lend itself to being less fragmented than most workloads. All the box does is record TV programs, so over the course of 30-60min periods I would guess 95+% of the writes are sequential.
> > > Why would the fragmentation be so bad? Is there a way to tell what the fragmentation rate is?
> > > Thanks.
> > I wonder how the bitmap optimizations that Jeff added handle this usage pattern. Jeff?
> That's certainly interesting. The bitmap hinting code should skip bitmap blocks with fewer blocks than are being asked for. The first zero hint patch was never applied to mainline. I have that in my queue as well.
> Try using the attached patch. It directs the block allocator to start the search at the first known 0 bit rather than scanning the entire block to find it. I'm not sure if it will have a meaningful performance impact, but it's worth a try.
> -Jeff
> -- 
> Jeff Mahoney
> SUSE Labs

-- 
Mike Benoit [EMAIL PROTECTED]
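The patch itself is not included in the archive. The sketch below only illustrates the general idea of a first-zero hint as Jeff describes it: remember the position of the first known clear bit and start the search there. It uses a plain bit-by-bit loop rather than the kernel's find_next_zero_bit, and all names are hypothetical.

```c
#include <stddef.h>

#define NO_FREE_BIT ((size_t)-1)

/* Return the index of the first clear bit at or after `start`, or
 * NO_FREE_BIT if every bit from `start` to `nbits` is set. */
size_t next_zero_bit(const unsigned char *map, size_t nbits, size_t start)
{
    for (size_t i = start; i < nbits; i++)
        if (!(map[i / 8] & (1u << (i % 8))))
            return i;
    return NO_FREE_BIT;
}

/* Same search, but never begins before `first_zero_hint`, the cached
 * position of the first clear bit in this bitmap. Bits below the hint
 * are known to be set, so skipping them cannot miss a free block. */
size_t next_zero_bit_hinted(const unsigned char *map, size_t nbits,
                            size_t start, size_t first_zero_hint)
{
    if (start < first_zero_hint)
        start = first_zero_hint;
    return next_zero_bit(map, nbits, start);
}
```

The hint only shortens the walk over the fully used prefix of a bitmap. It does not help once the search is already past the hint and the remaining free blocks are scattered, which may be why it made no measurable difference here.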
Re: ReiserFS v3 choking when free space falls below 10%?
Mike Benoit wrote:
> Hi Jeff,
> I just tried the patch you suggested and it didn't make a difference. The load still spikes as soon as the free space falls below ~10%.

Jeff, please audit your code for what happens when all the bitmap blocks reach 90% full. Could you discuss your design and code in that regard for our benefit?

Mike, thanks so much for going to this much effort. It is rather likely this is a problem affecting many users.

Hans
Re: ReiserFS v3 choking when free space falls below 10%?
Mike Benoit wrote:
> This seems strange, because to me this type of workload would lend itself to being less fragmented than most workloads. All the box does is record TV programs, so over the course of 30-60min periods I would guess 95+% of the writes are sequential.
> Why would the fragmentation be so bad? Is there a way to tell what the fragmentation rate is?
> Thanks.

I wonder how the bitmap optimizations that Jeff added handle this usage pattern. Jeff?
Re: ReiserFS v3 choking when free space falls below 10%?
Jeff, does the code do anything funny when crossing the 90% point? You have special heuristics for that, yes? Maybe a bug is hiding in them?
Re: ReiserFS v3 choking when free space falls below 10%?
Hello

On Thu, 2006-06-29 at 10:41 -0700, Mike Benoit wrote:
> My MythTV box recently started showing odd behavior during recordings. At certain times the load of the box would spike to 10+ and recordings would start losing frames and become unwatchable. top would show mythbackend using 90+% SYS CPU, whereas under normal circumstances it uses about 5% USR.
>
> So I finally got around to profiling mythbackend when the load starts to spike. To my surprise it appears that once I have less than 10% (30GB) free on the drive, reiserfs can't keep up; even just writing at 1mb/sec is too much for it.
>
> Is there something that can be done to fix this? 30gb seems like a lot of wasted space.
>
> #opreport
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
>           TIMER:0|
>   samples|      %|
> ------------------
>     77863 78.7856 reiserfs
>     18183 18.3984 vmlinux
>       695  0.7032 mysqld
>       452  0.4574 libc-2.4.so
>       360  0.3643 libmythtv-0.19.so.0.19.0
>       324  0.3278 ivtv
>       323  0.3268 nvidia
>       242  0.2449 libqt-mt.so.3.3.6
>       110  0.1113 libpthread-2.4.so
>        53  0.0536 libstdc++.so.6.0.8
>        35  0.0354 ld-2.4.so
>        23  0.0233 libperl.so
>        22  0.0223 libz.so.1.2.3
> snip
>
> #opreport -l /usr/src/linux/vmlinux
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        symbol name
> 9607     52.8351  default_idle
> 7694     42.3142  find_next_zero_bit

It looks like the problem is high fragmentation of free space. find_next_zero_bit is a function which is used to scan bitmaps in order to find blocks for allocation.

> 183       1.0064  __copy_from_user_ll
> 57        0.3135  handle_IRQ_event
> 37        0.2035  __copy_to_user_ll
> 34        0.1870  ide_outb
> 30        0.1650  ide_end_request
> 22        0.1210  ioread8
> 22        0.1210  schedule
> 21        0.1155  get_page_from_freelist
> 17        0.0935  mmx_clear_page
> snip
>
> System Details:
> ---
> Kernel v2.6.16.21 (custom compiled) - This issue also happened with 2.6.14.
>
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/hda1             280G  269G   12G  97% /
>
> [EMAIL PROTECTED] cat /proc/mounts
> rootfs / rootfs rw 0 0
> /dev /dev tmpfs rw 0 0
> /dev/root / reiserfs rw,noatime,nodiratime 0 0
>
> [EMAIL PROTECTED] cat /proc/cpuinfo
> processor       : 0
> vendor_id       : AuthenticAMD
> cpu family      : 6
> model           : 6
> model name      : AMD Athlon(tm) XP 2100+
> stepping        : 2
> cpu MHz         : 1759.680
> cache size      : 256 KB
>
> [EMAIL PROTECTED] free
>              total       used       free     shared    buffers     cached
> Mem:        515992     496256      19736          0      36256     271728
> -/+ buffers/cache:     188272     327720
> Swap:       262136        408     261728
>
> [EMAIL PROTECTED] ~]# hdparm -i /dev/hda
> /dev/hda:
>  Model=ST3300622A, FwRev=3.AND, SerialNo=3NF1GAGW
>  Config={ HardSect NotMFM HdSw15uSec Fixed DTR10Mbs RotSpdTol.5% }
>  RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
>  BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=16
>  CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=268435455
>  IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
>  PIO modes: pio0 pio1 pio2 pio3 pio4
>  DMA modes: mdma0 mdma1 mdma2
>  UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
>  AdvancedPM=no WriteCache=enabled
>  Drive conforms to: Unspecified: ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7
>  * signifies the current active mode
>
> [EMAIL PROTECTED] ~]# hdparm -tT /dev/hda
> /dev/hda:
>  Timing cached reads: 1296 MB in 2.00 seconds = 646.99 MB/sec
>  Timing buffered disk reads: 166 MB in 3.02 seconds = 55.05 MB/sec
>
> vmstat 1 output:
> --
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  8  0    408   5800  29308 248604    0    0     0  1036  406   132  2 98  0  0
>  4  0    408   5644  29396 248608    0    0     0  1128  437   184  2 92  0  6
>  7  0    408   6316  29428 248020    0    0     0  1316  539   287  0 86  0 14
>  5  0    408   6104  29480 248180    0    0     0   588  415   187  0 99  0  1
>  4  0    408   5764  29536 248364    0    0     0  1092  421   172  2 97  1  0
>  6  0    408   6528  29592 247684    0    0     0  1092  425   161  2 98  0  1
>  2  1    408   6372  29676 247724    0    0     0  2304  385   170  2 97  1  0
>  5  0    408   6400  29676 247616    0    0     0    48  383   122  0 100 0  0
>  7  0    408   6192  29704 247872    0    0     0  1080  409   162  1 98  0  1
>  6  0    408   5720  29732 248304    0    0     0  1076  414   178  1 98  0  1
>  7  0    408   6348  29800 247552    0    0     0  1656  460   300  2 87  1 11
>  5  0    408   6628  29848
Re: ReiserFS v3 choking when free space falls below 10%?
On Thu, 2006-06-29 at 23:12 +0400, Vladimir V. Saveliev wrote:
> Hello
>
> On Thu, 2006-06-29 at 10:41 -0700, Mike Benoit wrote:
> > So I finally got around to profiling mythbackend when the load starts
> > to spike. To my surprise it appears that once I have less than 10%
> > (30GB) free on the drive, reiserfs can't keep up; even just writing at
> > 1MB/sec is too much for it.
> >
> > Is there something that can be done to fix this? 30GB seems like a lot
> > of wasted space.
> >
> > #opreport
> > CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> > Profiling through timer interrupt
> >           TIMER:0|
> >   samples|      %|
> > ------------------
> >     77863 78.7856 reiserfs
> >     18183 18.3984 vmlinux
> >       695  0.7032 mysqld
> >       452  0.4574 libc-2.4.so
> >       360  0.3643 libmythtv-0.19.so.0.19.0
> >       324  0.3278 ivtv
> >       323  0.3268 nvidia
> >       242  0.2449 libqt-mt.so.3.3.6
> >       110  0.1113 libpthread-2.4.so
> >        53  0.0536 libstdc++.so.6.0.8
> >        35  0.0354 ld-2.4.so
> >        23  0.0233 libperl.so
> >        22  0.0223 libz.so.1.2.3
> > snip
> >
> > #opreport -l /usr/src/linux/vmlinux
> > CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> > Profiling through timer interrupt
> > samples  %        symbol name
> > 9607     52.8351  default_idle
> > 7694     42.3142  find_next_zero_bit
>
> It looks like the problem is high fragmentation of free space.
> find_next_zero_bit is a function which is used to scan bitmaps in order
> to find blocks for allocation.

This seems strange, because to me this type of workload would lend itself
to being less fragmented than most workloads. All the box does is record
TV programs, so over the course of 30-60 minute periods I would guess 95+%
of the writes are sequential.

Why would the fragmentation be so bad? Is there a way to tell what the
fragmentation rate is?

Thanks.

--
Mike Benoit [EMAIL PROTECTED]
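To make the point about bitmap scanning concrete, here is a toy model in Python of a first-fit search for a run of free blocks. It is only a sketch under stated assumptions: the bitmap is a plain list where 1 means "used" and 0 means "free", and the function name `find_free_run` is hypothetical. It is not the reiserfs allocator or the kernel's find_next_zero_bit; it only shows how the number of bits examined grows when the free space at the front of the bitmap is broken into runs that are too short.

```python
def find_free_run(bitmap, want):
    """First-fit search for `want` consecutive free (0) blocks.

    bitmap: sequence of 0/1 values, 1 = used, 0 = free (assumed layout).
    Returns (start, examined): start is the index of the first block of
    the run, or None if no run is long enough; examined is how many
    bitmap entries were looked at before the search finished.
    """
    run_start = None
    examined = 0
    for i, used in enumerate(bitmap):
        examined += 1
        if used:
            run_start = None          # a used block ends the current run
            continue
        if run_start is None:
            run_start = i             # a new free run begins here
        if i - run_start + 1 == want:
            return run_start, examined
    return None, examined


if __name__ == "__main__":
    # Free space at the front is chopped into single blocks, so a request
    # for 4 blocks has to walk past all of them before it finds room.
    fragmented = [1, 0] * 8 + [0] * 4
    print(find_free_run(fragmented, 4))
```

In this model every short free run costs scanning work without producing an allocation, which is the effect a profile dominated by a bitmap-scanning function would be consistent with.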
Re: ReiserFS v3 choking when free space falls below 10%?
Hello

On Thu, 2006-06-29 at 13:15 -0700, Mike Benoit wrote:
> On Thu, 2006-06-29 at 23:12 +0400, Vladimir V. Saveliev wrote:
> > Hello
> >
> > On Thu, 2006-06-29 at 10:41 -0700, Mike Benoit wrote:
> > > So I finally got around to profiling mythbackend when the load
> > > starts to spike. To my surprise it appears that once I have less
> > > than 10% (30GB) free on the drive, reiserfs can't keep up; even just
> > > writing at 1MB/sec is too much for it.
> > >
> > > Is there something that can be done to fix this? 30GB seems like a
> > > lot of wasted space.
> > >
> > > #opreport
> > > CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> > > Profiling through timer interrupt
> > >           TIMER:0|
> > >   samples|      %|
> > > ------------------
> > >     77863 78.7856 reiserfs
> > >     18183 18.3984 vmlinux
> > >       695  0.7032 mysqld
> > >       452  0.4574 libc-2.4.so
> > >       360  0.3643 libmythtv-0.19.so.0.19.0
> > >       324  0.3278 ivtv
> > >       323  0.3268 nvidia
> > >       242  0.2449 libqt-mt.so.3.3.6
> > >       110  0.1113 libpthread-2.4.so
> > >        53  0.0536 libstdc++.so.6.0.8
> > >        35  0.0354 ld-2.4.so
> > >        23  0.0233 libperl.so
> > >        22  0.0223 libz.so.1.2.3
> > > snip
> > >
> > > #opreport -l /usr/src/linux/vmlinux
> > > CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> > > Profiling through timer interrupt
> > > samples  %        symbol name
> > > 9607     52.8351  default_idle
> > > 7694     42.3142  find_next_zero_bit
> >
> > It looks like the problem is high fragmentation of free space.
> > find_next_zero_bit is a function which is used to scan bitmaps in
> > order to find blocks for allocation.
>
> This seems strange, because to me this type of workload would lend
> itself to being less fragmented than most workloads. All the box does is
> record TV programs, so over the course of 30-60 minute periods I would
> guess 95+% of the writes are sequential.

Do you ever remove files?

> Why would the fragmentation be so bad? Is there a way to tell what the
> fragmentation rate is?

Can you please run

    debugreiserfs -m /dev/hda1 > bitmap

and send me that file? The bitmap file should contain a dump of free and
used blocks. If most of the bitmap blocks contain many interleaved
free/used sections, then free space is highly fragmented and allocating
new free blocks can be CPU-expensive.

> Thanks.
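As a rough illustration of what "interleaved free/used sections" means in numbers, the Python sketch below summarises the free extents in a block bitmap. The input format is an assumption made for the example (a list of 0/1 values, 0 = free); it does not parse the actual debugreiserfs dump, and the function names are hypothetical. Many small extents for the same amount of free space would indicate fragmented free space.

```python
from itertools import groupby


def free_extents(bitmap):
    """Return the length of each maximal run of free (0) blocks."""
    return [sum(1 for _ in run) for used, run in groupby(bitmap) if not used]


def fragmentation_summary(bitmap):
    """Summarise free space: total blocks, extent count, largest and mean extent."""
    extents = free_extents(bitmap)
    free_blocks = sum(extents)
    return {
        "free_blocks": free_blocks,
        "free_extents": len(extents),
        "largest_extent": max(extents, default=0),
        "mean_extent": free_blocks / len(extents) if extents else 0.0,
    }


if __name__ == "__main__":
    # Same amount of free space, laid out two different ways.
    contiguous = [1] * 6 + [0] * 6
    interleaved = [1, 0] * 6
    print(fragmentation_summary(contiguous))
    print(fragmentation_summary(interleaved))
```

Comparing the two summaries shows why a free-space percentage alone says little: both layouts have the same number of free blocks, but the interleaved one offers no extent longer than a single block.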