Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
Hi Boris, On Fri, 1 Feb 2008 08:51:17 +0100, Borislav Petkov wrote: The below fix should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion, happens all the time for eg user issued commands where you don't know an exact byte count. Actually, this behavior has been the case even before the __blk_end_request() changes. I did test plain 2.6.24 with the following --- linux-2.6/drivers/ide/ide-cd.c2008-01-31 22:18:59.0 +0100 +++ linux-2.6/drivers/ide/ide-cd.c-new2008-01-31 22:18:50.0 +0100 @@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr( /* * If DRQ is clear, the command has completed. */ - if ((stat DRQ_STAT) == 0) + if ((stat DRQ_STAT) == 0) { + blk_dump_rq_flags(rq, ide-cd: rq still having bio); + printk(backup: data_len=%u bi_size=%u\n, + rq-data_len, rq-bio-bi_size); goto end_request; + } /* * check which way to transfer data to see whether we've been getting residual byte counts: Jan 31 22:10:06 gollum kernel: [ 26.702877] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 Jan 31 22:10:06 gollum kernel: [ 26.702945] Jan 31 22:10:06 gollum kernel: [ 26.702946] sector 2673511, nr/cnr 0/0 Jan 31 22:10:06 gollum kernel: [ 26.703052] bio dfa8ec40, biotail dfa8ec40, buffer , data , len 158 Jan 31 22:10:06 gollum kernel: [ 26.703122] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 Jan 31 22:10:06 gollum kernel: [ 26.703877] backup: data_len=158 bi_size=158 ... so we've been simply silently ignoring this until now so i guess we don't need to BUG() for something that's totally benign. Hi Kiyoshi, end_that_request_last() is not called when __blk_end_reuqest() returns 1. Then, the issuer isn't waken up. So I think the BUG() or error messages should be there. you mean, end_that_request_last() isn't called when __end_that_request_first() returns an error and this is the case only for fs and pc requests. Otherwise it _is_ called, thus simulating somewhat the previous behavior. However, we never BUG()'ged on residual byte counts before and this driver has been in the kernel tree for ages, so what puzzles me now is how is BUG()'ing here better than before and shouldn't we simply issue a warning instead of killing the interrupt handler... The Jens' patch passes the residual byte counts to __blk_end_request(), so __end_that_reqeust_first() should never return 1 and we should never BUG() on the residual byte counts, unless inconsistency happens such as the size of remaining bios is bigger than the residual byte counts. So if __blk_end_request() returns 1 even with the Jens' patch, it means that the block layer or the driver really have a bug. And then, the request and the bios could leak or the issuer would wait forever because end_that_request_last() isn't called. The previous behavior might ignore such inconsistency and leak only the bios because it was calling end_that_request_last() anyway. I would like to BUG() in such cases personally, but I don't object strongly if you prefer not to BUG(). Thanks, Kiyoshi Ueda - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On Fri, Feb 01, 2008 at 12:39:27PM -0500, Kiyoshi Ueda wrote: snip end_that_request_last() is not called when __blk_end_reuqest() returns 1. Then, the issuer isn't waken up. So I think the BUG() or error messages should be there. you mean, end_that_request_last() isn't called when __end_that_request_first() returns an error and this is the case only for fs and pc requests. Otherwise it _is_ called, thus simulating somewhat the previous behavior. However, we never BUG()'ged on residual byte counts before and this driver has been in the kernel tree for ages, so what puzzles me now is how is BUG()'ing here better than before and shouldn't we simply issue a warning instead of killing the interrupt handler... The Jens' patch passes the residual byte counts to __blk_end_request(), so __end_that_reqeust_first() should never return 1 and we should never BUG() on the residual byte counts, unless inconsistency happens such as the size of remaining bios is bigger than the residual byte counts. yep. So if __blk_end_request() returns 1 even with the Jens' patch, it means that the block layer or the driver really have a bug. And then, the request and the bios could leak or the issuer would wait forever because end_that_request_last() isn't called. The previous behavior might ignore such inconsistency and leak only the bios because it was calling end_that_request_last() anyway. I would like to BUG() in such cases personally, but I don't object strongly if you prefer not to BUG(). BUG() is definitely what we should do here to catch this case of sizeof(bios) rq-data_len. Putting a brown paper bag over the issue will never get it fixed if it really leaks bios. Thanks for clarifying that. By the way, shouldn't we be doing a little branch prediction here: diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 74c6087..bee05a3 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive) */ if ((stat DRQ_STAT) == 0) { spin_lock_irqsave(ide_lock, flags); - if (__blk_end_request(rq, 0, 0)) + if (unlikely(__blk_end_request(rq, 0, rq-data_len))) BUG(); HWGROUP(drive)-rq = NULL; spin_unlock_irqrestore(ide_lock, flags); -- Regards/Gruß, Boris. - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
Hi Boris, On Fri, 1 Feb 2008 19:29:09 +0100, Borislav Petkov wrote: end_that_request_last() is not called when __blk_end_reuqest() returns 1. Then, the issuer isn't waken up. So I think the BUG() or error messages should be there. you mean, end_that_request_last() isn't called when __end_that_request_first() returns an error and this is the case only for fs and pc requests. Otherwise it _is_ called, thus simulating somewhat the previous behavior. However, we never BUG()'ged on residual byte counts before and this driver has been in the kernel tree for ages, so what puzzles me now is how is BUG()'ing here better than before and shouldn't we simply issue a warning instead of killing the interrupt handler... The Jens' patch passes the residual byte counts to __blk_end_request(), so __end_that_reqeust_first() should never return 1 and we should never BUG() on the residual byte counts, unless inconsistency happens such as the size of remaining bios is bigger than the residual byte counts. yep. So if __blk_end_request() returns 1 even with the Jens' patch, it means that the block layer or the driver really have a bug. And then, the request and the bios could leak or the issuer would wait forever because end_that_request_last() isn't called. The previous behavior might ignore such inconsistency and leak only the bios because it was calling end_that_request_last() anyway. I would like to BUG() in such cases personally, but I don't object strongly if you prefer not to BUG(). BUG() is definitely what we should do here to catch this case of sizeof(bios) rq-data_len. Putting a brown paper bag over the issue will never get it fixed if it really leaks bios. Thanks for clarifying that. By the way, shouldn't we be doing a little branch prediction here: diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 74c6087..bee05a3 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive) */ if ((stat DRQ_STAT) == 0) { spin_lock_irqsave(ide_lock, flags); - if (__blk_end_request(rq, 0, 0)) + if (unlikely(__blk_end_request(rq, 0, rq-data_len))) BUG(); HWGROUP(drive)-rq = NULL; spin_unlock_irqrestore(ide_lock, flags); I think it's reasonable. Thanks, Kiyoshi Ueda - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On Thu, Jan 31 2008, Nai Xia wrote: My dmesg relevant info is quite similar: [6.875041] Freeing unused kernel memory: 320k freed [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.144439] [8.144439] sector 10824201199534213, nr/cnr 0/0 [8.144439] bio cf029280, biotail cf029280, buffer , data , len 158 [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.144439] backup: data_len=158 bi_size=158 [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.160756] [8.160756] sector 2669858, nr/cnr 0/0 [8.160756] bio cf029300, biotail cf029300, buffer , data , len 158 [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.160756] backup: data_len=158 bi_size=158 [ 14.851101] eth0: link up [ 27.121883] eth0: no IPv6 routers present And by the way, Kiyoshi, This can be reproduced in a typical setup vmware workstation 6.02 with a vritual IDE cdrom, in case you wanna catch that with your own eyes. :-) Thanks for your trying hard to correct this annoying bug. The below fix should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion, happens all the time for eg user issued commands where you don't know an exact byte count. diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 74c6087..bee05a3 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive) */ if ((stat DRQ_STAT) == 0) { spin_lock_irqsave(ide_lock, flags); - if (__blk_end_request(rq, 0, 0)) + if (__blk_end_request(rq, 0, rq-data_len)) BUG(); HWGROUP(drive)-rq = NULL; spin_unlock_irqrestore(ide_lock, flags); -- Jens Axboe - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On Thu, Jan 31 2008, Florian Lohoff wrote: On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote: The below fix should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion, happens all the time for eg user issued commands where you don't know an exact byte count. diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 74c6087..bee05a3 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive) */ if ((stat DRQ_STAT) == 0) { spin_lock_irqsave(ide_lock, flags); - if (__blk_end_request(rq, 0, 0)) + if (__blk_end_request(rq, 0, rq-data_len)) BUG(); HWGROUP(drive)-rq = NULL; spin_unlock_irqrestore(ide_lock, flags); Fixes the crash on boot for me ... Great, thanks for confirming that. I'll make sure the patch goes upstream today, if Linus is available. -- Jens Axboe - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote: The below fix should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion, happens all the time for eg user issued commands where you don't know an exact byte count. diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 74c6087..bee05a3 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive) */ if ((stat DRQ_STAT) == 0) { spin_lock_irqsave(ide_lock, flags); - if (__blk_end_request(rq, 0, 0)) + if (__blk_end_request(rq, 0, rq-data_len)) BUG(); HWGROUP(drive)-rq = NULL; spin_unlock_irqrestore(ide_lock, flags); Fixes the crash on boot for me ... Flo -- Florian Lohoff [EMAIL PROTECTED] +49-171-2280134 Those who would give up a little freedom to get a little security shall soon have neither - Benjamin Franklin signature.asc Description: Digital signature
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
Hi Jens, On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote: On Thu, Jan 31 2008, Nai Xia wrote: My dmesg relevant info is quite similar: [6.875041] Freeing unused kernel memory: 320k freed [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.144439] [8.144439] sector 10824201199534213, nr/cnr 0/0 [8.144439] bio cf029280, biotail cf029280, buffer , data , len 158 [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.144439] backup: data_len=158 bi_size=158 [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.160756] [8.160756] sector 2669858, nr/cnr 0/0 [8.160756] bio cf029300, biotail cf029300, buffer , data , len 158 [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.160756] backup: data_len=158 bi_size=158 [ 14.851101] eth0: link up [ 27.121883] eth0: no IPv6 routers present And by the way, Kiyoshi, This can be reproduced in a typical setup vmware workstation 6.02 with a vritual IDE cdrom, in case you wanna catch that with your own eyes. :-) Thanks for your trying hard to correct this annoying bug. The below fix should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion, happens all the time for eg user issued commands where you don't know an exact byte count. diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 74c6087..bee05a3 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive) */ if ((stat DRQ_STAT) == 0) { spin_lock_irqsave(ide_lock, flags); - if (__blk_end_request(rq, 0, 0)) + if (__blk_end_request(rq, 0, rq-data_len)) BUG(); HWGROUP(drive)-rq = NULL; spin_unlock_irqrestore(ide_lock, flags); OK, I undarstand the leftover is legal. By the way, is it safe to always return success if there is a leftover? I thought we might have to complete the rq with -EIO in such case. Thanks, Kiyoshi Ueda - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On 31/01/2008, at 18.04, Kiyoshi Ueda [EMAIL PROTECTED] wrote: Hi Jens, On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote: On Thu, Jan 31 2008, Nai Xia wrote: My dmesg relevant info is quite similar: [6.875041] Freeing unused kernel memory: 320k freed [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.144439] [8.144439] sector 10824201199534213, nr/cnr 0/0 [8.144439] bio cf029280, biotail cf029280, buffer , data , len 158 [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.144439] backup: data_len=158 bi_size=158 [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.160756] [8.160756] sector 2669858, nr/cnr 0/0 [8.160756] bio cf029300, biotail cf029300, buffer , data , len 158 [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.160756] backup: data_len=158 bi_size=158 [ 14.851101] eth0: link up [ 27.121883] eth0: no IPv6 routers present And by the way, Kiyoshi, This can be reproduced in a typical setup vmware workstation 6.02 with a vritual IDE cdrom, in case you wanna catch that with your own eyes. :-) Thanks for your trying hard to correct this annoying bug. The below fix should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion, happens all the time for eg user issued commands where you don't know an exact byte count. diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 74c6087..bee05a3 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr (ide_drive_t *drive) */ if ((stat DRQ_STAT) == 0) { spin_lock_irqsave(ide_lock, flags); -if (__blk_end_request(rq, 0, 0)) +if (__blk_end_request(rq, 0, rq-data_len)) BUG(); HWGROUP(drive)-rq = NULL; spin_unlock_irqrestore(ide_lock, flags); OK, I undarstand the leftover is legal. By the way, is it safe to always return success if there is a leftover? I thought we might have to complete the rq with -EIO in such case. data_len being non zero should pass the residual count back to the issuer. - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
Hi Jens, On Thu, 31 Jan 2008 19:16:54 +0100, Jens Axboe wrote: On 31/01/2008, at 18.04, Kiyoshi Ueda [EMAIL PROTECTED] wrote: On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote: On Thu, Jan 31 2008, Nai Xia wrote: My dmesg relevant info is quite similar: [6.875041] Freeing unused kernel memory: 320k freed [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.144439] [8.144439] sector 10824201199534213, nr/cnr 0/0 [8.144439] bio cf029280, biotail cf029280, buffer , data , len 158 [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.144439] backup: data_len=158 bi_size=158 [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.160756] [8.160756] sector 2669858, nr/cnr 0/0 [8.160756] bio cf029300, biotail cf029300, buffer , data , len 158 [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.160756] backup: data_len=158 bi_size=158 [ 14.851101] eth0: link up [ 27.121883] eth0: no IPv6 routers present And by the way, Kiyoshi, This can be reproduced in a typical setup vmware workstation 6.02 with a vritual IDE cdrom, in case you wanna catch that with your own eyes. :-) Thanks for your trying hard to correct this annoying bug. The below fix should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion, happens all the time for eg user issued commands where you don't know an exact byte count. diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 74c6087..bee05a3 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr (ide_drive_t *drive) */ if ((stat DRQ_STAT) == 0) { spin_lock_irqsave(ide_lock, flags); -if (__blk_end_request(rq, 0, 0)) +if (__blk_end_request(rq, 0, rq-data_len)) BUG(); HWGROUP(drive)-rq = NULL; spin_unlock_irqrestore(ide_lock, flags); OK, I undarstand the leftover is legal. By the way, is it safe to always return success if there is a leftover? I thought we might have to complete the rq with -EIO in such case. data_len being non zero should pass the residual count back to the issuer. Aah, so the issuer can know how many bytes of the I/Os are not done, and the error status of the bio which is completed by end_that_request_first() in __blk_end_request() don't matter for the issuer. OK, thanks. I think the patch is fine. Thanks, Kiyoshi Ueda - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote: On Thu, Jan 31 2008, Nai Xia wrote: My dmesg relevant info is quite similar: [6.875041] Freeing unused kernel memory: 320k freed [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.144439] [8.144439] sector 10824201199534213, nr/cnr 0/0 [8.144439] bio cf029280, biotail cf029280, buffer , data , len 158 [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.144439] backup: data_len=158 bi_size=158 [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.160756] [8.160756] sector 2669858, nr/cnr 0/0 [8.160756] bio cf029300, biotail cf029300, buffer , data , len 158 [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.160756] backup: data_len=158 bi_size=158 [ 14.851101] eth0: link up [ 27.121883] eth0: no IPv6 routers present And by the way, Kiyoshi, This can be reproduced in a typical setup vmware workstation 6.02 with a vritual IDE cdrom, in case you wanna catch that with your own eyes. :-) Thanks for your trying hard to correct this annoying bug. The below fix should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion, happens all the time for eg user issued commands where you don't know an exact byte count. Actually, this behavior has been the case even before the __blk_end_request() changes. I did test plain 2.6.24 with the following --- linux-2.6/drivers/ide/ide-cd.c 2008-01-31 22:18:59.0 +0100 +++ linux-2.6/drivers/ide/ide-cd.c-new 2008-01-31 22:18:50.0 +0100 @@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr( /* * If DRQ is clear, the command has completed. */ - if ((stat DRQ_STAT) == 0) + if ((stat DRQ_STAT) == 0) { + blk_dump_rq_flags(rq, ide-cd: rq still having bio); + printk(backup: data_len=%u bi_size=%u\n, + rq-data_len, rq-bio-bi_size); goto end_request; + } /* * check which way to transfer data to see whether we've been getting residual byte counts: Jan 31 22:10:06 gollum kernel: [ 26.702877] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 Jan 31 22:10:06 gollum kernel: [ 26.702945] Jan 31 22:10:06 gollum kernel: [ 26.702946] sector 2673511, nr/cnr 0/0 Jan 31 22:10:06 gollum kernel: [ 26.703052] bio dfa8ec40, biotail dfa8ec40, buffer , data , len 158 Jan 31 22:10:06 gollum kernel: [ 26.703122] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 Jan 31 22:10:06 gollum kernel: [ 26.703877] backup: data_len=158 bi_size=158 ... so we've been simply silently ignoring this until now so i guess we don't need to BUG() for something that's totally benign. -- Regards/Gruß, Boris. - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
Hi Boris, Thank you for the confirmation of original behavior. On Thu, 31 Jan 2008 22:37:40 +0100, Borislav Petkov wrote: On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote: On Thu, Jan 31 2008, Nai Xia wrote: My dmesg relevant info is quite similar: [6.875041] Freeing unused kernel memory: 320k freed [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.144439] [8.144439] sector 10824201199534213, nr/cnr 0/0 [8.144439] bio cf029280, biotail cf029280, buffer , data , len 158 [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.144439] backup: data_len=158 bi_size=158 [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.160756] [8.160756] sector 2669858, nr/cnr 0/0 [8.160756] bio cf029300, biotail cf029300, buffer , data , len 158 [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.160756] backup: data_len=158 bi_size=158 [ 14.851101] eth0: link up [ 27.121883] eth0: no IPv6 routers present And by the way, Kiyoshi, This can be reproduced in a typical setup vmware workstation 6.02 with a vritual IDE cdrom, in case you wanna catch that with your own eyes. :-) Thanks for your trying hard to correct this annoying bug. The below fix should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion, happens all the time for eg user issued commands where you don't know an exact byte count. Actually, this behavior has been the case even before the __blk_end_request() changes. I did test plain 2.6.24 with the following --- linux-2.6/drivers/ide/ide-cd.c2008-01-31 22:18:59.0 +0100 +++ linux-2.6/drivers/ide/ide-cd.c-new2008-01-31 22:18:50.0 +0100 @@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr( /* * If DRQ is clear, the command has completed. */ - if ((stat DRQ_STAT) == 0) + if ((stat DRQ_STAT) == 0) { + blk_dump_rq_flags(rq, ide-cd: rq still having bio); + printk(backup: data_len=%u bi_size=%u\n, + rq-data_len, rq-bio-bi_size); goto end_request; + } /* * check which way to transfer data to see whether we've been getting residual byte counts: Jan 31 22:10:06 gollum kernel: [ 26.702877] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 Jan 31 22:10:06 gollum kernel: [ 26.702945] Jan 31 22:10:06 gollum kernel: [ 26.702946] sector 2673511, nr/cnr 0/0 Jan 31 22:10:06 gollum kernel: [ 26.703052] bio dfa8ec40, biotail dfa8ec40, buffer , data , len 158 Jan 31 22:10:06 gollum kernel: [ 26.703122] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 Jan 31 22:10:06 gollum kernel: [ 26.703877] backup: data_len=158 bi_size=158 ... so we've been simply silently ignoring this until now so i guess we don't need to BUG() for something that's totally benign. end_that_request_last() is not called when __blk_end_reuqest() returns 1. Then, the issuer isn't waken up. So I think the BUG() or error messages should be there. And fortunately, the issuer seems not to mind whether end_that_request_first() is called for the remaining bio or not. So I think Jens' patch is fine. Thanks, Kiyoshi Ueda - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
On Thu, Jan 31, 2008 at 05:35:56PM -0500, Kiyoshi Ueda wrote: Hi Boris, Thank you for the confirmation of original behavior. On Thu, 31 Jan 2008 22:37:40 +0100, Borislav Petkov wrote: On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote: On Thu, Jan 31 2008, Nai Xia wrote: My dmesg relevant info is quite similar: [6.875041] Freeing unused kernel memory: 320k freed [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.144439] [8.144439] sector 10824201199534213, nr/cnr 0/0 [8.144439] bio cf029280, biotail cf029280, buffer , data , len 158 [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.144439] backup: data_len=158 bi_size=158 [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 [8.160756] [8.160756] sector 2669858, nr/cnr 0/0 [8.160756] bio cf029300, biotail cf029300, buffer , data , len 158 [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.160756] backup: data_len=158 bi_size=158 [ 14.851101] eth0: link up [ 27.121883] eth0: no IPv6 routers present And by the way, Kiyoshi, This can be reproduced in a typical setup vmware workstation 6.02 with a vritual IDE cdrom, in case you wanna catch that with your own eyes. :-) Thanks for your trying hard to correct this annoying bug. The below fix should be enough. It's perfectly legal to have leftover byte counts when the drive signals completion, happens all the time for eg user issued commands where you don't know an exact byte count. Actually, this behavior has been the case even before the __blk_end_request() changes. I did test plain 2.6.24 with the following --- linux-2.6/drivers/ide/ide-cd.c 2008-01-31 22:18:59.0 +0100 +++ linux-2.6/drivers/ide/ide-cd.c-new 2008-01-31 22:18:50.0 +0100 @@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr( /* * If DRQ is clear, the command has completed. */ - if ((stat DRQ_STAT) == 0) + if ((stat DRQ_STAT) == 0) { + blk_dump_rq_flags(rq, ide-cd: rq still having bio); + printk(backup: data_len=%u bi_size=%u\n, + rq-data_len, rq-bio-bi_size); goto end_request; + } /* * check which way to transfer data to see whether we've been getting residual byte counts: Jan 31 22:10:06 gollum kernel: [ 26.702877] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8 Jan 31 22:10:06 gollum kernel: [ 26.702945] Jan 31 22:10:06 gollum kernel: [ 26.702946] sector 2673511, nr/cnr 0/0 Jan 31 22:10:06 gollum kernel: [ 26.703052] bio dfa8ec40, biotail dfa8ec40, buffer , data , len 158 Jan 31 22:10:06 gollum kernel: [ 26.703122] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 Jan 31 22:10:06 gollum kernel: [ 26.703877] backup: data_len=158 bi_size=158 ... so we've been simply silently ignoring this until now so i guess we don't need to BUG() for something that's totally benign. Hi Kiyoshi, end_that_request_last() is not called when __blk_end_reuqest() returns 1. Then, the issuer isn't waken up. So I think the BUG() or error messages should be there. you mean, end_that_request_last() isn't called when __end_that_request_first() returns an error and this is the case only for fs and pc requests. Otherwise it _is_ called, thus simulating somewhat the previous behavior. However, we never BUG()'ged on residual byte counts before and this driver has been in the kernel tree for ages, so what puzzles me now is how is BUG()'ing here better than before and shouldn't we simply issue a warning instead of killing the interrupt handler... ..or am i missing something? -- Regards/Gruß, Boris. - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
Hi Roland, Borislav, Bart, Added linux-ide ML, since we may be able to get helps from other ide experts. This thread started from: http://lkml.org/lkml/2008/1/29/140 On Tue, 29 Jan 2008 18:23:56 -0500 (EST), Kiyoshi Ueda wrote: Hi Bart, On Tue, 29 Jan 2008 14:22:53 -0800, Roland Dreier wrote: Hi, I saw the same BUG from ide-cd on one of my systems. I applied the debugging patch to replace the BUG with blk_dump_rq_flags(), and I got the output below (full boot log and .config attached to this email). Please let me know if there's anything else that would help debug the problem. Thank you for the information, Roland. [4.072271] Uniform CD-ROM driver Revision: 3.20 [4.098236] ide-cd: rq still having bio: dev hda: type=2, flags=114c8 [4.100269] [4.100269] sector 1949759, nr/cnr 0/0 [4.100269] bio 8102418cc600, biotail 8102418cc600, buffer , d8 [4.100269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [4.101005] ide-cd: rq still having bio: dev hda: type=2, flags=114c8 [4.104269] [4.104269] sector 1949759, nr/cnr 0/0 [4.104269] bio 8102418cc600, biotail 8102418cc600, buffer , d2 [4.104269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [4.109203] ide-cd: rq still having bio: dev hda: type=2, flags=114c8 [4.112270] [4.112270] sector 1949759, nr/cnr 0/0 [4.112270] bio 8102418cc600, biotail 8102418cc600, buffer , d8 [4.112270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [4.112945] ide-cd: rq still having bio: dev hda: type=2, flags=114c8 [4.116270] [4.116270] sector 1949759, nr/cnr 0/0 [4.116270] bio 8102418cc600, biotail 8102418cc600, buffer , d2 [4.116270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 Bart, This means that the rq still has a bio even after DRQ_STAT is cleared. The original ide-cd code was calling only end_that_request_last() there. So I thought that the rq should have no bio when DRQ_STAT is cleared, otherwise the bio leaks. Was my understanding wrong and is that correct behavior in ide-cd? I borrowed a box having the same nForce chipset and tried sg_inq command to submit the GPCMD_INQUIRY (cdb: 12 of the debug message). I confirmed that ide-cd run through the code path (DRQ_STAT == 0) by the same debug patch, but requests always don't have bio there on my test box. So I can't reproduce the problem yet. --- ide-cd: rq: dev hda: type=2, flags=114c8 sector 37958141, nr/cnr 0/0 bio , biotail f78e4980, buffer , data , len 0 cdb: 12 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00 --- The original code was calling only end_that_request_last() here, but no problem happened. This may mean that the upper layer can handle the rq correctly, no matter whether the rq still has a bio or not. If so, we should be able to unlink the bio by calling end_that_request_chunk() with remaining data size. Roland, Could you try the patch below and give me all boot messages again? This patch displays debug messages against requests still having bio, then tries to unlink all bios from the rq before the rq is completed. So your system may be able to continue to work correctly after displaying debug messages. I'd like to see the debug messages and know whether your system still gets the problem or not. --- a/drivers/ide/ide-cd.c 2008-01-30 18:24:51.0 -0500 +++ b/drivers/ide/ide-cd.c 2008-01-30 18:24:33.0 -0500 @@ -1722,8 +1722,18 @@ static ide_startstop_t cdrom_newpc_intr( */ if ((stat DRQ_STAT) == 0) { spin_lock_irqsave(ide_lock, flags); - if (__blk_end_request(rq, 0, 0)) - BUG(); + if (__blk_end_request(rq, 0, 0)) { + blk_dump_rq_flags(rq, ide-cd: rq still having bio); + printk(backup: data_len=%u bi_size=%u\n, + rq-data_len, rq-bio-bi_size); + + if (__blk_end_request(rq, 0, rq-data_len)) { + blk_dump_rq_flags(rq, ide-cd: BAD rq); + printk(backup: data_len=%u bi_size=%u\n, + rq-data_len, rq-bio-bi_size); + BUG(); + } + } HWGROUP(drive)-rq = NULL; spin_unlock_irqrestore(ide_lock, flags); Thanks, Kiyoshi Ueda - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089
Could you try the patch below and give me all boot messages again? Sure, no problem, see below for full log (I updated to the latest git, which seems to have some other unrelated problems with things timing out earlier in the boot, but it does get to the ide-cd init); here's the relevant looking debug info: [ 163.990058] ide-cd: rq still having bio: dev hda: type=2, flags=114c8 [ 163.990060] [ 163.990061] sector 1987647, nr/cnr 0/0 [ 163.990063] bio 8101c418c200, biotail 8101c418c200, buffer , data , len 158 [ 163.990064] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [ 163.990069] backup: data_len=158 bi_size=158 [ 163.994021] ide-cd: rq still having bio: dev hda: type=2, flags=114c8 [ 163.994023] [ 163.994024] sector 1987647, nr/cnr 0/0 [ 163.994025] bio 8101c418c200, biotail 8101c418c200, buffer , data , len 158 [ 163.994027] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [ 163.994032] backup: data_len=158 bi_size=158 This patch displays debug messages against requests still having bio, then tries to unlink all bios from the rq before the rq is completed. So your system may be able to continue to work correctly after displaying debug messages. By the way, even with your first debug patch, I had no problem using the box once it booted. I didn't try the CD-ROM but everything else seemed fine. Also, I saw the same issue on a different box (Dell SC1435 with serverworks IDE): [1.367648] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 [1.371649] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx [2.612417] SvrWks HT1000: IDE controller (0x1166:0x0214 rev 0x00) at PCI slot :00:02.1 [2.612439] SvrWks HT1000: not 100% native mode: will probe irqs later [2.616427] ide0: BM-DMA at 0x08c0-0x08c7, BIOS settings: hda:DMA, hdb:pio [2.623505] Probing IDE interface ide0... [4.037070] hda: PHILIPS DVD-ROM SDR089, ATAPI CD/DVD-ROM drive [4.041801] hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4 [4.058447] hda: UDMA/33 mode selected [4.061278] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 [4.632739] hda: ATAPI 24X DVD-ROM drive, 2048kB Cache ... [8.631615] ide-cd: rq still having bio: dev hda: type=2, flags=114c8 [8.631615] [8.631615] sector 2867855, nr/cnr 0/0 [8.631615] bio 81021fba3f40, biotail 81021fba3f40, buffer , data , len 158 [8.631615] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 [8.660505] ide-cd: rq still having bio: dev hda: type=2, flags=114c8 [8.660505] [8.660505] sector 2867855, nr/cnr 0/0 [8.660505] bio 81021fba3f40, biotail 81021fba3f40, buffer , data , len 62 [8.660505] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 Let me know if you'd like debug info from that too. [0.00] Linux version 2.6.24 ([EMAIL PROTECTED]) (gcc version 4.2.3 20080114 (prerelease) (Debian 4.2.2-7)) #5 SMP Wed Jan 30 18:25:58 PST 2008 [0.00] Command line: root=/dev/sda1 ro console=ttyS0,38400n8 [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: - 0009fc00 (usable) [0.00] BIOS-e820: 0009fc00 - 000a (reserved) [0.00] BIOS-e820: 000e8000 - 0010 (reserved) [0.00] BIOS-e820: 0010 - 9fff (usable) [0.00] BIOS-e820: 9fff - 9fffe000 (ACPI data) [0.00] BIOS-e820: 9fffe000 - a000 (ACPI NVS) [0.00] BIOS-e820: fec0 - fec01000 (reserved) [0.00] BIOS-e820: fee0 - fee01000 (reserved) [0.00] BIOS-e820: ff70 - 0001 (reserved) [0.00] BIOS-e820: 0001 - 00024500 (usable) [0.00] Entering add_active_range(0, 0, 159) 0 entries of 3200 used [0.00] Entering add_active_range(0, 256, 655344) 1 entries of 3200 used [0.00] Entering add_active_range(0, 1048576, 2379776) 2 entries of 3200 used [0.00] end_pfn_map = 2379776 [0.00] DMI 2.3 present. [0.00] ACPI: RSDP 000FA190, 0024 (r2 ACPIAM) [0.00] ACPI: XSDT 9FFF0100, 0044 (r1 A M I OEMXSDT 12000505 MSFT 97) [0.00] ACPI: FACP 9FFF0290, 00F4 (r3 A M I OEMFACP 12000505 MSFT 97) [0.00] ACPI Warning (tbfadt-0442): Optional field Gpe1Block has zero address or length: 44A0/0 [20070126] [0.00] ACPI: DSDT 9FFF0440, 5544 (r1 1HQC8 1HQC80033 INTL 2002026) [0.00] ACPI: FACS 9FFFE000, 0040 [0.00] ACPI: APIC 9FFF0390, 00A4 (r1 A M I OEMAPIC 12000505 MSFT 97) [