Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-02-01 Thread Kiyoshi Ueda
Hi Boris,

On Fri, 1 Feb 2008 08:51:17 +0100, Borislav Petkov wrote:
The below fix should be enough. It's perfectly legal to have leftover
byte counts when the drive signals completion, happens all the time for
eg user issued commands where you don't know an exact byte count.
   
   Actually, this behavior has been the case even before
   the __blk_end_request() changes.
   I did test plain 2.6.24 with the following
   
   
   --- linux-2.6/drivers/ide/ide-cd.c2008-01-31 22:18:59.0 
   +0100
   +++ linux-2.6/drivers/ide/ide-cd.c-new2008-01-31 22:18:50.0 
   +0100
   @@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr(
 /*
  * If DRQ is clear, the command has completed.
  */
   - if ((stat  DRQ_STAT) == 0)
   + if ((stat  DRQ_STAT) == 0) {
   + blk_dump_rq_flags(rq, ide-cd: rq still having bio);
   + printk(backup: data_len=%u  bi_size=%u\n,
   + rq-data_len, rq-bio-bi_size);
 goto end_request;
   + }

 /*
  * check which way to transfer data
   
   
   to see whether we've been getting residual byte counts:
   
   Jan 31 22:10:06 gollum kernel: [   26.702877] ide-cd: rq still having 
   bio: dev hdc: type=2, flags=114c8
   Jan 31 22:10:06 gollum kernel: [   26.702945]
   Jan 31 22:10:06 gollum kernel: [   26.702946] sector 2673511, nr/cnr 0/0
   Jan 31 22:10:06 gollum kernel: [   26.703052] bio dfa8ec40, biotail 
   dfa8ec40, buffer , data , len 158
   Jan 31 22:10:06 gollum kernel: [   26.703122] cdb: 12 00 00 00 fe 00 00 
   00 00 00 00 00 00 00 00 00
   Jan 31 22:10:06 gollum kernel: [   26.703877] backup: data_len=158  
   bi_size=158
   
   ... so we've been simply silently ignoring this until now so
   i guess we don't need to BUG() for something that's totally benign.
 
 Hi Kiyoshi,
  
  end_that_request_last() is not called when __blk_end_reuqest()
  returns 1.  Then, the issuer isn't waken up.
  So I think the BUG() or error messages should be there.
 
 you mean, end_that_request_last() isn't called when __end_that_request_first()
 returns an error and this is the case only for fs and pc requests.
 Otherwise it _is_ called, thus simulating somewhat the previous behavior.
 However, we never BUG()'ged on residual byte counts before and
 this driver has been in the kernel tree for ages, so what puzzles
 me now is how is BUG()'ing here better than before and shouldn't we
 simply issue a warning instead of killing the interrupt handler...

The Jens' patch passes the residual byte counts to __blk_end_request(),
so __end_that_reqeust_first() should never return 1 and we should never
BUG() on the residual byte counts, unless inconsistency happens such as
the size of remaining bios is bigger than the residual byte counts.

So if __blk_end_request() returns 1 even with the Jens' patch,
it means that the block layer or the driver really have a bug.
And then, the request and the bios could leak or the issuer
would wait forever because end_that_request_last() isn't called.

The previous behavior might ignore such inconsistency and leak only
the bios because it was calling end_that_request_last() anyway.
I would like to BUG() in such cases personally, but I don't object
strongly if you prefer not to BUG().

Thanks,
Kiyoshi Ueda
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-02-01 Thread Borislav Petkov
On Fri, Feb 01, 2008 at 12:39:27PM -0500, Kiyoshi Ueda wrote:

snip

   end_that_request_last() is not called when __blk_end_reuqest()
   returns 1.  Then, the issuer isn't waken up.
   So I think the BUG() or error messages should be there.
  
  you mean, end_that_request_last() isn't called when 
  __end_that_request_first()
  returns an error and this is the case only for fs and pc requests.
  Otherwise it _is_ called, thus simulating somewhat the previous behavior.
  However, we never BUG()'ged on residual byte counts before and
  this driver has been in the kernel tree for ages, so what puzzles
  me now is how is BUG()'ing here better than before and shouldn't we
  simply issue a warning instead of killing the interrupt handler...
 
 The Jens' patch passes the residual byte counts to __blk_end_request(),
 so __end_that_reqeust_first() should never return 1 and we should never
 BUG() on the residual byte counts, unless inconsistency happens such as
 the size of remaining bios is bigger than the residual byte counts.

yep.

 So if __blk_end_request() returns 1 even with the Jens' patch,
 it means that the block layer or the driver really have a bug.
 And then, the request and the bios could leak or the issuer
 would wait forever because end_that_request_last() isn't called.
 
 The previous behavior might ignore such inconsistency and leak only
 the bios because it was calling end_that_request_last() anyway.
 I would like to BUG() in such cases personally, but I don't object
 strongly if you prefer not to BUG().

BUG() is definitely what we should do here to catch this case of sizeof(bios) 
rq-data_len. Putting a brown paper bag over the issue will never get it fixed
if it really leaks bios. Thanks for clarifying that.

By the way, shouldn't we be doing a little branch prediction here:

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74c6087..bee05a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t 
*drive)
 */
if ((stat  DRQ_STAT) == 0) {
spin_lock_irqsave(ide_lock, flags);
-   if (__blk_end_request(rq, 0, 0))
+   if (unlikely(__blk_end_request(rq, 0, rq-data_len)))
BUG();
HWGROUP(drive)-rq = NULL;
spin_unlock_irqrestore(ide_lock, flags);


-- 
Regards/Gruß,
Boris.
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-02-01 Thread Kiyoshi Ueda
Hi Boris,

On Fri, 1 Feb 2008 19:29:09 +0100, Borislav Petkov wrote:
end_that_request_last() is not called when __blk_end_reuqest()
returns 1.  Then, the issuer isn't waken up.
So I think the BUG() or error messages should be there.
   
   you mean, end_that_request_last() isn't called when 
   __end_that_request_first()
   returns an error and this is the case only for fs and pc requests.
   Otherwise it _is_ called, thus simulating somewhat the previous behavior.
   However, we never BUG()'ged on residual byte counts before and
   this driver has been in the kernel tree for ages, so what puzzles
   me now is how is BUG()'ing here better than before and shouldn't we
   simply issue a warning instead of killing the interrupt handler...
  
  The Jens' patch passes the residual byte counts to __blk_end_request(),
  so __end_that_reqeust_first() should never return 1 and we should never
  BUG() on the residual byte counts, unless inconsistency happens such as
  the size of remaining bios is bigger than the residual byte counts.
 
 yep.
 
  So if __blk_end_request() returns 1 even with the Jens' patch,
  it means that the block layer or the driver really have a bug.
  And then, the request and the bios could leak or the issuer
  would wait forever because end_that_request_last() isn't called.
  
  The previous behavior might ignore such inconsistency and leak only
  the bios because it was calling end_that_request_last() anyway.
  I would like to BUG() in such cases personally, but I don't object
  strongly if you prefer not to BUG().
 
 BUG() is definitely what we should do here to catch this case of sizeof(bios) 
 
 rq-data_len. Putting a brown paper bag over the issue will never get it fixed
 if it really leaks bios. Thanks for clarifying that.
 
 By the way, shouldn't we be doing a little branch prediction here:
 
 diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
 index 74c6087..bee05a3 100644
 --- a/drivers/ide/ide-cd.c
 +++ b/drivers/ide/ide-cd.c
 @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t 
 *drive)
  */
 if ((stat  DRQ_STAT) == 0) {
 spin_lock_irqsave(ide_lock, flags);
 -   if (__blk_end_request(rq, 0, 0))
 +   if (unlikely(__blk_end_request(rq, 0, rq-data_len)))
 BUG();
 HWGROUP(drive)-rq = NULL;
 spin_unlock_irqrestore(ide_lock, flags);

I think it's reasonable.

Thanks,
Kiyoshi Ueda
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-01-31 Thread Jens Axboe
On Thu, Jan 31 2008, Nai Xia wrote:
 My dmesg relevant info is quite similar:
 
 [6.875041] Freeing unused kernel memory: 320k freed
 [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
 [8.144439]
 [8.144439] sector 10824201199534213, nr/cnr 0/0
 [8.144439] bio cf029280, biotail cf029280, buffer , data
 , len 158
 [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
 [8.144439] backup: data_len=158  bi_size=158
 [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
 [8.160756]
 [8.160756] sector 2669858, nr/cnr 0/0
 [8.160756] bio cf029300, biotail cf029300, buffer , data
 , len 158
 [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
 [8.160756] backup: data_len=158  bi_size=158
 [   14.851101] eth0: link up
 [   27.121883] eth0: no IPv6 routers present
 
 
 And by the way, Kiyoshi,
 This can be reproduced in a typical setup vmware workstation 6.02 with
 a vritual IDE cdrom,
 in case you wanna catch that with your own eyes. :-)
 Thanks for your trying hard to correct this annoying bug.

The below fix should be enough. It's perfectly legal to have leftover
byte counts when the drive signals completion, happens all the time for
eg user issued commands where you don't know an exact byte count.

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74c6087..bee05a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t 
*drive)
 */
if ((stat  DRQ_STAT) == 0) {
spin_lock_irqsave(ide_lock, flags);
-   if (__blk_end_request(rq, 0, 0))
+   if (__blk_end_request(rq, 0, rq-data_len))
BUG();
HWGROUP(drive)-rq = NULL;
spin_unlock_irqrestore(ide_lock, flags);

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-01-31 Thread Jens Axboe
On Thu, Jan 31 2008, Florian Lohoff wrote:
 On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
  The below fix should be enough. It's perfectly legal to have leftover
  byte counts when the drive signals completion, happens all the time for
  eg user issued commands where you don't know an exact byte count.
  
  diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
  index 74c6087..bee05a3 100644
  --- a/drivers/ide/ide-cd.c
  +++ b/drivers/ide/ide-cd.c
  @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t 
  *drive)
   */
  if ((stat  DRQ_STAT) == 0) {
  spin_lock_irqsave(ide_lock, flags);
  -   if (__blk_end_request(rq, 0, 0))
  +   if (__blk_end_request(rq, 0, rq-data_len))
  BUG();
  HWGROUP(drive)-rq = NULL;
  spin_unlock_irqrestore(ide_lock, flags);
  
 
 Fixes the crash on boot for me ...

Great, thanks for confirming that. I'll make sure the patch goes
upstream today, if Linus is available.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-01-31 Thread Florian Lohoff
On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
 The below fix should be enough. It's perfectly legal to have leftover
 byte counts when the drive signals completion, happens all the time for
 eg user issued commands where you don't know an exact byte count.
 
 diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
 index 74c6087..bee05a3 100644
 --- a/drivers/ide/ide-cd.c
 +++ b/drivers/ide/ide-cd.c
 @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t 
 *drive)
*/
   if ((stat  DRQ_STAT) == 0) {
   spin_lock_irqsave(ide_lock, flags);
 - if (__blk_end_request(rq, 0, 0))
 + if (__blk_end_request(rq, 0, rq-data_len))
   BUG();
   HWGROUP(drive)-rq = NULL;
   spin_unlock_irqrestore(ide_lock, flags);
 

Fixes the crash on boot for me ...

Flo
-- 
Florian Lohoff  [EMAIL PROTECTED] +49-171-2280134
Those who would give up a little freedom to get a little 
  security shall soon have neither - Benjamin Franklin


signature.asc
Description: Digital signature


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-01-31 Thread Kiyoshi Ueda
Hi Jens,

On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote:
 On Thu, Jan 31 2008, Nai Xia wrote:
  My dmesg relevant info is quite similar:
  
  [6.875041] Freeing unused kernel memory: 320k freed
  [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
  [8.144439]
  [8.144439] sector 10824201199534213, nr/cnr 0/0
  [8.144439] bio cf029280, biotail cf029280, buffer , data
  , len 158
  [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
  [8.144439] backup: data_len=158  bi_size=158
  [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
  [8.160756]
  [8.160756] sector 2669858, nr/cnr 0/0
  [8.160756] bio cf029300, biotail cf029300, buffer , data
  , len 158
  [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
  [8.160756] backup: data_len=158  bi_size=158
  [   14.851101] eth0: link up
  [   27.121883] eth0: no IPv6 routers present
  
  
  And by the way, Kiyoshi,
  This can be reproduced in a typical setup vmware workstation 6.02 with
  a vritual IDE cdrom,
  in case you wanna catch that with your own eyes. :-)
  Thanks for your trying hard to correct this annoying bug.
 
 The below fix should be enough. It's perfectly legal to have leftover
 byte counts when the drive signals completion, happens all the time for
 eg user issued commands where you don't know an exact byte count.
 
 diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
 index 74c6087..bee05a3 100644
 --- a/drivers/ide/ide-cd.c
 +++ b/drivers/ide/ide-cd.c
 @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t 
 *drive)
*/
   if ((stat  DRQ_STAT) == 0) {
   spin_lock_irqsave(ide_lock, flags);
 - if (__blk_end_request(rq, 0, 0))
 + if (__blk_end_request(rq, 0, rq-data_len))
   BUG();
   HWGROUP(drive)-rq = NULL;
   spin_unlock_irqrestore(ide_lock, flags);

OK, I undarstand the leftover is legal.

By the way, is it safe to always return success if there is a leftover?
I thought we might have to complete the rq with -EIO in such case.

Thanks,
Kiyoshi Ueda
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-01-31 Thread Jens Axboe



On 31/01/2008, at 18.04, Kiyoshi Ueda [EMAIL PROTECTED] wrote:


Hi Jens,

On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote:

On Thu, Jan 31 2008, Nai Xia wrote:

My dmesg relevant info is quite similar:

[6.875041] Freeing unused kernel memory: 320k freed
[8.143120] ide-cd: rq still having bio: dev hdc: type=2,  
flags=114c8

[8.144439]
[8.144439] sector 10824201199534213, nr/cnr 0/0
[8.144439] bio cf029280, biotail cf029280, buffer , data
, len 158
[8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.144439] backup: data_len=158  bi_size=158
[8.160756] ide-cd: rq still having bio: dev hdc: type=2,  
flags=114c8

[8.160756]
[8.160756] sector 2669858, nr/cnr 0/0
[8.160756] bio cf029300, biotail cf029300, buffer , data
, len 158
[8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.160756] backup: data_len=158  bi_size=158
[   14.851101] eth0: link up
[   27.121883] eth0: no IPv6 routers present


And by the way, Kiyoshi,
This can be reproduced in a typical setup vmware workstation 6.02  
with

a vritual IDE cdrom,
in case you wanna catch that with your own eyes. :-)
Thanks for your trying hard to correct this annoying bug.


The below fix should be enough. It's perfectly legal to have leftover
byte counts when the drive signals completion, happens all the time  
for

eg user issued commands where you don't know an exact byte count.

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74c6087..bee05a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr 
(ide_drive_t *drive)

*/
   if ((stat  DRQ_STAT) == 0) {
   spin_lock_irqsave(ide_lock, flags);
-if (__blk_end_request(rq, 0, 0))
+if (__blk_end_request(rq, 0, rq-data_len))
   BUG();
   HWGROUP(drive)-rq = NULL;
   spin_unlock_irqrestore(ide_lock, flags);


OK, I undarstand the leftover is legal.

By the way, is it safe to always return success if there is a  
leftover?

I thought we might have to complete the rq with -EIO in such case.


data_len being non zero should pass the residual count back to the  
issuer. 
-

To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-01-31 Thread Kiyoshi Ueda
Hi Jens,

On Thu, 31 Jan 2008 19:16:54 +0100, Jens Axboe wrote:
 On 31/01/2008, at 18.04, Kiyoshi Ueda [EMAIL PROTECTED] wrote:
  On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote:
  On Thu, Jan 31 2008, Nai Xia wrote:
  My dmesg relevant info is quite similar:
 
  [6.875041] Freeing unused kernel memory: 320k freed
  [8.143120] ide-cd: rq still having bio: dev hdc: type=2,  
  flags=114c8
  [8.144439]
  [8.144439] sector 10824201199534213, nr/cnr 0/0
  [8.144439] bio cf029280, biotail cf029280, buffer , data
  , len 158
  [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
  [8.144439] backup: data_len=158  bi_size=158
  [8.160756] ide-cd: rq still having bio: dev hdc: type=2,  
  flags=114c8
  [8.160756]
  [8.160756] sector 2669858, nr/cnr 0/0
  [8.160756] bio cf029300, biotail cf029300, buffer , data
  , len 158
  [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
  [8.160756] backup: data_len=158  bi_size=158
  [   14.851101] eth0: link up
  [   27.121883] eth0: no IPv6 routers present
 
 
  And by the way, Kiyoshi,
  This can be reproduced in a typical setup vmware workstation 6.02  
  with
  a vritual IDE cdrom,
  in case you wanna catch that with your own eyes. :-)
  Thanks for your trying hard to correct this annoying bug.
 
  The below fix should be enough. It's perfectly legal to have leftover
  byte counts when the drive signals completion, happens all the time  
  for eg user issued commands where you don't know an exact byte count.
 
  diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
  index 74c6087..bee05a3 100644
  --- a/drivers/ide/ide-cd.c
  +++ b/drivers/ide/ide-cd.c
  @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr 
  (ide_drive_t *drive)
  */
 if ((stat  DRQ_STAT) == 0) {
 spin_lock_irqsave(ide_lock, flags);
  -if (__blk_end_request(rq, 0, 0))
  +if (__blk_end_request(rq, 0, rq-data_len))
 BUG();
 HWGROUP(drive)-rq = NULL;
 spin_unlock_irqrestore(ide_lock, flags);
 
  OK, I undarstand the leftover is legal.
 
  By the way, is it safe to always return success if there is a  
  leftover?
  I thought we might have to complete the rq with -EIO in such case.
 
 data_len being non zero should pass the residual count back to the  
 issuer. 

Aah, so the issuer can know how many bytes of the I/Os are not done,
and the error status of the bio which is completed by
end_that_request_first() in __blk_end_request() don't matter
for the issuer.
OK, thanks.  I think the patch is fine.

Thanks,
Kiyoshi Ueda
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-01-31 Thread Borislav Petkov
On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
 On Thu, Jan 31 2008, Nai Xia wrote:
  My dmesg relevant info is quite similar:
  
  [6.875041] Freeing unused kernel memory: 320k freed
  [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
  [8.144439]
  [8.144439] sector 10824201199534213, nr/cnr 0/0
  [8.144439] bio cf029280, biotail cf029280, buffer , data
  , len 158
  [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
  [8.144439] backup: data_len=158  bi_size=158
  [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
  [8.160756]
  [8.160756] sector 2669858, nr/cnr 0/0
  [8.160756] bio cf029300, biotail cf029300, buffer , data
  , len 158
  [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
  [8.160756] backup: data_len=158  bi_size=158
  [   14.851101] eth0: link up
  [   27.121883] eth0: no IPv6 routers present
  
  
  And by the way, Kiyoshi,
  This can be reproduced in a typical setup vmware workstation 6.02 with
  a vritual IDE cdrom,
  in case you wanna catch that with your own eyes. :-)
  Thanks for your trying hard to correct this annoying bug.
 
 The below fix should be enough. It's perfectly legal to have leftover
 byte counts when the drive signals completion, happens all the time for
 eg user issued commands where you don't know an exact byte count.

Actually, this behavior has been the case even before the __blk_end_request()
changes. I did test plain 2.6.24 with the following


--- linux-2.6/drivers/ide/ide-cd.c  2008-01-31 22:18:59.0 +0100
+++ linux-2.6/drivers/ide/ide-cd.c-new  2008-01-31 22:18:50.0 +0100
@@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr(
/*
 * If DRQ is clear, the command has completed.
 */
-   if ((stat  DRQ_STAT) == 0)
+   if ((stat  DRQ_STAT) == 0) {
+   blk_dump_rq_flags(rq, ide-cd: rq still having bio);
+   printk(backup: data_len=%u  bi_size=%u\n,
+   rq-data_len, rq-bio-bi_size);
goto end_request;
+   }
 
/*
 * check which way to transfer data


to see whether we've been getting residual byte counts:

Jan 31 22:10:06 gollum kernel: [   26.702877] ide-cd: rq still having bio: dev 
hdc: type=2, flags=114c8
Jan 31 22:10:06 gollum kernel: [   26.702945]
Jan 31 22:10:06 gollum kernel: [   26.702946] sector 2673511, nr/cnr 0/0
Jan 31 22:10:06 gollum kernel: [   26.703052] bio dfa8ec40, biotail dfa8ec40, 
buffer , data , len 158
Jan 31 22:10:06 gollum kernel: [   26.703122] cdb: 12 00 00 00 fe 00 00 00 00 
00 00 00 00 00 00 00
Jan 31 22:10:06 gollum kernel: [   26.703877] backup: data_len=158  bi_size=158

... so we've been simply silently ignoring this until now so i guess we don't
need to BUG() for something that's totally benign.

-- 
Regards/Gruß,
Boris.
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-01-31 Thread Kiyoshi Ueda
Hi Boris,

Thank you for the confirmation of original behavior.

On Thu, 31 Jan 2008 22:37:40 +0100, Borislav Petkov wrote:
 On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
  On Thu, Jan 31 2008, Nai Xia wrote:
   My dmesg relevant info is quite similar:
   
   [6.875041] Freeing unused kernel memory: 320k freed
   [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
   [8.144439]
   [8.144439] sector 10824201199534213, nr/cnr 0/0
   [8.144439] bio cf029280, biotail cf029280, buffer , data
   , len 158
   [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
   [8.144439] backup: data_len=158  bi_size=158
   [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
   [8.160756]
   [8.160756] sector 2669858, nr/cnr 0/0
   [8.160756] bio cf029300, biotail cf029300, buffer , data
   , len 158
   [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
   [8.160756] backup: data_len=158  bi_size=158
   [   14.851101] eth0: link up
   [   27.121883] eth0: no IPv6 routers present
   
   
   And by the way, Kiyoshi,
   This can be reproduced in a typical setup vmware workstation 6.02 with
   a vritual IDE cdrom,
   in case you wanna catch that with your own eyes. :-)
   Thanks for your trying hard to correct this annoying bug.
  
  The below fix should be enough. It's perfectly legal to have leftover
  byte counts when the drive signals completion, happens all the time for
  eg user issued commands where you don't know an exact byte count.
 
 Actually, this behavior has been the case even before the __blk_end_request()
 changes. I did test plain 2.6.24 with the following
 
 
 --- linux-2.6/drivers/ide/ide-cd.c2008-01-31 22:18:59.0 +0100
 +++ linux-2.6/drivers/ide/ide-cd.c-new2008-01-31 22:18:50.0 
 +0100
 @@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr(
   /*
* If DRQ is clear, the command has completed.
*/
 - if ((stat  DRQ_STAT) == 0)
 + if ((stat  DRQ_STAT) == 0) {
 + blk_dump_rq_flags(rq, ide-cd: rq still having bio);
 + printk(backup: data_len=%u  bi_size=%u\n,
 + rq-data_len, rq-bio-bi_size);
   goto end_request;
 + }
  
   /*
* check which way to transfer data
 
 
 to see whether we've been getting residual byte counts:
 
 Jan 31 22:10:06 gollum kernel: [   26.702877] ide-cd: rq still having bio: 
 dev hdc: type=2, flags=114c8
 Jan 31 22:10:06 gollum kernel: [   26.702945]
 Jan 31 22:10:06 gollum kernel: [   26.702946] sector 2673511, nr/cnr 0/0
 Jan 31 22:10:06 gollum kernel: [   26.703052] bio dfa8ec40, biotail dfa8ec40, 
 buffer , data , len 158
 Jan 31 22:10:06 gollum kernel: [   26.703122] cdb: 12 00 00 00 fe 00 00 00 00 
 00 00 00 00 00 00 00
 Jan 31 22:10:06 gollum kernel: [   26.703877] backup: data_len=158  
 bi_size=158
 
 ... so we've been simply silently ignoring this until now so i guess we don't
 need to BUG() for something that's totally benign.

end_that_request_last() is not called when __blk_end_reuqest()
returns 1.  Then, the issuer isn't waken up.
So I think the BUG() or error messages should be there.

And fortunately, the issuer seems not to mind whether
end_that_request_first() is called for the remaining bio or not.

So I think Jens' patch is fine.

Thanks,
Kiyoshi Ueda
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-01-31 Thread Borislav Petkov
On Thu, Jan 31, 2008 at 05:35:56PM -0500, Kiyoshi Ueda wrote:
 Hi Boris,
 
 Thank you for the confirmation of original behavior.
 
 On Thu, 31 Jan 2008 22:37:40 +0100, Borislav Petkov wrote:
  On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
   On Thu, Jan 31 2008, Nai Xia wrote:
My dmesg relevant info is quite similar:

[6.875041] Freeing unused kernel memory: 320k freed
[8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[8.144439]
[8.144439] sector 10824201199534213, nr/cnr 0/0
[8.144439] bio cf029280, biotail cf029280, buffer , data
, len 158
[8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.144439] backup: data_len=158  bi_size=158
[8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
[8.160756]
[8.160756] sector 2669858, nr/cnr 0/0
[8.160756] bio cf029300, biotail cf029300, buffer , data
, len 158
[8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
[8.160756] backup: data_len=158  bi_size=158
[   14.851101] eth0: link up
[   27.121883] eth0: no IPv6 routers present


And by the way, Kiyoshi,
This can be reproduced in a typical setup vmware workstation 6.02 with
a vritual IDE cdrom,
in case you wanna catch that with your own eyes. :-)
Thanks for your trying hard to correct this annoying bug.
   
   The below fix should be enough. It's perfectly legal to have leftover
   byte counts when the drive signals completion, happens all the time for
   eg user issued commands where you don't know an exact byte count.
  
  Actually, this behavior has been the case even before the 
  __blk_end_request()
  changes. I did test plain 2.6.24 with the following
  
  
  --- linux-2.6/drivers/ide/ide-cd.c  2008-01-31 22:18:59.0 +0100
  +++ linux-2.6/drivers/ide/ide-cd.c-new  2008-01-31 22:18:50.0 
  +0100
  @@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr(
  /*
   * If DRQ is clear, the command has completed.
   */
  -   if ((stat  DRQ_STAT) == 0)
  +   if ((stat  DRQ_STAT) == 0) {
  +   blk_dump_rq_flags(rq, ide-cd: rq still having bio);
  +   printk(backup: data_len=%u  bi_size=%u\n,
  +   rq-data_len, rq-bio-bi_size);
  goto end_request;
  +   }
   
  /*
   * check which way to transfer data
  
  
  to see whether we've been getting residual byte counts:
  
  Jan 31 22:10:06 gollum kernel: [   26.702877] ide-cd: rq still having bio: 
  dev hdc: type=2, flags=114c8
  Jan 31 22:10:06 gollum kernel: [   26.702945]
  Jan 31 22:10:06 gollum kernel: [   26.702946] sector 2673511, nr/cnr 0/0
  Jan 31 22:10:06 gollum kernel: [   26.703052] bio dfa8ec40, biotail 
  dfa8ec40, buffer , data , len 158
  Jan 31 22:10:06 gollum kernel: [   26.703122] cdb: 12 00 00 00 fe 00 00 00 
  00 00 00 00 00 00 00 00
  Jan 31 22:10:06 gollum kernel: [   26.703877] backup: data_len=158  
  bi_size=158
  
  ... so we've been simply silently ignoring this until now so i guess we 
  don't
  need to BUG() for something that's totally benign.

Hi Kiyoshi,
 
 end_that_request_last() is not called when __blk_end_reuqest()
 returns 1.  Then, the issuer isn't waken up.
 So I think the BUG() or error messages should be there.

you mean, end_that_request_last() isn't called when __end_that_request_first()
returns an error and this is the case only for fs and pc requests. Otherwise it
_is_ called, thus simulating somewhat the previous behavior. However, we never 
BUG()'ged
on residual byte counts before and this driver has been in the kernel tree for
ages, so what puzzles me now is how is BUG()'ing here better than before and
shouldn't we simply issue a warning instead of killing the interrupt handler...

..or am i missing something?

-- 
Regards/Gruß,
Boris.
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-01-30 Thread Kiyoshi Ueda
Hi Roland, Borislav, Bart,

Added linux-ide ML, since we may be able to get helps from other
ide experts.  This thread started from:
http://lkml.org/lkml/2008/1/29/140

On Tue, 29 Jan 2008 18:23:56 -0500 (EST), Kiyoshi Ueda wrote:
 Hi Bart, 
 
 On Tue, 29 Jan 2008 14:22:53 -0800, Roland Dreier wrote:
  Hi, I saw the same BUG from ide-cd on one of my systems.  I applied
  the debugging patch to replace the BUG with blk_dump_rq_flags(), and I
  got the output below (full boot log and .config attached to this
  email).
  
  Please let me know if there's anything else that would help debug the
  problem.
 
 Thank you for the information, Roland.
 
  
  [4.072271] Uniform CD-ROM driver Revision: 3.20
  [4.098236] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
  [4.100269]
  [4.100269] sector 1949759, nr/cnr 0/0
  [4.100269] bio 8102418cc600, biotail 8102418cc600, buffer 
  , d8
  [4.100269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
  [4.101005] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
  [4.104269]
  [4.104269] sector 1949759, nr/cnr 0/0
  [4.104269] bio 8102418cc600, biotail 8102418cc600, buffer 
  , d2
  [4.104269] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
  [4.109203] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
  [4.112270]
  [4.112270] sector 1949759, nr/cnr 0/0
  [4.112270] bio 8102418cc600, biotail 8102418cc600, buffer 
  , d8
  [4.112270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
  [4.112945] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
  [4.116270]
  [4.116270] sector 1949759, nr/cnr 0/0
  [4.116270] bio 8102418cc600, biotail 8102418cc600, buffer 
  , d2
  [4.116270] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
 
 Bart,
 This means that the rq still has a bio even after DRQ_STAT is cleared.
 The original ide-cd code was calling only end_that_request_last() there.
 So I thought that the rq should have no bio when DRQ_STAT is cleared,
 otherwise the bio leaks.
 
 Was my understanding wrong and is that correct behavior in ide-cd?

I borrowed a box having the same nForce chipset and tried sg_inq
command to submit the GPCMD_INQUIRY (cdb: 12 of the debug message).
I confirmed that ide-cd run through the code path (DRQ_STAT == 0)
by the same debug patch, but requests always don't have bio there
on my test box.  So I can't reproduce the problem yet.
---
ide-cd: rq: dev hda: type=2, flags=114c8

sector 37958141, nr/cnr 0/0
bio , biotail f78e4980, buffer , data , len 0
cdb: 12 00 00 00 24 00 00 00 00 00 00 00 00 00 00 00
---


The original code was calling only end_that_request_last() here,
but no problem happened.
This may mean that the upper layer can handle the rq correctly,
no matter whether the rq still has a bio or not.
If so, we should be able to unlink the bio by calling
end_that_request_chunk() with remaining data size.



Roland,
Could you try the patch below and give me all boot messages again?

This patch displays debug messages against requests still having bio,
then tries to unlink all bios from the rq before the rq is completed.
So your system may be able to continue to work correctly
after displaying debug messages.
I'd like to see the debug messages and know whether your system
still gets the problem or not.

--- a/drivers/ide/ide-cd.c  2008-01-30 18:24:51.0 -0500
+++ b/drivers/ide/ide-cd.c  2008-01-30 18:24:33.0 -0500
@@ -1722,8 +1722,18 @@ static ide_startstop_t cdrom_newpc_intr(
 */
if ((stat  DRQ_STAT) == 0) {
spin_lock_irqsave(ide_lock, flags);
-   if (__blk_end_request(rq, 0, 0))
-   BUG();
+   if (__blk_end_request(rq, 0, 0)) {
+   blk_dump_rq_flags(rq, ide-cd: rq still having bio);
+   printk(backup: data_len=%u  bi_size=%u\n,
+   rq-data_len, rq-bio-bi_size);
+
+   if (__blk_end_request(rq, 0, rq-data_len)) {
+   blk_dump_rq_flags(rq, ide-cd: BAD rq);
+   printk(backup: data_len=%u  bi_size=%u\n,
+   rq-data_len, rq-bio-bi_size);
+   BUG();
+   }
+   }
HWGROUP(drive)-rq = NULL;
spin_unlock_irqrestore(ide_lock, flags);
 
Thanks,
Kiyoshi Ueda
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 -g8561b089

2008-01-30 Thread Roland Dreier
  Could you try the patch below and give me all boot messages again?

Sure, no problem, see below for full log (I updated to the latest git,
which seems to have some other unrelated problems with things timing
out earlier in the boot, but it does get to the ide-cd init); here's
the relevant looking debug info:

[  163.990058] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
[  163.990060] 
[  163.990061] sector 1987647, nr/cnr 0/0
[  163.990063] bio 8101c418c200, biotail 8101c418c200, buffer 
, data , len 158
[  163.990064] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 
[  163.990069] backup: data_len=158  bi_size=158
[  163.994021] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
[  163.994023] 
[  163.994024] sector 1987647, nr/cnr 0/0
[  163.994025] bio 8101c418c200, biotail 8101c418c200, buffer 
, data , len 158
[  163.994027] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 
[  163.994032] backup: data_len=158  bi_size=158

  This patch displays debug messages against requests still having bio,
  then tries to unlink all bios from the rq before the rq is completed.
  So your system may be able to continue to work correctly
  after displaying debug messages.

By the way, even with your first debug patch, I had no problem using
the box once it booted.  I didn't try the CD-ROM but everything else
seemed fine.

Also, I saw the same issue on a different box (Dell SC1435 with
serverworks IDE):

[1.367648] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[1.371649] ide: Assuming 33MHz system bus speed for PIO modes; override 
with idebus=xx
[2.612417] SvrWks HT1000: IDE controller (0x1166:0x0214 rev 0x00) at  
PCI slot :00:02.1
[2.612439] SvrWks HT1000: not 100% native mode: will probe irqs later
[2.616427] ide0: BM-DMA at 0x08c0-0x08c7, BIOS settings: hda:DMA, 
hdb:pio
[2.623505] Probing IDE interface ide0...
[4.037070] hda: PHILIPS DVD-ROM SDR089, ATAPI CD/DVD-ROM drive
[4.041801] hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
[4.058447] hda: UDMA/33 mode selected
[4.061278] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[4.632739] hda: ATAPI 24X DVD-ROM drive, 2048kB Cache
...
[8.631615] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
[8.631615] 
[8.631615] sector 2867855, nr/cnr 0/0
[8.631615] bio 81021fba3f40, biotail 81021fba3f40, buffer 
, data , len 158
[8.631615] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 
[8.660505] ide-cd: rq still having bio: dev hda: type=2, flags=114c8
[8.660505] 
[8.660505] sector 2867855, nr/cnr 0/0
[8.660505] bio 81021fba3f40, biotail 81021fba3f40, buffer 
, data , len 62
[8.660505] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00 

Let me know if you'd like debug info from that too.

[0.00] Linux version 2.6.24 ([EMAIL PROTECTED]) (gcc version 4.2.3 
20080114 (prerelease) (Debian 4.2.2-7)) #5 SMP Wed Jan 30 18:25:58 PST 2008
[0.00] Command line: root=/dev/sda1 ro console=ttyS0,38400n8 
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000e8000 - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 9fff (usable)
[0.00]  BIOS-e820: 9fff - 9fffe000 (ACPI data)
[0.00]  BIOS-e820: 9fffe000 - a000 (ACPI NVS)
[0.00]  BIOS-e820: fec0 - fec01000 (reserved)
[0.00]  BIOS-e820: fee0 - fee01000 (reserved)
[0.00]  BIOS-e820: ff70 - 0001 (reserved)
[0.00]  BIOS-e820: 0001 - 00024500 (usable)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 3200 used
[0.00] Entering add_active_range(0, 256, 655344) 1 entries of 3200 used
[0.00] Entering add_active_range(0, 1048576, 2379776) 2 entries of 3200 
used
[0.00] end_pfn_map = 2379776
[0.00] DMI 2.3 present.
[0.00] ACPI: RSDP 000FA190, 0024 (r2 ACPIAM)
[0.00] ACPI: XSDT 9FFF0100, 0044 (r1 A M I  OEMXSDT  12000505 MSFT  
 97)
[0.00] ACPI: FACP 9FFF0290, 00F4 (r3 A M I  OEMFACP  12000505 MSFT  
 97)
[0.00] ACPI Warning (tbfadt-0442): Optional field Gpe1Block has zero 
address or length: 44A0/0 [20070126]
[0.00] ACPI: DSDT 9FFF0440, 5544 (r1  1HQC8 1HQC80033 INTL  
2002026)
[0.00] ACPI: FACS 9FFFE000, 0040
[0.00] ACPI: APIC 9FFF0390, 00A4 (r1 A M I  OEMAPIC  12000505 MSFT  
 97)
[