lpfc build failure in latest git

2008-02-02 Thread maximilian attems
  CC [M]  drivers/scsi/lpfc/lpfc_init.o
  drivers/scsi/lpfc/lpfc_init.c: In function ‘lpfc_pci_probe_one’:
  drivers/scsi/lpfc/lpfc_init.c:1897: error: implicit declaration of function 
‘pci_enable_device_bars’
  make[6]: *** [drivers/scsi/lpfc/lpfc_init.o] Error 1

seen on both x86 archs.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-02 Thread Pete Wyckoff
[EMAIL PROTECTED] wrote on Wed, 30 Jan 2008 10:34 -0600:
 On Wed, 2008-01-30 at 09:38 +0100, Bart Van Assche wrote:
  On Jan 30, 2008 12:32 AM, FUJITA Tomonori [EMAIL PROTECTED] wrote:
  
   iSER has parameters to limit the maximum size of RDMA (it needs to
   repeat RDMA with a poor configuration)?
  
  Please specify which parameters you are referring to. As you know I
  had already repeated my tests with ridiculously high values for the
  following iSER parameters: FirstBurstLength, MaxBurstLength and
  MaxRecvDataSegmentLength (16 MB, which is more than the 1 MB block
  size specified to dd).
 
 the 1Mb block size is a bit of a red herring.  Unless you've
 specifically increased the max_sector_size and are using an sg_chain
 converted driver, on x86 the maximum possible transfer accumulation is
 0.5MB.
 
 I certainly don't rule out that increasing the transfer size up from
 0.5MB might be the way to improve STGT efficiency, since at an 1GB/s
 theoretical peak, that's roughly 2000 context switches per I/O; however,
 It doesn't look like you've done anything that will overcome the block
 layer limitations.

The MRDSL parameter has no effect on iSER, as the RFC describes.
How to transfer data to satisfy a command is solely up to the
target.  So you would need both big requests from the client, then
look at how the target will send the data.

I've only used 512 kB for the RDMA transfer size from the target, as
it matches the default client size and was enough to get good
performance out of my IB gear and minimizes resource consumption on
the target.  It's currently hard-coded as a #define.  There is no
provision in the protocol for the client to dictate the value.

If others want to spend some effort trying to tune stgt for iSER,
there are a fair number of comments in the code, including a big one
that explains this RDMA transfer size issue.  And I'll answer
informed questions as I can.  But I'm not particularly interested in
arguing about which implementation is best, or trying to interpret
bandwidth comparison numbers from poorly designed tests.  It takes
work to understand these issues.

-- Pete
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] pci: pci_enable_device_bars() fix

2008-02-02 Thread Jeff Garzik

Ingo Molnar wrote:

===
--- linux.orig/drivers/scsi/lpfc/lpfc_init.c
+++ linux/drivers/scsi/lpfc/lpfc_init.c
@@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev,
uint16_t iotag;
int bars = pci_select_bars(pdev, IORESOURCE_MEM);
 
-	if (pci_enable_device_bars(pdev, bars))

+   if (pci_enable_device_io(pdev))
goto out;


Look at the line right above it...  AFAICS you want 
pci_enable_device_mem(), if the mask is selecting IORESOURCE_MEM BARs.


Also a CC to linux-scsi and the driver author would be nice, as they are 
the ones with hardware and can verify.


This set of changes seemed like 50% guesswork to me, without consulting 
the authors :(  And unlike many changes, you actually have to know the 
hardware [or get clues from surrounding code] to make the change.


Jeff



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: lpfc build failure in latest git

2008-02-02 Thread James Bottomley

On Sat, 2008-02-02 at 15:36 +0100, maximilian attems wrote:
 CC [M]  drivers/scsi/lpfc/lpfc_init.o
   drivers/scsi/lpfc/lpfc_init.c: In function ‘lpfc_pci_probe_one’:
   drivers/scsi/lpfc/lpfc_init.c:1897: error: implicit declaration of function 
 ‘pci_enable_device_bars’
   make[6]: *** [drivers/scsi/lpfc/lpfc_init.o] Error 1
 
 seen on both x86 archs.

Oh, Greg was supposed to add this if he merged last.  I'll send it
through the SCSI tree.

James

Subject: git-scsi-misc vs gregkh-pci-pci-remove-users-of-pci_enable_device_bars
From: Andrew Morton [EMAIL PROTECTED]

Whoever merges last needs this.

Cc: James Bottomley [EMAIL PROTECTED]
Cc: Greg KH [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]
---

 drivers/scsi/lpfc/lpfc_init.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN 
drivers/scsi/lpfc/lpfc_init.c~git-scsi-misc-vs-gregkh-pci-pci-remove-users-of-pci_enable_device_bars
 drivers/scsi/lpfc/lpfc_init.c
--- 
a/drivers/scsi/lpfc/lpfc_init.c~git-scsi-misc-vs-gregkh-pci-pci-remove-users-of-pci_enable_device_bars
+++ a/drivers/scsi/lpfc/lpfc_init.c
@@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev,
uint16_t iotag;
int bars = pci_select_bars(pdev, IORESOURCE_MEM);
 
-   if (pci_enable_device_bars(pdev, bars))
+   if (pci_enable_device_mem(pdev))
goto out;
if (pci_request_selected_regions(pdev, bars, LPFC_DRIVER_NAME))
goto out_disable_device;
_



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] pci: pci_enable_device_bars() fix

2008-02-02 Thread James Bottomley

On Sat, 2008-02-02 at 10:51 -0500, Jeff Garzik wrote:
 Ingo Molnar wrote:
  ===
  --- linux.orig/drivers/scsi/lpfc/lpfc_init.c
  +++ linux/drivers/scsi/lpfc/lpfc_init.c
  @@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev,
  uint16_t iotag;
  int bars = pci_select_bars(pdev, IORESOURCE_MEM);
   
  -   if (pci_enable_device_bars(pdev, bars))
  +   if (pci_enable_device_io(pdev))
  goto out;
 
 Look at the line right above it...  AFAICS you want 
 pci_enable_device_mem(), if the mask is selecting IORESOURCE_MEM BARs.
 
 Also a CC to linux-scsi and the driver author would be nice, as they are 
 the ones with hardware and can verify.
 
 This set of changes seemed like 50% guesswork to me, without consulting 
 the authors :(  And unlike many changes, you actually have to know the 
 hardware [or get clues from surrounding code] to make the change.

Jeff is right, this fix is wrong.

The correct fix is here:

http://marc.info/?l=linux-scsim=120196768605031

Ingo, could you please cc linux-scsi on your mails ... it would have
been a complete disaster to have this incorrect patch go in.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] pci: pci_enable_device_bars() fix

2008-02-02 Thread Ingo Molnar

* Jeff Garzik [EMAIL PROTECTED] wrote:

 Ingo Molnar wrote:
 ===
 --- linux.orig/drivers/scsi/lpfc/lpfc_init.c
 +++ linux/drivers/scsi/lpfc/lpfc_init.c
 @@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev,
  uint16_t iotag;
  int bars = pci_select_bars(pdev, IORESOURCE_MEM);
  -   if (pci_enable_device_bars(pdev, bars))
 +if (pci_enable_device_io(pdev))
  goto out;

 Look at the line right above it...  AFAICS you want 
 pci_enable_device_mem(), if the mask is selecting IORESOURCE_MEM BARs.

 Also a CC to linux-scsi and the driver author would be nice, as they 
 are the ones with hardware and can verify.

it would have been totally appropriate for me to just send a mail to 
lkml with the proper subject line about the breakage. (I might even have 
decided to stay completely silent about the issue and fix it for my own 
build, letting you guys figure it out.)

Instead i did a search of lkml (based on the function name in the build 
error) and figured out where the pull request was on lkml: Greg. I 
replied to that mail, he'll obviously know whom else to Cc from that 
point on (if anyone). I really didnt want to (nor did i need to) figure 
out whether this was some general driver level API change that happen 
kernel-wide, or some SCSI specific change. I simply replied to the pull 
request whose Cc: line seemed well-populated to me already. I also took 
a look at the commit itself and did a quick hack in a hurry to keep the 
tests rolling. It really did not occur to me that i should have added 
anyone else to the Cc: line, as [EMAIL PROTECTED] was 
Cc:-ed already so i assumed the interest was from that angle.

( And as this was spent from my family's weekend time and i had no time
  and no interest to dig any further than to figure out the first hop
  of the change that broke the build, and the parties who initiated that
  hop. I'm in fact surprised that your and James's answer to my
  bugreport is hostility. )

So i find your suggestion that i should have added more people to the 
Cc: line unfair on several levels.

 This set of changes seemed like 50% guesswork to me, without 
 consulting the authors :( And unlike many changes, you actually have 
 to know the hardware [or get clues from surrounding code] to make the 
 change.

you mean the whole set of changes? Or just mine? Mine was a 30 seconds 
guesswork of course, i dont have the hardware, i didnt do the change, 
nor did i do anything near that code - i just saw the build failure stop 
my testing and sent in this notification and a hack. That's why i sent 
it to Greg, as a reply to the mail where he pushed these changes 
upstream and who'll know what to do with it from that point on. My patch 
can be totally thrown away (and should be, apparently).

but ... i guess next time i'll think twice before sending any bugreports 
about or related to the SCSI code anywhere, unless they become really 
annoying. Who needs this hassle?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] pci: pci_enable_device_bars() fix

2008-02-02 Thread Jeff Garzik

Ingo Molnar wrote:
it would have been totally appropriate for me to just send a mail to 
lkml with the proper subject line about the breakage. (I might even have 
decided to stay completely silent about the issue and fix it for my own 
build, letting you guys figure it out.)


Oh come on...   You are smart enough to know to at least CC the driver 
maintainer, the key POC who should be aware of breakage of their driver. 
 That is a standard courtesy.




( And as this was spent from my family's weekend time and i had no time
  and no interest to dig any further than to figure out the first hop
  of the change that broke the build, and the parties who initiated that
  hop. I'm in fact surprised that your and James's answer to my
  bugreport is hostility. )


I'm sorry you read would be nice as hostility.


So i find your suggestion that i should have added more people to the 
Cc: line unfair on several levels.


As I noted, it is an obvious courtesy to CC the driver maintainer, at 
the very least.


_Especially_ when it is a change that requires some knowledge of the 
hardware, as was this case.



This set of changes seemed like 50% guesswork to me, without 
consulting the authors :( And unlike many changes, you actually have 
to know the hardware [or get clues from surrounding code] to make the 
change.


you mean the whole set of changes?


The whole set of changes, yes, not just yours.


but ... i guess next time i'll think twice before sending any bugreports 
about or related to the SCSI code anywhere, unless they become really 
annoying. Who needs this hassle?


The fact is, each larger subsystem (net, scsi, ata I know) has several 
vendor contacts and driver maintainers who for various reasons prefer a 
more focused -- and often less hostile -- mailing list to LKML.


I have a hard enough time as it is, trying to convince hardware vendors 
to work with us, that we are not all assholes.


How about respecting the preferences of certain segments of a very large 
community, even when they differ from your own?  We have a 
community-accepted method of expressing these preferences, the 
MAINTAINERS file.


IMO, standard practice should be:

* To or CC: driver maintainer mentioned in MAINTAINERS (if any)
* CC: LKML, any list mentioned in MAINTAINERS

So, how about CC'ing the targets that have nicely requested to be CC'd?

Jeff


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] pci: pci_enable_device_bars() fix

2008-02-02 Thread Ingo Molnar

* Jeff Garzik [EMAIL PROTECTED] wrote:

 Ingo Molnar wrote:
 it would have been totally appropriate for me to just send a mail to lkml 
 with the proper subject line about the breakage. (I might even have 
 decided to stay completely silent about the issue and fix it for my own 
 build, letting you guys figure it out.)

 Oh come on...  You are smart enough to know to at least CC the driver 
 maintainer, the key POC who should be aware of breakage of their 
 driver.  That is a standard courtesy.

is there any particular reason why you cut out the most relevant part of 
my reply, which happens to answer all your questions AFAICS:

 Instead i did a search of lkml (based on the function name in the 
 build error) and figured out where the pull request was on lkml: 
 Greg. I replied to that mail, he'll obviously know whom else to Cc 
 from that point on (if anyone). I really didnt want to (nor did i 
 need to) figure out whether this was some general driver level API 
 change that happen kernel-wide, or some SCSI specific change. I 
 simply replied to the pull request whose Cc: line seemed 
 well-populated to me already. I also took a look at the commit itself 
 and did a quick hack in a hurry to keep the tests rolling. It really 
 did not occur to me that i should have added anyone else to the Cc: 
 line, as [EMAIL PROTECTED] was Cc:-ed already so i 
 assumed the interest was from that angle.

had you read this portion you'd have realized that i did not search for 
any owner of the file, i simply searched for the person who introduced 
the change, and the on-lkml mail where the change was introduced.

and that's all that should be needed, really. Believe me, i hit tons of 
bugs all across the kernel, often several bugs a day, and it's hard even 
for me to figure out who maintains a file and when. (and in Linux 
there's no ownership of files anyway) So as a general rule i go after 
changes instead, and that's exactly what i did here too. I do 
delta/regression QA - i.e. i watch for _changes_ that break the kernel 
and hence the general 'owner' of a file is often irrelevant - it's the 
maintainer who introduces a change who matters, and we do lots of 
cross-maintain merges. Only if i do not manage to identify a change do i 
try to figure out who maintains a file at that given moment. (But those 
mails often go into black holes, they get bounced, subscriber-required 
email lists, etc. etc.) It's also nontrivial to map the files to the 
MAINTAINERS file, and it's also quite outdated in some portions. So the 
MAINTAINERS file is the last resort i use.

so i'm still totally befuddled why you think that there was anything 
particularly wrong or unhelpful about me replying to the specific pull 
request that introduced a particular breakage into the kernel. Had i 
mailed to lkml with a terse kernel build broke message with just an 
URL to a config and the build breakage, you could rightfully have 
complained that i should have done more to properly direct my bugreport. 
But this breakage was about a PCI API change, the pull request had a PCI 
mailing list Cc:-ed, why should i have thought that this needs the 
attention of any other parties?

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] pci: pci_enable_device_bars() fix

2008-02-02 Thread James Bottomley
On Sat, 2008-02-02 at 18:08 +0100, Ingo Molnar wrote:
 * Jeff Garzik [EMAIL PROTECTED] wrote:
 
  Ingo Molnar wrote:
  ===
  --- linux.orig/drivers/scsi/lpfc/lpfc_init.c
  +++ linux/drivers/scsi/lpfc/lpfc_init.c
  @@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev,
 uint16_t iotag;
 int bars = pci_select_bars(pdev, IORESOURCE_MEM);
   - if (pci_enable_device_bars(pdev, bars))
  +  if (pci_enable_device_io(pdev))
 goto out;
 
  Look at the line right above it...  AFAICS you want 
  pci_enable_device_mem(), if the mask is selecting IORESOURCE_MEM BARs.
 
  Also a CC to linux-scsi and the driver author would be nice, as they 
  are the ones with hardware and can verify.
 
 it would have been totally appropriate for me to just send a mail to 
 lkml with the proper subject line about the breakage. (I might even have 
 decided to stay completely silent about the issue and fix it for my own 
 build, letting you guys figure it out.)

We agreed to differ a while ago on your head in the sand, post it to
LKML because everybody follows that list attitude.  Some nice soul will
always (eventually) forward to the correct list, so this practice more
or less works (just not in as timely fashion as actually posting to the
relevant list).

 Instead i did a search of lkml (based on the function name in the build 
 error) and figured out where the pull request was on lkml: Greg. I 
 replied to that mail, he'll obviously know whom else to Cc from that 
 point on (if anyone). I really didnt want to (nor did i need to) figure 
 out whether this was some general driver level API change that happen 
 kernel-wide, or some SCSI specific change. I simply replied to the pull 
 request whose Cc: line seemed well-populated to me already. I also took 
 a look at the commit itself and did a quick hack in a hurry to keep the 
 tests rolling. It really did not occur to me that i should have added 
 anyone else to the Cc: line, as [EMAIL PROTECTED] was 
 Cc:-ed already so i assumed the interest was from that angle.
 
 ( And as this was spent from my family's weekend time and i had no time
   and no interest to dig any further than to figure out the first hop
   of the change that broke the build, and the parties who initiated that
   hop. I'm in fact surprised that your and James's answer to my
   bugreport is hostility. )

What's hostile about telling you your patch is wrong and pointing you at
the correct one?

 So i find your suggestion that i should have added more people to the 
 Cc: line unfair on several levels.
 
  This set of changes seemed like 50% guesswork to me, without 
  consulting the authors :( And unlike many changes, you actually have 
  to know the hardware [or get clues from surrounding code] to make the 
  change.
 
 you mean the whole set of changes? Or just mine? Mine was a 30 seconds 
 guesswork of course, i dont have the hardware, i didnt do the change, 
 nor did i do anything near that code - i just saw the build failure stop 
 my testing and sent in this notification and a hack. That's why i sent 
 it to Greg, as a reply to the mail where he pushed these changes 
 upstream and who'll know what to do with it from that point on. My patch 
 can be totally thrown away (and should be, apparently).
 
 but ... i guess next time i'll think twice before sending any bugreports 
 about or related to the SCSI code anywhere, unless they become really 
 annoying. Who needs this hassle?

Oh good grief, don't come the raw prawn with us!

You're a kernel maintainer, you are expected to know the rules and
follow them.  The very fact that a well known maintainer posts a patch
with a signed-off-by to Andrew and Linus means that it likely gets
applied.  Because of this, maintainers and other people with similarly
trusted positions are expected to go via the lists and subsystems in
part as general courtesy, but also to verify that their patch is
actually valid before it gets applied.

Are you seriously telling us that it required too much investigation on
your part to figure out that something with a compile failure in
drivers/scsi might belong on the scsi list?

Let me tell you exactly what would have happened if that patch had been
applied:  Because the wrong bars were enabled, the device would have
attached but been non-functional.  Likely Emulex would have picked this
up in their -rc1 testing and spent several days trying to trace the
cause in their labs.  Chances are, that, because this type of breakage
is extremely subtle, they wouldn't have noticed the fact that the enable
should have been _mem not _io.  Then they would have raised the issue to
linux-scsi.  It would probably take at least a person week of effort
before anyone finally noticed what the issue was.

If you can't see that your behaviour needs modifying, I suggest you
seriously consider your position.  For all our talk of a nice utopian
democracy of meritocracy in Open 

Re: [patch] pci: pci_enable_device_bars() fix

2008-02-02 Thread Jeff Garzik

Ingo Molnar wrote:

* Jeff Garzik [EMAIL PROTECTED] wrote:


Ingo Molnar wrote:
it would have been totally appropriate for me to just send a mail to lkml 
with the proper subject line about the breakage. (I might even have 
decided to stay completely silent about the issue and fix it for my own 
build, letting you guys figure it out.)
Oh come on...  You are smart enough to know to at least CC the driver 
maintainer, the key POC who should be aware of breakage of their 
driver.  That is a standard courtesy.


is there any particular reason why you cut out the most relevant part of 
my reply, which happens to answer all your questions AFAICS:


Instead i did a search of lkml (based on the function name in the 
build error) and figured out where the pull request was on lkml: 
Greg. I replied to that mail, he'll obviously know whom else to Cc 
from that point on (if anyone). I really didnt want to (nor did i 
need to) figure out whether this was some general driver level API 
change that happen kernel-wide, or some SCSI specific change. I 
simply replied to the pull request whose Cc: line seemed 
well-populated to me already. I also took a look at the commit itself 
and did a quick hack in a hurry to keep the tests rolling. It really 
did not occur to me that i should have added anyone else to the Cc: 
line, as [EMAIL PROTECTED] was Cc:-ed already so i 
assumed the interest was from that angle.


had you read this portion you'd have realized that i did not search for 
any owner of the file, i simply searched for the person who introduced 
the change, and the on-lkml mail where the change was introduced.


And therein lies the problem.  The original submittor omitted relevant 
maintainers, you followed that [incorrect] example, and the end result 
was clear:  an obviously wrong change.


Thus, the problem is precisely what you stated:  you did not bother to 
search for people who care about that file.



and that's all that should be needed, really. Believe me, i hit tons of 


Hardly.  What part of this change requires knowledge of the hardware 
is difficult to understand?


And if you do not have that knowledge, why is it so trying to CC people 
who actively maintain a driver, and have that knowledge?


It's simple common sense to -ask- or at least -notify- in such cases.

That the original submittor made the same mistake is no reason to repeat 
the mistake.



bugs all across the kernel, often several bugs a day, and it's hard even 
for me to figure out who maintains a file and when. (and in Linux 
there's no ownership of files anyway) So as a general rule i go after 
changes instead, and that's exactly what i did here too. I do 
delta/regression QA - i.e. i watch for _changes_ that break the kernel 
and hence the general 'owner' of a file is often irrelevant - it's the 
maintainer who introduces a change who matters, and we do lots of 
cross-maintain merges. Only if i do not manage to identify a change do i 
try to figure out who maintains a file at that given moment. (But those 
mails often go into black holes, they get bounced, subscriber-required 
email lists, etc. etc.) It's also nontrivial to map the files to the 
MAINTAINERS file, and it's also quite outdated in some portions. So the 
MAINTAINERS file is the last resort i use.


That's a long list of excuses in an attempt to ignore the facts:

Fact 1:  The driver you modified is actively maintained

Fact 2:  The driver maintainer has respectfully indicated, through the 
standard community mechanism, the useful points of contact.


Fact 3:  The MAINTAINERS entry is correct and up-to-date.

Fact 4:  Even if you wanted to ignore MAINTAINERS, 'git log' on the 
relevant file could have told you who are useful parties to CC.


It's just common courtesy to CC a driver maintainer, ESPECIALLY when a 
change requires knowledge of the driver/hardware in question.



so i'm still totally befuddled why you think that there was anything 
particularly wrong or unhelpful about me replying to the specific pull 
request that introduced a particular breakage into the kernel. Had i 
mailed to lkml with a terse kernel build broke message with just an 
URL to a config and the build breakage, you could rightfully have 
complained that i should have done more to properly direct my bugreport. 
But this breakage was about a PCI API change, the pull request had a PCI 
mailing list Cc:-ed, why should i have thought that this needs the 
attention of any other parties?


Because the change required knowledge not only of PCI, but of the 
hardware in question.  As your patch demonstrated.


And yes -- the original changes should have been CC'd to interested 
parties as well.  I'm still waiting to hear back from Alan or Bart 
whether the ATA/IDE changes in that PCI pile actually work...  the 
original changeset even noted that relevant parties had not yet been 
queried.


Jeff



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message 

Re: [PATCH] [SCSI] sd: make error handling more robust (v2)

2008-02-02 Thread Greg KH
On Fri, Feb 01, 2008 at 07:43:54PM -0500, Mike Snitzer wrote:
 Hi Greg,
 
 Given your recent call for review of the next 2.6.22-stable I'd
 appreciate it if you and the rest of the stable team would strongly
 consider this SCSI IO error propagation fix for inclusion in 2.6.22.17
 (as well as the other stable trees).

Is the patch in Linus's tree already?

If so, what git commit id?  That's a requirement to go into the stable
tree...

If not, we need to wait until it is.

thanks,

greg k-h
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] pci: pci_enable_device_bars() fix

2008-02-02 Thread Jeff Garzik

Ingo Molnar wrote:
so please tell me Jeff. If Greg, who is the super-maintainer of your 
code area, and who deals with your code every day and changes it every 
minute and hour, simply did not Cc: the SCSI list - how am i, a largely 
outside party in this matter, supposed to notice that 3 maintainers and 
3 mailing lists in the Cc: were somehow not enough and that i was 
supposed to grow the already sizable Cc: list even more?


Because, regardless of the situation, it's both common courtesy and wise 
practice to CC relevant driver maintainers, when you touch a driver.


And it's just common sense:  Greg simply does not know the intimate 
details of every PCI driver.  Nor do I.  Nor you.


In the case of lpfc here, we have an active driver maintainer, and an 
up-to-date MAINTAINERS entry.  Even if you are too slack to read 
MAINTAINERS, 'git log' would have given you the same info.


Don't pretend there is some benefit here to ignoring the people that 
best know the driver.  I don't buy that; it simply makes no engineering 
sense whatsoever.


Jeff


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [SCSI] sd: make error handling more robust (v2)

2008-02-02 Thread Greg KH
On Sat, Feb 02, 2008 at 03:56:30PM -0500, Mike Snitzer wrote:
 On Feb 2, 2008 3:24 PM, Greg KH [EMAIL PROTECTED] wrote:
  On Fri, Feb 01, 2008 at 07:43:54PM -0500, Mike Snitzer wrote:
   Hi Greg,
  
   Given your recent call for review of the next 2.6.22-stable I'd
   appreciate it if you and the rest of the stable team would strongly
   consider this SCSI IO error propagation fix for inclusion in 2.6.22.17
   (as well as the other stable trees).
 
  Is the patch in Linus's tree already?
 
  If so, what git commit id?  That's a requirement to go into the stable
  tree...
 
  If not, we need to wait until it is.
 
 Fair enough.  Thanks for the clarification.
 
 That likely leaves us with having James add it to his git tree for
 Linus (provided he agrees with the change).  Which he and Luben seemed
 to agree was the reasonable thin layer of protection that would be
 allowed for such out of spec SCSI devices.
 
 James, how would you like to proceed?  The patch fixes the good_bytes
 != 0 on HARDWARE_ERROR issue I have been chasing with Mark Salyzyn.

After it goes in, can someone notify [EMAIL PROTECTED] that it went in,
the git id, and that it should be applied to all older kernel trees?

thanks,

greg drowning in -stable patches k-h
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [SCSI] sd: make error handling more robust (v2)

2008-02-02 Thread James Bottomley

On Fri, 2008-02-01 at 12:03 -0500, Tony Battersby wrote:
 This patch fixes a problem with some out-of-spec SCSI disks that report
 hardware or medium errors incorrectly.  Without the patch, the kernel
 may silently ignore a failed write command or return corrupted data on a
 failed read command.
 
 Signed-off-by: Tony Battersby [EMAIL PROTECTED]
 ---
 
 This is a simplified version of the original patch that fixes just the
 problem at hand, without trying to handle other theoretical out-of-spec
 cases.

Actually, to restore the original check, this is what we want, isn't it?
Ok, so I also made the sector division logic futureproof for the day we
have  4096 sector devices ...

James

---

From 5ae2e4a8ff095aab5997f17068d3e4212c33f039 Mon Sep 17 00:00:00 2001
From: Tony Battersby [EMAIL PROTECTED]
Date: Fri, 1 Feb 2008 12:03:27 -0500
Subject: [SCSI] sd: make error handling more robust

This patch fixes a problem with some out-of-spec SCSI disks that report
hardware or medium errors incorrectly.  Without the patch, the kernel
may silently ignore a failed write command or return corrupted data on a
failed read command.

Signed-off-by: Tony Battersby [EMAIL PROTECTED]
Cc: Stable Tree [EMAIL PROTECTED]
Signed-off-by: James Bottomley [EMAIL PROTECTED]
---
 drivers/scsi/sd.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

Index: BUILD-2.6/drivers/scsi/sd.c
===
--- BUILD-2.6.orig/drivers/scsi/sd.c2008-02-02 15:43:20.0 -0600
+++ BUILD-2.6/drivers/scsi/sd.c 2008-02-02 16:01:24.0 -0600
@@ -929,6 +929,7 @@
unsigned int xfer_size = scsi_bufflen(SCpnt);
unsigned int good_bytes = result ? 0 : xfer_size;
u64 start_lba = SCpnt-request-sector;
+   u64 end_lba = SCpnt-request-sector + (xfer_size / 512);
u64 bad_lba;
struct scsi_sense_hdr sshdr;
int sense_valid = 0;
@@ -967,26 +968,23 @@
goto out;
if (xfer_size = SCpnt-device-sector_size)
goto out;
-   switch (SCpnt-device-sector_size) {
-   case 256:
+   if (SCpnt-device-sector_size  512) {
+   /* only legitimate sector_size here is 256 */
start_lba = 1;
-   break;
-   case 512:
-   break;
-   case 1024:
-   start_lba = 1;
-   break;
-   case 2048:
-   start_lba = 2;
-   break;
-   case 4096:
-   start_lba = 3;
-   break;
-   default:
-   /* Print something here with limiting frequency. */
-   goto out;
-   break;
+   end_lba = 1;
+   } else {
+   /* be careful ... don't want any overflows */
+   u64 factor = SCpnt-device-sector_size / 512;
+   do_div(start_lba, factor);
+   do_div(end_lba, factor);
}
+
+   if (bad_lba  start_lba  || bad_lba = end_lba)
+   /* the bad lba was reported incorrectly, we have
+* no idea where the error is
+*/
+   goto out;
+
/* This computation should always be done in terms of
 * the resolution of the device's medium.
 */



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 9734] New: I/O error when inserting a second firewire sata disk

2008-02-02 Thread Stefan Richter
I wrote on 2008-01-13:
 James Bottomley wrote:
 Firewire list cc'd
 Jan 12 16:50:49 x3400 kernel: firewire_sbp2: orb reply timed out, rcode=0x11
 Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: 
 hostbyte=DID_BUS_BUSY
 driverbyte=DRIVER_OK,SUGGEST_OK
 Best I can tell, this is the source of the problem.  The sbp2 driver is
 replying DID_BUS_BUSY until that gets sorted out, which seems to be
 never.
 
 When something was plugged in or out at the same bus, fw-sbp2 has to
 reconnect == renew the login to each logical unit.  The syslog in the
 report is inconclusive whether that happened or failed.

In any case, there are frequently commands retried or newly enqueued
while fw-sbp2 waits to get the login renewed.  (And fw-sbp2 continues to
complete them with DID_BUS_BUSY until the reconnection didn't succeed.

Whoever caused that I/O, e.g. dd like in the reporter's and my own
tests, will quickly fail.

 As a side note, the old sbp2 driver does not quit commands with
 DID_BUS_BUSY between bus reset and reconnect.  Instead it blocks the
 Scsi_Host in order to not receive commands during that time at all.

I experimented with this yesterday.  First I tried
scsi_internal_device_block() because we
  - want to block logical units individually if possible,
  - need to block from within atomic context (softirq context).
However, this failed miserably with all sorts of lock inversion bug
backtraces (alleged ones or real ones, I don't know) and with occasional
kernel lock-ups (so it were probably real lock inversions).  These
locking issues cannot be solved easily because block layer and scsi_lib
play nauseating games with their locks.

So, I switched over to scsi_block_requests(), i.e. blocking the whole
host like the old sbp2 driver does.  This doesn't seem to have
scsi_internal_device_block()'s locking issues.  However, the sbp2 driver
has one Scsi_Host for each logical unit while the new fw-sbp2 driver
however has one Scsi_Host for each target.  Hence there are difficulties
with targets with multiple logical units, but I probably got them sorted
out now.

There remain frequent problems with reconnection + re-login failures
though.  These failures don't happen with exactly the same bus topology
if I don't run I/O during the bus resets.  I have an idea though what to
try next...
-- 
Stefan Richter
-=-==--- --=- ---==
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html