lpfc build failure in latest git
CC [M] drivers/scsi/lpfc/lpfc_init.o drivers/scsi/lpfc/lpfc_init.c: In function ‘lpfc_pci_probe_one’: drivers/scsi/lpfc/lpfc_init.c:1897: error: implicit declaration of function ‘pci_enable_device_bars’ make[6]: *** [drivers/scsi/lpfc/lpfc_init.o] Error 1 seen on both x86 archs. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Integration of SCST in the mainstream Linux kernel
[EMAIL PROTECTED] wrote on Wed, 30 Jan 2008 10:34 -0600: On Wed, 2008-01-30 at 09:38 +0100, Bart Van Assche wrote: On Jan 30, 2008 12:32 AM, FUJITA Tomonori [EMAIL PROTECTED] wrote: iSER has parameters to limit the maximum size of RDMA (it needs to repeat RDMA with a poor configuration)? Please specify which parameters you are referring to. As you know I had already repeated my tests with ridiculously high values for the following iSER parameters: FirstBurstLength, MaxBurstLength and MaxRecvDataSegmentLength (16 MB, which is more than the 1 MB block size specified to dd). the 1Mb block size is a bit of a red herring. Unless you've specifically increased the max_sector_size and are using an sg_chain converted driver, on x86 the maximum possible transfer accumulation is 0.5MB. I certainly don't rule out that increasing the transfer size up from 0.5MB might be the way to improve STGT efficiency, since at an 1GB/s theoretical peak, that's roughly 2000 context switches per I/O; however, It doesn't look like you've done anything that will overcome the block layer limitations. The MRDSL parameter has no effect on iSER, as the RFC describes. How to transfer data to satisfy a command is solely up to the target. So you would need both big requests from the client, then look at how the target will send the data. I've only used 512 kB for the RDMA transfer size from the target, as it matches the default client size and was enough to get good performance out of my IB gear and minimizes resource consumption on the target. It's currently hard-coded as a #define. There is no provision in the protocol for the client to dictate the value. If others want to spend some effort trying to tune stgt for iSER, there are a fair number of comments in the code, including a big one that explains this RDMA transfer size issue. And I'll answer informed questions as I can. But I'm not particularly interested in arguing about which implementation is best, or trying to interpret bandwidth comparison numbers from poorly designed tests. It takes work to understand these issues. -- Pete - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] pci: pci_enable_device_bars() fix
Ingo Molnar wrote: === --- linux.orig/drivers/scsi/lpfc/lpfc_init.c +++ linux/drivers/scsi/lpfc/lpfc_init.c @@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev, uint16_t iotag; int bars = pci_select_bars(pdev, IORESOURCE_MEM); - if (pci_enable_device_bars(pdev, bars)) + if (pci_enable_device_io(pdev)) goto out; Look at the line right above it... AFAICS you want pci_enable_device_mem(), if the mask is selecting IORESOURCE_MEM BARs. Also a CC to linux-scsi and the driver author would be nice, as they are the ones with hardware and can verify. This set of changes seemed like 50% guesswork to me, without consulting the authors :( And unlike many changes, you actually have to know the hardware [or get clues from surrounding code] to make the change. Jeff - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: lpfc build failure in latest git
On Sat, 2008-02-02 at 15:36 +0100, maximilian attems wrote: CC [M] drivers/scsi/lpfc/lpfc_init.o drivers/scsi/lpfc/lpfc_init.c: In function ‘lpfc_pci_probe_one’: drivers/scsi/lpfc/lpfc_init.c:1897: error: implicit declaration of function ‘pci_enable_device_bars’ make[6]: *** [drivers/scsi/lpfc/lpfc_init.o] Error 1 seen on both x86 archs. Oh, Greg was supposed to add this if he merged last. I'll send it through the SCSI tree. James Subject: git-scsi-misc vs gregkh-pci-pci-remove-users-of-pci_enable_device_bars From: Andrew Morton [EMAIL PROTECTED] Whoever merges last needs this. Cc: James Bottomley [EMAIL PROTECTED] Cc: Greg KH [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- drivers/scsi/lpfc/lpfc_init.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff -puN drivers/scsi/lpfc/lpfc_init.c~git-scsi-misc-vs-gregkh-pci-pci-remove-users-of-pci_enable_device_bars drivers/scsi/lpfc/lpfc_init.c --- a/drivers/scsi/lpfc/lpfc_init.c~git-scsi-misc-vs-gregkh-pci-pci-remove-users-of-pci_enable_device_bars +++ a/drivers/scsi/lpfc/lpfc_init.c @@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev, uint16_t iotag; int bars = pci_select_bars(pdev, IORESOURCE_MEM); - if (pci_enable_device_bars(pdev, bars)) + if (pci_enable_device_mem(pdev)) goto out; if (pci_request_selected_regions(pdev, bars, LPFC_DRIVER_NAME)) goto out_disable_device; _ - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] pci: pci_enable_device_bars() fix
On Sat, 2008-02-02 at 10:51 -0500, Jeff Garzik wrote: Ingo Molnar wrote: === --- linux.orig/drivers/scsi/lpfc/lpfc_init.c +++ linux/drivers/scsi/lpfc/lpfc_init.c @@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev, uint16_t iotag; int bars = pci_select_bars(pdev, IORESOURCE_MEM); - if (pci_enable_device_bars(pdev, bars)) + if (pci_enable_device_io(pdev)) goto out; Look at the line right above it... AFAICS you want pci_enable_device_mem(), if the mask is selecting IORESOURCE_MEM BARs. Also a CC to linux-scsi and the driver author would be nice, as they are the ones with hardware and can verify. This set of changes seemed like 50% guesswork to me, without consulting the authors :( And unlike many changes, you actually have to know the hardware [or get clues from surrounding code] to make the change. Jeff is right, this fix is wrong. The correct fix is here: http://marc.info/?l=linux-scsim=120196768605031 Ingo, could you please cc linux-scsi on your mails ... it would have been a complete disaster to have this incorrect patch go in. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] pci: pci_enable_device_bars() fix
* Jeff Garzik [EMAIL PROTECTED] wrote: Ingo Molnar wrote: === --- linux.orig/drivers/scsi/lpfc/lpfc_init.c +++ linux/drivers/scsi/lpfc/lpfc_init.c @@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev, uint16_t iotag; int bars = pci_select_bars(pdev, IORESOURCE_MEM); - if (pci_enable_device_bars(pdev, bars)) +if (pci_enable_device_io(pdev)) goto out; Look at the line right above it... AFAICS you want pci_enable_device_mem(), if the mask is selecting IORESOURCE_MEM BARs. Also a CC to linux-scsi and the driver author would be nice, as they are the ones with hardware and can verify. it would have been totally appropriate for me to just send a mail to lkml with the proper subject line about the breakage. (I might even have decided to stay completely silent about the issue and fix it for my own build, letting you guys figure it out.) Instead i did a search of lkml (based on the function name in the build error) and figured out where the pull request was on lkml: Greg. I replied to that mail, he'll obviously know whom else to Cc from that point on (if anyone). I really didnt want to (nor did i need to) figure out whether this was some general driver level API change that happen kernel-wide, or some SCSI specific change. I simply replied to the pull request whose Cc: line seemed well-populated to me already. I also took a look at the commit itself and did a quick hack in a hurry to keep the tests rolling. It really did not occur to me that i should have added anyone else to the Cc: line, as [EMAIL PROTECTED] was Cc:-ed already so i assumed the interest was from that angle. ( And as this was spent from my family's weekend time and i had no time and no interest to dig any further than to figure out the first hop of the change that broke the build, and the parties who initiated that hop. I'm in fact surprised that your and James's answer to my bugreport is hostility. ) So i find your suggestion that i should have added more people to the Cc: line unfair on several levels. This set of changes seemed like 50% guesswork to me, without consulting the authors :( And unlike many changes, you actually have to know the hardware [or get clues from surrounding code] to make the change. you mean the whole set of changes? Or just mine? Mine was a 30 seconds guesswork of course, i dont have the hardware, i didnt do the change, nor did i do anything near that code - i just saw the build failure stop my testing and sent in this notification and a hack. That's why i sent it to Greg, as a reply to the mail where he pushed these changes upstream and who'll know what to do with it from that point on. My patch can be totally thrown away (and should be, apparently). but ... i guess next time i'll think twice before sending any bugreports about or related to the SCSI code anywhere, unless they become really annoying. Who needs this hassle? Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] pci: pci_enable_device_bars() fix
Ingo Molnar wrote: it would have been totally appropriate for me to just send a mail to lkml with the proper subject line about the breakage. (I might even have decided to stay completely silent about the issue and fix it for my own build, letting you guys figure it out.) Oh come on... You are smart enough to know to at least CC the driver maintainer, the key POC who should be aware of breakage of their driver. That is a standard courtesy. ( And as this was spent from my family's weekend time and i had no time and no interest to dig any further than to figure out the first hop of the change that broke the build, and the parties who initiated that hop. I'm in fact surprised that your and James's answer to my bugreport is hostility. ) I'm sorry you read would be nice as hostility. So i find your suggestion that i should have added more people to the Cc: line unfair on several levels. As I noted, it is an obvious courtesy to CC the driver maintainer, at the very least. _Especially_ when it is a change that requires some knowledge of the hardware, as was this case. This set of changes seemed like 50% guesswork to me, without consulting the authors :( And unlike many changes, you actually have to know the hardware [or get clues from surrounding code] to make the change. you mean the whole set of changes? The whole set of changes, yes, not just yours. but ... i guess next time i'll think twice before sending any bugreports about or related to the SCSI code anywhere, unless they become really annoying. Who needs this hassle? The fact is, each larger subsystem (net, scsi, ata I know) has several vendor contacts and driver maintainers who for various reasons prefer a more focused -- and often less hostile -- mailing list to LKML. I have a hard enough time as it is, trying to convince hardware vendors to work with us, that we are not all assholes. How about respecting the preferences of certain segments of a very large community, even when they differ from your own? We have a community-accepted method of expressing these preferences, the MAINTAINERS file. IMO, standard practice should be: * To or CC: driver maintainer mentioned in MAINTAINERS (if any) * CC: LKML, any list mentioned in MAINTAINERS So, how about CC'ing the targets that have nicely requested to be CC'd? Jeff - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] pci: pci_enable_device_bars() fix
* Jeff Garzik [EMAIL PROTECTED] wrote: Ingo Molnar wrote: it would have been totally appropriate for me to just send a mail to lkml with the proper subject line about the breakage. (I might even have decided to stay completely silent about the issue and fix it for my own build, letting you guys figure it out.) Oh come on... You are smart enough to know to at least CC the driver maintainer, the key POC who should be aware of breakage of their driver. That is a standard courtesy. is there any particular reason why you cut out the most relevant part of my reply, which happens to answer all your questions AFAICS: Instead i did a search of lkml (based on the function name in the build error) and figured out where the pull request was on lkml: Greg. I replied to that mail, he'll obviously know whom else to Cc from that point on (if anyone). I really didnt want to (nor did i need to) figure out whether this was some general driver level API change that happen kernel-wide, or some SCSI specific change. I simply replied to the pull request whose Cc: line seemed well-populated to me already. I also took a look at the commit itself and did a quick hack in a hurry to keep the tests rolling. It really did not occur to me that i should have added anyone else to the Cc: line, as [EMAIL PROTECTED] was Cc:-ed already so i assumed the interest was from that angle. had you read this portion you'd have realized that i did not search for any owner of the file, i simply searched for the person who introduced the change, and the on-lkml mail where the change was introduced. and that's all that should be needed, really. Believe me, i hit tons of bugs all across the kernel, often several bugs a day, and it's hard even for me to figure out who maintains a file and when. (and in Linux there's no ownership of files anyway) So as a general rule i go after changes instead, and that's exactly what i did here too. I do delta/regression QA - i.e. i watch for _changes_ that break the kernel and hence the general 'owner' of a file is often irrelevant - it's the maintainer who introduces a change who matters, and we do lots of cross-maintain merges. Only if i do not manage to identify a change do i try to figure out who maintains a file at that given moment. (But those mails often go into black holes, they get bounced, subscriber-required email lists, etc. etc.) It's also nontrivial to map the files to the MAINTAINERS file, and it's also quite outdated in some portions. So the MAINTAINERS file is the last resort i use. so i'm still totally befuddled why you think that there was anything particularly wrong or unhelpful about me replying to the specific pull request that introduced a particular breakage into the kernel. Had i mailed to lkml with a terse kernel build broke message with just an URL to a config and the build breakage, you could rightfully have complained that i should have done more to properly direct my bugreport. But this breakage was about a PCI API change, the pull request had a PCI mailing list Cc:-ed, why should i have thought that this needs the attention of any other parties? Ingo - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] pci: pci_enable_device_bars() fix
On Sat, 2008-02-02 at 18:08 +0100, Ingo Molnar wrote: * Jeff Garzik [EMAIL PROTECTED] wrote: Ingo Molnar wrote: === --- linux.orig/drivers/scsi/lpfc/lpfc_init.c +++ linux/drivers/scsi/lpfc/lpfc_init.c @@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev, uint16_t iotag; int bars = pci_select_bars(pdev, IORESOURCE_MEM); - if (pci_enable_device_bars(pdev, bars)) + if (pci_enable_device_io(pdev)) goto out; Look at the line right above it... AFAICS you want pci_enable_device_mem(), if the mask is selecting IORESOURCE_MEM BARs. Also a CC to linux-scsi and the driver author would be nice, as they are the ones with hardware and can verify. it would have been totally appropriate for me to just send a mail to lkml with the proper subject line about the breakage. (I might even have decided to stay completely silent about the issue and fix it for my own build, letting you guys figure it out.) We agreed to differ a while ago on your head in the sand, post it to LKML because everybody follows that list attitude. Some nice soul will always (eventually) forward to the correct list, so this practice more or less works (just not in as timely fashion as actually posting to the relevant list). Instead i did a search of lkml (based on the function name in the build error) and figured out where the pull request was on lkml: Greg. I replied to that mail, he'll obviously know whom else to Cc from that point on (if anyone). I really didnt want to (nor did i need to) figure out whether this was some general driver level API change that happen kernel-wide, or some SCSI specific change. I simply replied to the pull request whose Cc: line seemed well-populated to me already. I also took a look at the commit itself and did a quick hack in a hurry to keep the tests rolling. It really did not occur to me that i should have added anyone else to the Cc: line, as [EMAIL PROTECTED] was Cc:-ed already so i assumed the interest was from that angle. ( And as this was spent from my family's weekend time and i had no time and no interest to dig any further than to figure out the first hop of the change that broke the build, and the parties who initiated that hop. I'm in fact surprised that your and James's answer to my bugreport is hostility. ) What's hostile about telling you your patch is wrong and pointing you at the correct one? So i find your suggestion that i should have added more people to the Cc: line unfair on several levels. This set of changes seemed like 50% guesswork to me, without consulting the authors :( And unlike many changes, you actually have to know the hardware [or get clues from surrounding code] to make the change. you mean the whole set of changes? Or just mine? Mine was a 30 seconds guesswork of course, i dont have the hardware, i didnt do the change, nor did i do anything near that code - i just saw the build failure stop my testing and sent in this notification and a hack. That's why i sent it to Greg, as a reply to the mail where he pushed these changes upstream and who'll know what to do with it from that point on. My patch can be totally thrown away (and should be, apparently). but ... i guess next time i'll think twice before sending any bugreports about or related to the SCSI code anywhere, unless they become really annoying. Who needs this hassle? Oh good grief, don't come the raw prawn with us! You're a kernel maintainer, you are expected to know the rules and follow them. The very fact that a well known maintainer posts a patch with a signed-off-by to Andrew and Linus means that it likely gets applied. Because of this, maintainers and other people with similarly trusted positions are expected to go via the lists and subsystems in part as general courtesy, but also to verify that their patch is actually valid before it gets applied. Are you seriously telling us that it required too much investigation on your part to figure out that something with a compile failure in drivers/scsi might belong on the scsi list? Let me tell you exactly what would have happened if that patch had been applied: Because the wrong bars were enabled, the device would have attached but been non-functional. Likely Emulex would have picked this up in their -rc1 testing and spent several days trying to trace the cause in their labs. Chances are, that, because this type of breakage is extremely subtle, they wouldn't have noticed the fact that the enable should have been _mem not _io. Then they would have raised the issue to linux-scsi. It would probably take at least a person week of effort before anyone finally noticed what the issue was. If you can't see that your behaviour needs modifying, I suggest you seriously consider your position. For all our talk of a nice utopian democracy of meritocracy in Open
Re: [patch] pci: pci_enable_device_bars() fix
Ingo Molnar wrote: * Jeff Garzik [EMAIL PROTECTED] wrote: Ingo Molnar wrote: it would have been totally appropriate for me to just send a mail to lkml with the proper subject line about the breakage. (I might even have decided to stay completely silent about the issue and fix it for my own build, letting you guys figure it out.) Oh come on... You are smart enough to know to at least CC the driver maintainer, the key POC who should be aware of breakage of their driver. That is a standard courtesy. is there any particular reason why you cut out the most relevant part of my reply, which happens to answer all your questions AFAICS: Instead i did a search of lkml (based on the function name in the build error) and figured out where the pull request was on lkml: Greg. I replied to that mail, he'll obviously know whom else to Cc from that point on (if anyone). I really didnt want to (nor did i need to) figure out whether this was some general driver level API change that happen kernel-wide, or some SCSI specific change. I simply replied to the pull request whose Cc: line seemed well-populated to me already. I also took a look at the commit itself and did a quick hack in a hurry to keep the tests rolling. It really did not occur to me that i should have added anyone else to the Cc: line, as [EMAIL PROTECTED] was Cc:-ed already so i assumed the interest was from that angle. had you read this portion you'd have realized that i did not search for any owner of the file, i simply searched for the person who introduced the change, and the on-lkml mail where the change was introduced. And therein lies the problem. The original submittor omitted relevant maintainers, you followed that [incorrect] example, and the end result was clear: an obviously wrong change. Thus, the problem is precisely what you stated: you did not bother to search for people who care about that file. and that's all that should be needed, really. Believe me, i hit tons of Hardly. What part of this change requires knowledge of the hardware is difficult to understand? And if you do not have that knowledge, why is it so trying to CC people who actively maintain a driver, and have that knowledge? It's simple common sense to -ask- or at least -notify- in such cases. That the original submittor made the same mistake is no reason to repeat the mistake. bugs all across the kernel, often several bugs a day, and it's hard even for me to figure out who maintains a file and when. (and in Linux there's no ownership of files anyway) So as a general rule i go after changes instead, and that's exactly what i did here too. I do delta/regression QA - i.e. i watch for _changes_ that break the kernel and hence the general 'owner' of a file is often irrelevant - it's the maintainer who introduces a change who matters, and we do lots of cross-maintain merges. Only if i do not manage to identify a change do i try to figure out who maintains a file at that given moment. (But those mails often go into black holes, they get bounced, subscriber-required email lists, etc. etc.) It's also nontrivial to map the files to the MAINTAINERS file, and it's also quite outdated in some portions. So the MAINTAINERS file is the last resort i use. That's a long list of excuses in an attempt to ignore the facts: Fact 1: The driver you modified is actively maintained Fact 2: The driver maintainer has respectfully indicated, through the standard community mechanism, the useful points of contact. Fact 3: The MAINTAINERS entry is correct and up-to-date. Fact 4: Even if you wanted to ignore MAINTAINERS, 'git log' on the relevant file could have told you who are useful parties to CC. It's just common courtesy to CC a driver maintainer, ESPECIALLY when a change requires knowledge of the driver/hardware in question. so i'm still totally befuddled why you think that there was anything particularly wrong or unhelpful about me replying to the specific pull request that introduced a particular breakage into the kernel. Had i mailed to lkml with a terse kernel build broke message with just an URL to a config and the build breakage, you could rightfully have complained that i should have done more to properly direct my bugreport. But this breakage was about a PCI API change, the pull request had a PCI mailing list Cc:-ed, why should i have thought that this needs the attention of any other parties? Because the change required knowledge not only of PCI, but of the hardware in question. As your patch demonstrated. And yes -- the original changes should have been CC'd to interested parties as well. I'm still waiting to hear back from Alan or Bart whether the ATA/IDE changes in that PCI pile actually work... the original changeset even noted that relevant parties had not yet been queried. Jeff - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message
Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
On Fri, Feb 01, 2008 at 07:43:54PM -0500, Mike Snitzer wrote: Hi Greg, Given your recent call for review of the next 2.6.22-stable I'd appreciate it if you and the rest of the stable team would strongly consider this SCSI IO error propagation fix for inclusion in 2.6.22.17 (as well as the other stable trees). Is the patch in Linus's tree already? If so, what git commit id? That's a requirement to go into the stable tree... If not, we need to wait until it is. thanks, greg k-h - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] pci: pci_enable_device_bars() fix
Ingo Molnar wrote: so please tell me Jeff. If Greg, who is the super-maintainer of your code area, and who deals with your code every day and changes it every minute and hour, simply did not Cc: the SCSI list - how am i, a largely outside party in this matter, supposed to notice that 3 maintainers and 3 mailing lists in the Cc: were somehow not enough and that i was supposed to grow the already sizable Cc: list even more? Because, regardless of the situation, it's both common courtesy and wise practice to CC relevant driver maintainers, when you touch a driver. And it's just common sense: Greg simply does not know the intimate details of every PCI driver. Nor do I. Nor you. In the case of lpfc here, we have an active driver maintainer, and an up-to-date MAINTAINERS entry. Even if you are too slack to read MAINTAINERS, 'git log' would have given you the same info. Don't pretend there is some benefit here to ignoring the people that best know the driver. I don't buy that; it simply makes no engineering sense whatsoever. Jeff - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
On Sat, Feb 02, 2008 at 03:56:30PM -0500, Mike Snitzer wrote: On Feb 2, 2008 3:24 PM, Greg KH [EMAIL PROTECTED] wrote: On Fri, Feb 01, 2008 at 07:43:54PM -0500, Mike Snitzer wrote: Hi Greg, Given your recent call for review of the next 2.6.22-stable I'd appreciate it if you and the rest of the stable team would strongly consider this SCSI IO error propagation fix for inclusion in 2.6.22.17 (as well as the other stable trees). Is the patch in Linus's tree already? If so, what git commit id? That's a requirement to go into the stable tree... If not, we need to wait until it is. Fair enough. Thanks for the clarification. That likely leaves us with having James add it to his git tree for Linus (provided he agrees with the change). Which he and Luben seemed to agree was the reasonable thin layer of protection that would be allowed for such out of spec SCSI devices. James, how would you like to proceed? The patch fixes the good_bytes != 0 on HARDWARE_ERROR issue I have been chasing with Mark Salyzyn. After it goes in, can someone notify [EMAIL PROTECTED] that it went in, the git id, and that it should be applied to all older kernel trees? thanks, greg drowning in -stable patches k-h - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
On Fri, 2008-02-01 at 12:03 -0500, Tony Battersby wrote: This patch fixes a problem with some out-of-spec SCSI disks that report hardware or medium errors incorrectly. Without the patch, the kernel may silently ignore a failed write command or return corrupted data on a failed read command. Signed-off-by: Tony Battersby [EMAIL PROTECTED] --- This is a simplified version of the original patch that fixes just the problem at hand, without trying to handle other theoretical out-of-spec cases. Actually, to restore the original check, this is what we want, isn't it? Ok, so I also made the sector division logic futureproof for the day we have 4096 sector devices ... James --- From 5ae2e4a8ff095aab5997f17068d3e4212c33f039 Mon Sep 17 00:00:00 2001 From: Tony Battersby [EMAIL PROTECTED] Date: Fri, 1 Feb 2008 12:03:27 -0500 Subject: [SCSI] sd: make error handling more robust This patch fixes a problem with some out-of-spec SCSI disks that report hardware or medium errors incorrectly. Without the patch, the kernel may silently ignore a failed write command or return corrupted data on a failed read command. Signed-off-by: Tony Battersby [EMAIL PROTECTED] Cc: Stable Tree [EMAIL PROTECTED] Signed-off-by: James Bottomley [EMAIL PROTECTED] --- drivers/scsi/sd.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) Index: BUILD-2.6/drivers/scsi/sd.c === --- BUILD-2.6.orig/drivers/scsi/sd.c2008-02-02 15:43:20.0 -0600 +++ BUILD-2.6/drivers/scsi/sd.c 2008-02-02 16:01:24.0 -0600 @@ -929,6 +929,7 @@ unsigned int xfer_size = scsi_bufflen(SCpnt); unsigned int good_bytes = result ? 0 : xfer_size; u64 start_lba = SCpnt-request-sector; + u64 end_lba = SCpnt-request-sector + (xfer_size / 512); u64 bad_lba; struct scsi_sense_hdr sshdr; int sense_valid = 0; @@ -967,26 +968,23 @@ goto out; if (xfer_size = SCpnt-device-sector_size) goto out; - switch (SCpnt-device-sector_size) { - case 256: + if (SCpnt-device-sector_size 512) { + /* only legitimate sector_size here is 256 */ start_lba = 1; - break; - case 512: - break; - case 1024: - start_lba = 1; - break; - case 2048: - start_lba = 2; - break; - case 4096: - start_lba = 3; - break; - default: - /* Print something here with limiting frequency. */ - goto out; - break; + end_lba = 1; + } else { + /* be careful ... don't want any overflows */ + u64 factor = SCpnt-device-sector_size / 512; + do_div(start_lba, factor); + do_div(end_lba, factor); } + + if (bad_lba start_lba || bad_lba = end_lba) + /* the bad lba was reported incorrectly, we have +* no idea where the error is +*/ + goto out; + /* This computation should always be done in terms of * the resolution of the device's medium. */ - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 9734] New: I/O error when inserting a second firewire sata disk
I wrote on 2008-01-13: James Bottomley wrote: Firewire list cc'd Jan 12 16:50:49 x3400 kernel: firewire_sbp2: orb reply timed out, rcode=0x11 Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK Best I can tell, this is the source of the problem. The sbp2 driver is replying DID_BUS_BUSY until that gets sorted out, which seems to be never. When something was plugged in or out at the same bus, fw-sbp2 has to reconnect == renew the login to each logical unit. The syslog in the report is inconclusive whether that happened or failed. In any case, there are frequently commands retried or newly enqueued while fw-sbp2 waits to get the login renewed. (And fw-sbp2 continues to complete them with DID_BUS_BUSY until the reconnection didn't succeed. Whoever caused that I/O, e.g. dd like in the reporter's and my own tests, will quickly fail. As a side note, the old sbp2 driver does not quit commands with DID_BUS_BUSY between bus reset and reconnect. Instead it blocks the Scsi_Host in order to not receive commands during that time at all. I experimented with this yesterday. First I tried scsi_internal_device_block() because we - want to block logical units individually if possible, - need to block from within atomic context (softirq context). However, this failed miserably with all sorts of lock inversion bug backtraces (alleged ones or real ones, I don't know) and with occasional kernel lock-ups (so it were probably real lock inversions). These locking issues cannot be solved easily because block layer and scsi_lib play nauseating games with their locks. So, I switched over to scsi_block_requests(), i.e. blocking the whole host like the old sbp2 driver does. This doesn't seem to have scsi_internal_device_block()'s locking issues. However, the sbp2 driver has one Scsi_Host for each logical unit while the new fw-sbp2 driver however has one Scsi_Host for each target. Hence there are difficulties with targets with multiple logical units, but I probably got them sorted out now. There remain frequent problems with reconnection + re-login failures though. These failures don't happen with exactly the same bus topology if I don't run I/O during the bus resets. I have an idea though what to try next... -- Stefan Richter -=-==--- --=- ---== http://arcgraph.de/sr/ - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html