Re: SCSI breakage on non-cache coherent architectures
On Tue, Nov 20, 2007 at 06:51:14AM +1100, Benjamin Herrenschmidt wrote: On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote: From: Benjamin Herrenschmidt [EMAIL PROTECTED] Date: Mon, 19 Nov 2007 16:35:23 +1100 I'm not sure what is the best way to fix that. Internally, I've done some test whacking some cacheline_aligned in the scsi_cmnd data structure to verify I no longer get random SLAB corruption when using my USB but that significantly bloats the size of the structure on archs such as ppc64 that don't need it and have a large cache line size. Unfortunately, I don't think there's any existing Kconfig symbol or arch provided #define to tell us that we are on a non-coherent arch afaik that could be used to make that conditional. Another option would be to kmalloc the buffer (wasn't it the case before btw ?) but I suppose some people will scream at the idea due to how the command pools are done... You could make a dma_cacheline_aligned and use that. It seems pretty reasonable. I was thinking about that. What archs would need it ? arm, mips, what else ? older parisc Thomas. -- Crap can work. Given enough thrust pigs will fly, but it's not necessary a good idea.[ RFC1925, 2.3 ] - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Tue, 2007-11-20 at 09:29 +0100, Thomas Bogendoerfer wrote: On Tue, Nov 20, 2007 at 06:51:14AM +1100, Benjamin Herrenschmidt wrote: On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote: From: Benjamin Herrenschmidt [EMAIL PROTECTED] Date: Mon, 19 Nov 2007 16:35:23 +1100 I'm not sure what is the best way to fix that. Internally, I've done some test whacking some cacheline_aligned in the scsi_cmnd data structure to verify I no longer get random SLAB corruption when using my USB but that significantly bloats the size of the structure on archs such as ppc64 that don't need it and have a large cache line size. Unfortunately, I don't think there's any existing Kconfig symbol or arch provided #define to tell us that we are on a non-coherent arch afaik that could be used to make that conditional. Another option would be to kmalloc the buffer (wasn't it the case before btw ?) but I suppose some people will scream at the idea due to how the command pools are done... You could make a dma_cacheline_aligned and use that. It seems pretty reasonable. I was thinking about that. What archs would need it ? arm, mips, what else ? older parisc Actually, we already established on IRC that the lasi700 driver doesn't need this, principally because the parisc architecture doesn't do an invalidate for DMA_FROM_DEVICE but a flush and invalidate (architecturally, if you read our manuals, even pdc is entitled to write back dirty lines, so it's not clear there's actually an invalidate instruction we can use). This is also one possible temporary fix for the other architectures if we can't get a different method to work nicely. James James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Tue, 2007-11-20 at 14:14 +1100, Benjamin Herrenschmidt wrote: FYI, Here's what I have for the SCSI change. I haven't updated drivers to care for the new return code though, help appreciated with that as I don't know much about these drivers. It looks to me like the return problem could be solved by passing in the buffer to be used for sense; that way we can allocate it in hostdata at init time for all the drivers and scsi_error can allocate and free a page using GFP_KERNEL. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
Actually, we already established on IRC that the lasi700 driver doesn't need this, principally because the parisc architecture doesn't do an invalidate for DMA_FROM_DEVICE but a flush and invalidate (architecturally, if you read our manuals, even pdc is entitled to write back dirty lines, so it's not clear there's actually an invalidate instruction we can use). This is also one possible temporary fix for the other architectures if we can't get a different method to work nicely. I think doing a writeback and invalidate is a very fragile way to deal with DMA into the middle of a data structure. It may work OK for now, but you have to make sure forever into the future that no codepath anywhere else ever touches the cacheline that you're DMAing into while the DMA is pending. It just leaves a hidden trap that is too easy to step on, because the architectures that get pretty much all testing all have cache-coherent DMA. Reviving my ancient __dma_buffer patch seems far preferable to me. - R. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Tue, 2007-11-20 at 12:05 -0800, Roland Dreier wrote: Actually, we already established on IRC that the lasi700 driver doesn't need this, principally because the parisc architecture doesn't do an invalidate for DMA_FROM_DEVICE but a flush and invalidate (architecturally, if you read our manuals, even pdc is entitled to write back dirty lines, so it's not clear there's actually an invalidate instruction we can use). This is also one possible temporary fix for the other architectures if we can't get a different method to work nicely. I think doing a writeback and invalidate is a very fragile way to deal with DMA into the middle of a data structure. It may work OK for now, but you have to make sure forever into the future that no codepath anywhere else ever touches the cacheline that you're DMAing into while the DMA is pending. It just leaves a hidden trap that is too easy to step on, because the architectures that get pretty much all testing all have cache-coherent DMA. Reviving my ancient __dma_buffer patch seems far preferable to me. We're talking about trying to fix this for 2.4; which is already at -rc3 ... Is an entire arch change for dma alignment really a merge candidate at this stage? James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Tue, 2007-11-20 at 15:10 -0600, James Bottomley wrote: We're talking about trying to fix this for 2.4; which is already at -rc3 ... Is an entire arch change for dma alignment really a merge candidate at this stage? Well, as I said before... it's a matter of what seems to be the less likely to break something right ? On one side, I'm doing surgery on code I barely know, the scsi error handling, and now it seems I also have to fixup a handful of drivers that aren't the most obvious pieces of code around. On the other side, Roland proposal is basically just adding a macro that can be empty for everybody but a handful of archs, and stick it onto one field in one structure... The later has about 0 chances to actually break something or cause a regression. I wouldn't say that about the former. Now, I will see if I manage to fixup the NCR drivers to pass a pre-allocated buffer (USB storage I think can pass NULL as it's not calling prep in atomic context). But then, it complicates the matter because that means restore will have to know whether prep allocated the buffer or not, thus more fields to add to the save struct, it's getting messy, unless we decide -all- callers are responsible for the buffer allocation (hrm... maybe the best approach). Ben. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Wed, 2007-11-21 at 09:39 +1100, Benjamin Herrenschmidt wrote: On Tue, 2007-11-20 at 15:10 -0600, James Bottomley wrote: We're talking about trying to fix this for 2.4; which is already at -rc3 ... Is an entire arch change for dma alignment really a merge candidate at this stage? Well, as I said before... it's a matter of what seems to be the less likely to break something right ? On one side, I'm doing surgery on code I barely know, the scsi error handling, and now it seems I also have to fixup a handful of drivers that aren't the most obvious pieces of code around. On the other side, Roland proposal is basically just adding a macro that can be empty for everybody but a handful of archs, and stick it onto one field in one structure... Yes ... it's the getting arch owner agreement to send the patch that slightly worries me. The later has about 0 chances to actually break something or cause a regression. I wouldn't say that about the former. Now, I will see if I manage to fixup the NCR drivers to pass a pre-allocated buffer (USB storage I think can pass NULL as it's not calling prep in atomic context). But then, it complicates the matter because that means restore will have to know whether prep allocated the buffer or not, thus more fields to add to the save struct, it's getting messy, unless we decide -all- callers are responsible for the buffer allocation (hrm... maybe the best approach). Sorry, yes, that's what I was thinking ... identically to the way the struct scsi_eh_save is handled ... or indeed as an extra pointer field inside scsi_eh_save. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
From: Benjamin Herrenschmidt [EMAIL PROTECTED] Date: Mon, 19 Nov 2007 16:35:23 +1100 I'm not sure what is the best way to fix that. Internally, I've done some test whacking some cacheline_aligned in the scsi_cmnd data structure to verify I no longer get random SLAB corruption when using my USB but that significantly bloats the size of the structure on archs such as ppc64 that don't need it and have a large cache line size. Unfortunately, I don't think there's any existing Kconfig symbol or arch provided #define to tell us that we are on a non-coherent arch afaik that could be used to make that conditional. Another option would be to kmalloc the buffer (wasn't it the case before btw ?) but I suppose some people will scream at the idea due to how the command pools are done... You could make a dma_cacheline_aligned and use that. It seems pretty reasonable. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Mon, Nov 19, 2007 at 04:35:23PM +1100, Benjamin Herrenschmidt wrote: The other one I'm hitting now is that the SCSI layer nowadays embeds the 'nowadays'? It has always been so. sense_buffer inside the scsi_cmnd structure without any kind of alignment whatsoever. I've been hitting irregulary is a crash on SCSI command completion that seems to be related to corruption of the request pointer in struct scsi_cmnd and I think it might be the cause. I'm now trying to setup a proper repro-case. What other drivers do is DMA to their own allocation and then memcpy to the sense buffer. There is a movement to allocate the sense data as its own sg list, but I don't think that patch has even been posted yet. -- Intel are signing my paycheques ... these opinions are still mine Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Mon, 2007-11-19 at 05:32 -0700, Matthew Wilcox wrote: On Mon, Nov 19, 2007 at 04:35:23PM +1100, Benjamin Herrenschmidt wrote: The other one I'm hitting now is that the SCSI layer nowadays embeds the 'nowadays'? It has always been so. sense_buffer inside the scsi_cmnd structure without any kind of alignment whatsoever. I've been hitting irregulary is a crash on SCSI command completion that seems to be related to corruption of the request pointer in struct scsi_cmnd and I think it might be the cause. I'm now trying to setup a proper repro-case. What other drivers do is DMA to their own allocation and then memcpy to the sense buffer. There is a movement to allocate the sense data as its own sg list, but I don't think that patch has even been posted yet. I'd like to be rid of it inside the command for various reasons: every command has one of these, and they're expensive in the allocation (at 96 bytes). There's no reason we have to allocate and free that amount of space with every command. In theory, the number of these is bounded at the queue depth, in practice, there's usually only one, and this DMA alignment issue does requires most drivers to double copy. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote: From: Benjamin Herrenschmidt [EMAIL PROTECTED] Date: Mon, 19 Nov 2007 16:35:23 +1100 I'm not sure what is the best way to fix that. Internally, I've done some test whacking some cacheline_aligned in the scsi_cmnd data structure to verify I no longer get random SLAB corruption when using my USB but that significantly bloats the size of the structure on archs such as ppc64 that don't need it and have a large cache line size. Unfortunately, I don't think there's any existing Kconfig symbol or arch provided #define to tell us that we are on a non-coherent arch afaik that could be used to make that conditional. Another option would be to kmalloc the buffer (wasn't it the case before btw ?) but I suppose some people will scream at the idea due to how the command pools are done... You could make a dma_cacheline_aligned and use that. It seems pretty reasonable. I was thinking about that. What archs would need it ? arm, mips, what else ? Cheers, Ben. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Mon, 2007-11-19 at 05:32 -0700, Matthew Wilcox wrote: On Mon, Nov 19, 2007 at 04:35:23PM +1100, Benjamin Herrenschmidt wrote: The other one I'm hitting now is that the SCSI layer nowadays embeds the 'nowadays'? It has always been so. Wasn't it kmalloc'ed at one point ? sense_buffer inside the scsi_cmnd structure without any kind of alignment whatsoever. I've been hitting irregulary is a crash on SCSI command completion that seems to be related to corruption of the request pointer in struct scsi_cmnd and I think it might be the cause. I'm now trying to setup a proper repro-case. What other drivers do is DMA to their own allocation and then memcpy to the sense buffer. What other drivers ? Those architectures use the same drivers as everything else. There is a movement to allocate the sense data as its own sg list, but I don't think that patch has even been posted yet. I've seen code creating an sglist from the scsi_cmnd-sense_buffer and passing that to drivers. That breaks. Ben. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Mon, 2007-11-19 at 09:09 -0600, James Bottomley wrote: What other drivers do is DMA to their own allocation and then memcpy to the sense buffer. There is a movement to allocate the sense data as its own sg list, but I don't think that patch has even been posted yet. I'd like to be rid of it inside the command for various reasons: every command has one of these, and they're expensive in the allocation (at 96 bytes). There's no reason we have to allocate and free that amount of space with every command. In theory, the number of these is bounded at the queue depth, in practice, there's usually only one, and this DMA alignment issue does requires most drivers to double copy. And most drivers don't and break. Take USB storage, I -think- (code path aren't trivial to follow) it just gets the sglist cooked up by the code in scsi_error.c no ? That just points to the buffer in scsi_cmnd. It then pass that for DMA to the USB stack. Ben. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
I'd like to be rid of it inside the command for various reasons: every command has one of these, and they're expensive in the allocation (at 96 bytes). There's no reason we have to allocate and free that amount of space with every command. In theory, the number of these is bounded at the queue depth, in practice, there's usually only one, and this DMA alignment issue does requires most drivers to double copy. Do you have a plan for short term here ? I'd like something fixed for .25, so I may have to introduce a __dma_aligned macro of some sort to deal with that in the meantime... Ben. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
I've been debugging various issues on the PowerPC 44x embedded architecture which happens to have non-coherent PCI DMA. One of the problem I'm hitting is that one really need to enforce kmalloc alignement to cache lines or bad things will happen (among others with USB), for some reasons, powerpc failed to do so, I fixed it. Heh... I hit the same problem literally 5 years ago: http://lwn.net/Articles/1783/ I implemented the __dma_buffer annotation: http://lwn.net/Articles/2269/ But DaveM said we should just use the PCI pool code instead: http://lwn.net/Articles/2270/ - R. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Mon, 2007-11-19 at 13:43 -0800, Roland Dreier wrote: I've been debugging various issues on the PowerPC 44x embedded architecture which happens to have non-coherent PCI DMA. One of the problem I'm hitting is that one really need to enforce kmalloc alignement to cache lines or bad things will happen (among others with USB), for some reasons, powerpc failed to do so, I fixed it. Heh... I hit the same problem literally 5 years ago: http://lwn.net/Articles/1783/ I implemented the __dma_buffer annotation: http://lwn.net/Articles/2269/ But DaveM said we should just use the PCI pool code instead: http://lwn.net/Articles/2270/ Heh, well... In this case, DaveM just proposed something akin to your __dma_buffer :-) On the other hand, after discussing with James, it looks like we'll just be reverting the patch that removed the kmalloc of the sense buffer since non cache-coherent archs are supposed to enforce kmalloc alignment to cache lines. __dma_buffer still seems like a good thing to have if too many other drivers are hitting that but for this specific problem, it's not the approach that we'll be taking. Cheers, Ben. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
From: Benjamin Herrenschmidt [EMAIL PROTECTED] Date: Tue, 20 Nov 2007 06:51:14 +1100 On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote: From: Benjamin Herrenschmidt [EMAIL PROTECTED] Date: Mon, 19 Nov 2007 16:35:23 +1100 You could make a dma_cacheline_aligned and use that. It seems pretty reasonable. I was thinking about that. What archs would need it ? arm, mips, what else ? The sparc32 port would need it too. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Mon, 2007-11-19 at 14:31 -0800, David Miller wrote: From: Benjamin Herrenschmidt [EMAIL PROTECTED] Date: Tue, 20 Nov 2007 06:51:14 +1100 On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote: From: Benjamin Herrenschmidt [EMAIL PROTECTED] Date: Mon, 19 Nov 2007 16:35:23 +1100 You could make a dma_cacheline_aligned and use that. It seems pretty reasonable. I was thinking about that. What archs would need it ? arm, mips, what else ? The sparc32 port would need it too. James preference seem to go for a revert of the patch that removed the kmalloc of the buffer instead. Sounds definitely like an easier plan for .24 (and maybe even backport to stable). I'll produce a patch for that later today or tomorrow. Do you still think we should introduce this __dma_cacheline_aligned ? Do you see other cases of drivers where it would be useful ? It tend to agree with your earlier statement that drivers doing that are broken and should be using a separate allocator for DMA'ble objects (in fact, on non-cache coherent archs, kmalloc is just fine). Cheers, Ben. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Mon, 2007-11-19 at 16:46 -0800, David Miller wrote: 1) Require that entire buffers are commited by call sites, and thus embedding DMA'd within non-DMA stuff isn't allowed 2) Add the __dma_cacheline_aligned tag. But note that with #2 it could get quite ugly because the alignment and size both have a minimum that needs to be enforced, not just the alignment alone. So either: Yup. struct foo { unsigned int other_unrelated_stuff; struct object dma_thing __dma_cacheline_aligned; unsigned int more_nondma_stuff __dma_cacheline_aligned; }; In my tests, I had used a fuckton_t object defined to be an empty thing with alignment constraint, seemed to work :-) But I'd rather require #1. BTW. What is the status nowadays with skb's ? Cheers, Ben. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
From: Benjamin Herrenschmidt [EMAIL PROTECTED] Date: Tue, 20 Nov 2007 11:55:01 +1100 BTW. What is the status nowadays with skb's ? Good question. Some drivers are problematic (or were) because they put DMA descriptor chaining information at the head of the buffer, but those have been fixed either to use alternate descriptor ring facilities or make the proper DMA sync calls. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
2) Add the __dma_cacheline_aligned tag. But note that with #2 it could get quite ugly because the alignment and size both have a minimum that needs to be enforced, not just the alignment alone. So either: struct foo { unsigned int other_unrelated_stuff; struct object dma_thing __dma_cacheline_aligned; unsigned int more_nondma_stuff __dma_cacheline_aligned; }; or: struct foo { unsigned int other_unrelated_stuff; union { struct object dma_thing __dma_cacheline_aligned; char __pad[(sizeof(object) + DMA_CACHELINE_SIZE ~DMA_CACHELINE_SIZE)]; } u; unsigned int more_nondma_stuff __dma_cacheline_aligned; }; I wrapped this ugliness up inside the macro back in what I posted in 2002 (http://lkml.org/lkml/2002/6/12/234): #define __dma_buffer __dma_buffer_line(__LINE__) #define __dma_buffer_line(line) __dma_buffer_expand_line(line) #define __dma_buffer_expand_line(line) \ __attribute__ ((aligned(L1_CACHE_BYTES))); \ char __dma_pad_ ## line [0] __attribute__ ((aligned(L1_CACHE_BYTES))) then you just need to tag the actual member like: char foo[3] __dma_buffer; - R. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
On Mon, 2007-11-19 at 18:10 -0800, Roland Dreier wrote: I wrapped this ugliness up inside the macro back in what I posted in 2002 (http://lkml.org/lkml/2002/6/12/234): #define __dma_buffer __dma_buffer_line(__LINE__) #define __dma_buffer_line(line) __dma_buffer_expand_line(line) #define __dma_buffer_expand_line(line) \ __attribute__ ((aligned(L1_CACHE_BYTES))); \ char __dma_pad_ ## line [0] __attribute__ ((aligned(L1_CACHE_BYTES))) then you just need to tag the actual member like: char foo[3] __dma_buffer; That's actually not too bad ... I'm having a problem with reverting SCSI to use an allocation for the sense buffer, because it can fail and the new scso_eh_prep_cmnd() isn't supposed to return a failure. I've changed that but now I get into trying to fix the various drivers that use it without checking the result code and it's becoming much more complicated than initially thought. So I may do the above instead and revive your patch. Any objection ? James ? David ? Cheers, Ben. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SCSI breakage on non-cache coherent architectures
FYI, Here's what I have for the SCSI change. I haven't updated drivers to care for the new return code though, help appreciated with that as I don't know much about these drivers. Index: linux-work/drivers/scsi/scsi_error.c === --- linux-work.orig/drivers/scsi/scsi_error.c 2007-11-20 13:26:18.0 +1100 +++ linux-work/drivers/scsi/scsi_error.c2007-11-20 13:43:05.0 +1100 @@ -602,8 +602,9 @@ static void scsi_abort_eh_cmnd(struct sc * @cmnd is ignored and this functions sets up a REQUEST_SENSE command * and cmnd buffers to read @sense_bytes into @scmd-sense_buffer. **/ -void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses, - unsigned char *cmnd, int cmnd_size, unsigned sense_bytes) +int scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses, + unsigned char *cmnd, int cmnd_size, unsigned sense_bytes, + gfp_t gfp_mask) { struct scsi_device *sdev = scmd-device; @@ -622,12 +623,20 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd ses-use_sg = scmd-use_sg; ses-resid = scmd-resid; ses-result = scmd-result; + sg_init_table(ses-sense_sgl, 1); if (sense_bytes) { + struct page *pg; + + if (sdev-host-hostt-unchecked_isa_dma) + gfp_mask |= __GFP_DMA; scmd-request_bufflen = min_t(unsigned, sizeof(scmd-sense_buffer), sense_bytes); - sg_init_one(ses-sense_sgl, scmd-sense_buffer, - scmd-request_bufflen); + pg = alloc_page(gfp_mask); + if (!pg) + return FAILED; + memset(page_address(pg), 0, scmd-request_bufflen); + sg_set_page(ses-sense_sgl, pg, scmd-request_bufflen, 0); scmd-request_buffer = ses-sense_sgl; scmd-sc_data_direction = DMA_FROM_DEVICE; scmd-use_sg = 1; @@ -658,6 +667,8 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd * untransferred sense data should be interpreted as being zero. */ memset(scmd-sense_buffer, 0, sizeof(scmd-sense_buffer)); + + return SUCCESS; } EXPORT_SYMBOL(scsi_eh_prep_cmnd); @@ -670,9 +681,17 @@ EXPORT_SYMBOL(scsi_eh_prep_cmnd); **/ void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct scsi_eh_save *ses) { + struct page *pg; + /* * Restore original data */ + pg = sg_page(ses-sense_sgl); + if (pg) { + memcpy(scmd-sense_buffer, page_address(pg), + scmd-request_bufflen); + __free_page(pg); + } scmd-cmd_len = ses-cmd_len; memcpy(scmd-cmnd, ses-cmnd, sizeof(scmd-cmnd)); scmd-sc_data_direction = ses-data_direction; @@ -709,7 +728,10 @@ static int scsi_send_eh_cmnd(struct scsi struct scsi_eh_save ses; int rtn; - scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes); + if (scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes, + GFP_KERNEL) != SUCCESS) + return FAILED; + shost-eh_action = done; spin_lock_irqsave(shost-host_lock, flags); Index: linux-work/include/scsi/scsi_eh.h === --- linux-work.orig/include/scsi/scsi_eh.h 2007-11-20 13:36:44.0 +1100 +++ linux-work/include/scsi/scsi_eh.h 2007-11-20 13:42:49.0 +1100 @@ -81,9 +81,10 @@ struct scsi_eh_save { struct scatterlist sense_sgl; }; -extern void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, +extern int scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses, unsigned char *cmnd, - int cmnd_size, unsigned sense_bytes); + int cmnd_size, unsigned sense_bytes, + gfp_t gfp_mask); extern void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct scsi_eh_save *ses); Index: linux-work/drivers/scsi/NCR5380.c === --- linux-work.orig/drivers/scsi/NCR5380.c 2007-11-20 13:46:41.0 +1100 +++ linux-work/drivers/scsi/NCR5380.c 2007-11-20 13:46:47.0 +1100 @@ -2283,7 +2283,7 @@ static void NCR5380_information_transfer } if ((cmd-cmnd[0] != REQUEST_SENSE) (status_byte(cmd-SCp.Status) == CHECK_CONDITION)) { - scsi_eh_prep_cmnd(cmd, hostdata-ses, NULL, 0, ~0); + scsi_eh_prep_cmnd(cmd, hostdata-ses, NULL, 0, ~0, GFP_ATOMIC); dprintk(NDEBUG_AUTOSENSE, (scsi%d : performing request sense\n,
SCSI breakage on non-cache coherent architectures
Hi James ! (Please CC me on replies as I'm not subscribed to linux-scsi) I've been debugging various issues on the PowerPC 44x embedded architecture which happens to have non-coherent PCI DMA. One of the problem I'm hitting is that one really need to enforce kmalloc alignement to cache lines or bad things will happen (among others with USB), for some reasons, powerpc failed to do so, I fixed it. The other one I'm hitting now is that the SCSI layer nowadays embeds the sense_buffer inside the scsi_cmnd structure without any kind of alignment whatsoever. I've been hitting irregulary is a crash on SCSI command completion that seems to be related to corruption of the request pointer in struct scsi_cmnd and I think it might be the cause. I'm now trying to setup a proper repro-case. The sense buffer is something that is written to by the device, thus it gets cache invalidated when the DMA happens, potentially losing whatever was sharing the cache line, which includes, among other things, that request pointer field. There are other issues as well if any of the fields sharing the cache line happens to be read while the DMA is in progress, it would populate the cache with memory content prior to the DMA being completed. It's fairly important on such architectures not to share cache lines between objects being DMA'd to/from and the rest of the system. If the DMA is strictly outgoing, it's generally ok, but not if the DMA is bidirectional or the device is writing to memory. I'm not sure what is the best way to fix that. Internally, I've done some test whacking some cacheline_aligned in the scsi_cmnd data structure to verify I no longer get random SLAB corruption when using my USB but that significantly bloats the size of the structure on archs such as ppc64 that don't need it and have a large cache line size. Unfortunately, I don't think there's any existing Kconfig symbol or arch provided #define to tell us that we are on a non-coherent arch afaik that could be used to make that conditional. Another option would be to kmalloc the buffer (wasn't it the case before btw ?) but I suppose some people will scream at the idea due to how the command pools are done... What do you suggest ? Cheers, Ben. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html