Re: SCSI breakage on non-cache coherent architectures

2007-11-20 Thread Thomas Bogendoerfer
On Tue, Nov 20, 2007 at 06:51:14AM +1100, Benjamin Herrenschmidt wrote:
 
 On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote:
  From: Benjamin Herrenschmidt [EMAIL PROTECTED]
  Date: Mon, 19 Nov 2007 16:35:23 +1100
  
   I'm not sure what is the best way to fix that. Internally, I've done
   some test whacking some cacheline_aligned in the scsi_cmnd data
   structure to verify I no longer get random SLAB corruption when using my
   USB but that significantly bloats the size of the structure on archs
   such as ppc64 that don't need it and have a large cache line size.
   
   Unfortunately, I don't think there's any existing Kconfig symbol or arch
   provided #define to tell us that we are on a non-coherent arch afaik
   that could be used to make that conditional.
   
   Another option would be to kmalloc the buffer (wasn't it the case before
   btw ?) but I suppose some people will scream at the idea due to how the
   command pools are done...
  
  You could make a dma_cacheline_aligned and use that.
  It seems pretty reasonable.
 
 I was thinking about that. What archs would need it ? arm, mips, what
 else ?

older parisc

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessary a
good idea.[ RFC1925, 2.3 ]
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-20 Thread James Bottomley

On Tue, 2007-11-20 at 09:29 +0100, Thomas Bogendoerfer wrote:
 On Tue, Nov 20, 2007 at 06:51:14AM +1100, Benjamin Herrenschmidt wrote:
  
  On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote:
   From: Benjamin Herrenschmidt [EMAIL PROTECTED]
   Date: Mon, 19 Nov 2007 16:35:23 +1100
   
I'm not sure what is the best way to fix that. Internally, I've done
some test whacking some cacheline_aligned in the scsi_cmnd data
structure to verify I no longer get random SLAB corruption when using my
USB but that significantly bloats the size of the structure on archs
such as ppc64 that don't need it and have a large cache line size.

Unfortunately, I don't think there's any existing Kconfig symbol or arch
provided #define to tell us that we are on a non-coherent arch afaik
that could be used to make that conditional.

Another option would be to kmalloc the buffer (wasn't it the case before
btw ?) but I suppose some people will scream at the idea due to how the
command pools are done...
   
   You could make a dma_cacheline_aligned and use that.
   It seems pretty reasonable.
  
  I was thinking about that. What archs would need it ? arm, mips, what
  else ?
 
 older parisc

Actually, we already established on IRC that the lasi700 driver doesn't
need this, principally because the parisc architecture doesn't do an
invalidate for DMA_FROM_DEVICE but a flush and invalidate
(architecturally, if you read our manuals, even pdc is entitled to write
back dirty lines, so it's not clear there's actually an invalidate
instruction we can use).   This is also one possible temporary fix for
the other architectures if we can't get a different method to work
nicely.

James


James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-20 Thread James Bottomley

On Tue, 2007-11-20 at 14:14 +1100, Benjamin Herrenschmidt wrote:
 FYI, Here's what I have for the SCSI change. I haven't updated drivers
 to care for the new return code though, help appreciated with that as I
 don't know much about these drivers.

It looks to me like the return problem could be solved by passing in the
buffer to be used for sense; that way we can allocate it in hostdata at
init time for all the drivers and scsi_error can allocate and free a
page using GFP_KERNEL.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-20 Thread Roland Dreier
  Actually, we already established on IRC that the lasi700 driver doesn't
  need this, principally because the parisc architecture doesn't do an
  invalidate for DMA_FROM_DEVICE but a flush and invalidate
  (architecturally, if you read our manuals, even pdc is entitled to write
  back dirty lines, so it's not clear there's actually an invalidate
  instruction we can use).   This is also one possible temporary fix for
  the other architectures if we can't get a different method to work
  nicely.

I think doing a writeback and invalidate is a very fragile way to deal
with DMA into the middle of a data structure.  It may work OK for now,
but you have to make sure forever into the future that no codepath
anywhere else ever touches the cacheline that you're DMAing into while
the DMA is pending.  It just leaves a hidden trap that is too easy to
step on, because the architectures that get pretty much all testing
all have cache-coherent DMA.

Reviving my ancient __dma_buffer patch seems far preferable to me.

 - R.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-20 Thread James Bottomley

On Tue, 2007-11-20 at 12:05 -0800, Roland Dreier wrote:
  Actually, we already established on IRC that the lasi700 driver doesn't
   need this, principally because the parisc architecture doesn't do an
   invalidate for DMA_FROM_DEVICE but a flush and invalidate
   (architecturally, if you read our manuals, even pdc is entitled to write
   back dirty lines, so it's not clear there's actually an invalidate
   instruction we can use).   This is also one possible temporary fix for
   the other architectures if we can't get a different method to work
   nicely.
 
 I think doing a writeback and invalidate is a very fragile way to deal
 with DMA into the middle of a data structure.  It may work OK for now,
 but you have to make sure forever into the future that no codepath
 anywhere else ever touches the cacheline that you're DMAing into while
 the DMA is pending.  It just leaves a hidden trap that is too easy to
 step on, because the architectures that get pretty much all testing
 all have cache-coherent DMA.
 
 Reviving my ancient __dma_buffer patch seems far preferable to me.

We're talking about trying to fix this for 2.4; which is already at
-rc3 ... Is an entire arch change for dma alignment really a merge
candidate at this stage?

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-20 Thread Benjamin Herrenschmidt

On Tue, 2007-11-20 at 15:10 -0600, James Bottomley wrote:
 We're talking about trying to fix this for 2.4; which is already at
 -rc3 ... Is an entire arch change for dma alignment really a merge
 candidate at this stage?

Well, as I said before... it's a matter of what seems to be the less
likely to break something right ?

On one side,  I'm doing surgery on code I barely know, the scsi error
handling, and now it seems I also have to fixup a handful of drivers
that aren't the most obvious pieces of code around.

On the other side, Roland proposal is basically just adding a macro that
can be empty for everybody but a handful of archs, and stick it onto one
field in one structure...

The later has about 0 chances to actually break something or cause a
regression. I wouldn't say that about the former.

Now, I will see if I manage to fixup the NCR drivers to pass a
pre-allocated buffer (USB storage I think can pass NULL as it's not
calling prep in atomic context). But then, it complicates the matter
because that means restore will have to know whether prep allocated
the buffer or not, thus more fields to add to the save struct, it's
getting messy, unless we decide -all- callers are responsible for the
buffer allocation (hrm... maybe the best approach).

Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-20 Thread James Bottomley

On Wed, 2007-11-21 at 09:39 +1100, Benjamin Herrenschmidt wrote:
 On Tue, 2007-11-20 at 15:10 -0600, James Bottomley wrote:
  We're talking about trying to fix this for 2.4; which is already at
  -rc3 ... Is an entire arch change for dma alignment really a merge
  candidate at this stage?
 
 Well, as I said before... it's a matter of what seems to be the less
 likely to break something right ?
 
 On one side,  I'm doing surgery on code I barely know, the scsi error
 handling, and now it seems I also have to fixup a handful of drivers
 that aren't the most obvious pieces of code around.
 
 On the other side, Roland proposal is basically just adding a macro that
 can be empty for everybody but a handful of archs, and stick it onto one
 field in one structure...

Yes ... it's the getting arch owner agreement to send the patch that
slightly worries me.

 The later has about 0 chances to actually break something or cause a
 regression. I wouldn't say that about the former.
 
 Now, I will see if I manage to fixup the NCR drivers to pass a
 pre-allocated buffer (USB storage I think can pass NULL as it's not
 calling prep in atomic context). But then, it complicates the matter
 because that means restore will have to know whether prep allocated
 the buffer or not, thus more fields to add to the save struct, it's
 getting messy, unless we decide -all- callers are responsible for the
 buffer allocation (hrm... maybe the best approach).

Sorry, yes, that's what I was thinking ... identically to the way the
struct scsi_eh_save is handled ... or indeed as an extra pointer field
inside scsi_eh_save.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread David Miller
From: Benjamin Herrenschmidt [EMAIL PROTECTED]
Date: Mon, 19 Nov 2007 16:35:23 +1100

 I'm not sure what is the best way to fix that. Internally, I've done
 some test whacking some cacheline_aligned in the scsi_cmnd data
 structure to verify I no longer get random SLAB corruption when using my
 USB but that significantly bloats the size of the structure on archs
 such as ppc64 that don't need it and have a large cache line size.
 
 Unfortunately, I don't think there's any existing Kconfig symbol or arch
 provided #define to tell us that we are on a non-coherent arch afaik
 that could be used to make that conditional.
 
 Another option would be to kmalloc the buffer (wasn't it the case before
 btw ?) but I suppose some people will scream at the idea due to how the
 command pools are done...

You could make a dma_cacheline_aligned and use that.
It seems pretty reasonable.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Matthew Wilcox
On Mon, Nov 19, 2007 at 04:35:23PM +1100, Benjamin Herrenschmidt wrote:
 The other one I'm hitting now is that the SCSI layer nowadays embeds the

'nowadays'?  It has always been so.

 sense_buffer inside the scsi_cmnd structure without any kind of
 alignment whatsoever. I've been hitting irregulary is a crash on SCSI command
 completion that seems to be related to corruption of the request
 pointer in struct scsi_cmnd and I think it might be the cause.
 I'm now trying to setup a proper repro-case.

What other drivers do is DMA to their own allocation and then memcpy to
the sense buffer.

There is a movement to allocate the sense data as its own sg list, but
I don't think that patch has even been posted yet.

-- 
Intel are signing my paycheques ... these opinions are still mine
Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread James Bottomley

On Mon, 2007-11-19 at 05:32 -0700, Matthew Wilcox wrote:
 On Mon, Nov 19, 2007 at 04:35:23PM +1100, Benjamin Herrenschmidt wrote:
  The other one I'm hitting now is that the SCSI layer nowadays embeds the
 
 'nowadays'?  It has always been so.
 
  sense_buffer inside the scsi_cmnd structure without any kind of
  alignment whatsoever. I've been hitting irregulary is a crash on SCSI 
  command
  completion that seems to be related to corruption of the request
  pointer in struct scsi_cmnd and I think it might be the cause.
  I'm now trying to setup a proper repro-case.
 
 What other drivers do is DMA to their own allocation and then memcpy to
 the sense buffer.
 
 There is a movement to allocate the sense data as its own sg list, but
 I don't think that patch has even been posted yet.

I'd like to be rid of it inside the command for various reasons:  every
command has one of these, and they're expensive in the allocation (at 96
bytes).  There's no reason we have to allocate and free that amount of
space with every command.  In theory, the number of these is bounded at
the queue depth, in practice, there's usually only one, and this DMA
alignment issue does requires most drivers to double copy.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt

On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote:
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 Date: Mon, 19 Nov 2007 16:35:23 +1100
 
  I'm not sure what is the best way to fix that. Internally, I've done
  some test whacking some cacheline_aligned in the scsi_cmnd data
  structure to verify I no longer get random SLAB corruption when using my
  USB but that significantly bloats the size of the structure on archs
  such as ppc64 that don't need it and have a large cache line size.
  
  Unfortunately, I don't think there's any existing Kconfig symbol or arch
  provided #define to tell us that we are on a non-coherent arch afaik
  that could be used to make that conditional.
  
  Another option would be to kmalloc the buffer (wasn't it the case before
  btw ?) but I suppose some people will scream at the idea due to how the
  command pools are done...
 
 You could make a dma_cacheline_aligned and use that.
 It seems pretty reasonable.

I was thinking about that. What archs would need it ? arm, mips, what
else ?

Cheers,
Ben.

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt

On Mon, 2007-11-19 at 05:32 -0700, Matthew Wilcox wrote:
 On Mon, Nov 19, 2007 at 04:35:23PM +1100, Benjamin Herrenschmidt wrote:
  The other one I'm hitting now is that the SCSI layer nowadays embeds the
 
 'nowadays'?  It has always been so.

Wasn't it kmalloc'ed at one point ?

  sense_buffer inside the scsi_cmnd structure without any kind of
  alignment whatsoever. I've been hitting irregulary is a crash on SCSI 
  command
  completion that seems to be related to corruption of the request
  pointer in struct scsi_cmnd and I think it might be the cause.
  I'm now trying to setup a proper repro-case.
 
 What other drivers do is DMA to their own allocation and then memcpy to
 the sense buffer.

What other drivers ? Those architectures use the same drivers as
everything else.

 There is a movement to allocate the sense data as its own sg list, but
 I don't think that patch has even been posted yet.

I've seen code creating an sglist from the scsi_cmnd-sense_buffer and
passing that to drivers. That breaks.

Ben.

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt

On Mon, 2007-11-19 at 09:09 -0600, James Bottomley wrote:
  What other drivers do is DMA to their own allocation and then memcpy to
  the sense buffer.
  
  There is a movement to allocate the sense data as its own sg list, but
  I don't think that patch has even been posted yet.
 
 I'd like to be rid of it inside the command for various reasons:  every
 command has one of these, and they're expensive in the allocation (at 96
 bytes).  There's no reason we have to allocate and free that amount of
 space with every command.  In theory, the number of these is bounded at
 the queue depth, in practice, there's usually only one, and this DMA
 alignment issue does requires most drivers to double copy.

And most drivers don't and break. Take USB storage, I -think- (code path
aren't trivial to follow) it just gets the sglist cooked up by the code
in scsi_error.c no ? That just points to the buffer in scsi_cmnd. It
then pass that for DMA to the USB stack.

Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt

 I'd like to be rid of it inside the command for various reasons:  every
 command has one of these, and they're expensive in the allocation (at 96
 bytes).  There's no reason we have to allocate and free that amount of
 space with every command.  In theory, the number of these is bounded at
 the queue depth, in practice, there's usually only one, and this DMA
 alignment issue does requires most drivers to double copy.

Do you have a plan for short term here ? I'd like something fixed
for .25, so I may have to introduce a __dma_aligned macro of some sort
to deal with that in the meantime...

Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Roland Dreier
  I've been debugging various issues on the PowerPC 44x embedded
  architecture which happens to have non-coherent PCI DMA.
  
  One of the problem I'm hitting is that one really need to enforce
  kmalloc alignement to cache lines or bad things will happen (among
  others with USB), for some reasons, powerpc failed to do so, I fixed it.

Heh... I hit the same problem literally 5 years ago:
http://lwn.net/Articles/1783/

I implemented the __dma_buffer annotation:
http://lwn.net/Articles/2269/

But DaveM said we should just use the PCI pool code instead:
http://lwn.net/Articles/2270/

 - R.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt

On Mon, 2007-11-19 at 13:43 -0800, Roland Dreier wrote:
  I've been debugging various issues on the PowerPC 44x embedded
   architecture which happens to have non-coherent PCI DMA.
   
   One of the problem I'm hitting is that one really need to enforce
   kmalloc alignement to cache lines or bad things will happen (among
   others with USB), for some reasons, powerpc failed to do so, I fixed it.
 
 Heh... I hit the same problem literally 5 years ago:
 http://lwn.net/Articles/1783/
 
 I implemented the __dma_buffer annotation:
 http://lwn.net/Articles/2269/
 
 But DaveM said we should just use the PCI pool code instead:
 http://lwn.net/Articles/2270/

Heh, well... 

In this case, DaveM just proposed something akin to your
__dma_buffer :-)

On the other hand, after discussing with James, it looks like we'll just
be reverting the patch that removed the kmalloc of the sense buffer
since non cache-coherent archs are supposed to enforce kmalloc alignment
to cache lines.

__dma_buffer still seems like a good thing to have if too many other
drivers are hitting that but for this specific problem, it's not the
approach that we'll be taking.

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread David Miller
From: Benjamin Herrenschmidt [EMAIL PROTECTED]
Date: Tue, 20 Nov 2007 06:51:14 +1100

 On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote:
  From: Benjamin Herrenschmidt [EMAIL PROTECTED]
  Date: Mon, 19 Nov 2007 16:35:23 +1100
  
  You could make a dma_cacheline_aligned and use that.
  It seems pretty reasonable.
 
 I was thinking about that. What archs would need it ? arm, mips, what
 else ?

The sparc32 port would need it too.

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt

On Mon, 2007-11-19 at 14:31 -0800, David Miller wrote:
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 Date: Tue, 20 Nov 2007 06:51:14 +1100
 
  On Mon, 2007-11-19 at 00:38 -0800, David Miller wrote:
   From: Benjamin Herrenschmidt [EMAIL PROTECTED]
   Date: Mon, 19 Nov 2007 16:35:23 +1100
   
   You could make a dma_cacheline_aligned and use that.
   It seems pretty reasonable.
  
  I was thinking about that. What archs would need it ? arm, mips, what
  else ?
 
 The sparc32 port would need it too.

James preference seem to go for a revert of the patch that removed the
kmalloc of the buffer instead. Sounds definitely like an easier plan
for .24 (and maybe even backport to stable).

I'll produce a patch for that later today or tomorrow.

Do you still think we should introduce this __dma_cacheline_aligned ? Do
you see other cases of drivers where it would be useful ? It tend to
agree with your earlier statement that drivers doing that are broken and
should be using a separate allocator for DMA'ble objects (in fact, on
non-cache coherent archs, kmalloc is just fine).

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt

On Mon, 2007-11-19 at 16:46 -0800, David Miller wrote:
 
 1) Require that entire buffers are commited by call sites,
and thus embedding DMA'd within non-DMA stuff isn't allowed
 
 2) Add the __dma_cacheline_aligned tag.
 
 But note that with #2 it could get quite ugly because the
 alignment and size both have a minimum that needs to be
 enforced, not just the alignment alone.  So either:

Yup.

 struct foo {
 unsigned int other_unrelated_stuff;
 
 struct object dma_thing __dma_cacheline_aligned;
 
 unsigned int more_nondma_stuff __dma_cacheline_aligned;
 };

In my tests, I had used a fuckton_t object defined to be an empty
thing with alignment constraint, seemed to work :-) But I'd rather
require #1.

BTW. What is the status nowadays with skb's ?

Cheers,
Ben.

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread David Miller
From: Benjamin Herrenschmidt [EMAIL PROTECTED]
Date: Tue, 20 Nov 2007 11:55:01 +1100

 BTW. What is the status nowadays with skb's ?

Good question.

Some drivers are problematic (or were) because they put
DMA descriptor chaining information at the head of the
buffer, but those have been fixed either to use alternate
descriptor ring facilities or make the proper DMA sync
calls.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Roland Dreier
  2) Add the __dma_cacheline_aligned tag.
  
  But note that with #2 it could get quite ugly because the
  alignment and size both have a minimum that needs to be
  enforced, not just the alignment alone.  So either:
  
  struct foo {
   unsigned int other_unrelated_stuff;
  
   struct object dma_thing __dma_cacheline_aligned;
  
   unsigned int more_nondma_stuff __dma_cacheline_aligned;
  };
  
  or:
  
  struct foo {
   unsigned int other_unrelated_stuff;
  
   union {
   struct object dma_thing __dma_cacheline_aligned;
   char __pad[(sizeof(object) + DMA_CACHELINE_SIZE 
  ~DMA_CACHELINE_SIZE)];
   } u;
  
   unsigned int more_nondma_stuff __dma_cacheline_aligned;
  };

I wrapped this ugliness up inside the macro back in what I posted in
2002 (http://lkml.org/lkml/2002/6/12/234):

#define __dma_buffer __dma_buffer_line(__LINE__)
#define __dma_buffer_line(line) __dma_buffer_expand_line(line)
#define __dma_buffer_expand_line(line) \
__attribute__ ((aligned(L1_CACHE_BYTES))); \
char __dma_pad_ ## line [0] __attribute__ ((aligned(L1_CACHE_BYTES)))

then you just need to tag the actual member like:

char foo[3] __dma_buffer;

 - R.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt

On Mon, 2007-11-19 at 18:10 -0800, Roland Dreier wrote:
 
 I wrapped this ugliness up inside the macro back in what I posted in
 2002 (http://lkml.org/lkml/2002/6/12/234):
 
 #define __dma_buffer __dma_buffer_line(__LINE__)
 #define __dma_buffer_line(line) __dma_buffer_expand_line(line)
 #define __dma_buffer_expand_line(line) \
 __attribute__ ((aligned(L1_CACHE_BYTES))); \
 char __dma_pad_ ## line [0] __attribute__
 ((aligned(L1_CACHE_BYTES)))
 
 then you just need to tag the actual member like:
 
 char foo[3] __dma_buffer;

That's actually not too bad ... 

I'm having a problem with reverting SCSI to use an allocation for the
sense buffer, because it can fail and the new scso_eh_prep_cmnd() isn't
supposed to return a failure. I've changed that but now I get into
trying to fix the various drivers that use it without checking the
result code and it's becoming much more complicated than initially
thought.

So I may do the above instead and revive your patch.

Any objection ? James ? David ?

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SCSI breakage on non-cache coherent architectures

2007-11-19 Thread Benjamin Herrenschmidt
FYI, Here's what I have for the SCSI change. I haven't updated drivers
to care for the new return code though, help appreciated with that as I
don't know much about these drivers.

Index: linux-work/drivers/scsi/scsi_error.c
===
--- linux-work.orig/drivers/scsi/scsi_error.c   2007-11-20 13:26:18.0 
+1100
+++ linux-work/drivers/scsi/scsi_error.c2007-11-20 13:43:05.0 
+1100
@@ -602,8 +602,9 @@ static void scsi_abort_eh_cmnd(struct sc
  * @cmnd is ignored and this functions sets up a REQUEST_SENSE command
  * and cmnd buffers to read @sense_bytes into @scmd-sense_buffer.
  **/
-void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
-   unsigned char *cmnd, int cmnd_size, unsigned 
sense_bytes)
+int scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
+ unsigned char *cmnd, int cmnd_size, unsigned sense_bytes,
+ gfp_t gfp_mask)
 {
struct scsi_device *sdev = scmd-device;
 
@@ -622,12 +623,20 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd 
ses-use_sg = scmd-use_sg;
ses-resid = scmd-resid;
ses-result = scmd-result;
+   sg_init_table(ses-sense_sgl, 1);
 
if (sense_bytes) {
+   struct page *pg;
+
+   if (sdev-host-hostt-unchecked_isa_dma)
+   gfp_mask |= __GFP_DMA;
scmd-request_bufflen = min_t(unsigned,
   sizeof(scmd-sense_buffer), sense_bytes);
-   sg_init_one(ses-sense_sgl, scmd-sense_buffer,
-  scmd-request_bufflen);
+   pg = alloc_page(gfp_mask);
+   if (!pg)
+   return FAILED;
+   memset(page_address(pg), 0, scmd-request_bufflen);
+   sg_set_page(ses-sense_sgl, pg, scmd-request_bufflen, 0);
scmd-request_buffer = ses-sense_sgl;
scmd-sc_data_direction = DMA_FROM_DEVICE;
scmd-use_sg = 1;
@@ -658,6 +667,8 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd 
 * untransferred sense data should be interpreted as being zero.
 */
memset(scmd-sense_buffer, 0, sizeof(scmd-sense_buffer));
+
+   return SUCCESS;
 }
 EXPORT_SYMBOL(scsi_eh_prep_cmnd);
 
@@ -670,9 +681,17 @@ EXPORT_SYMBOL(scsi_eh_prep_cmnd);
  **/
 void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct scsi_eh_save *ses)
 {
+   struct page *pg;
+
/*
 * Restore original data
 */
+   pg = sg_page(ses-sense_sgl);
+   if (pg) {
+   memcpy(scmd-sense_buffer, page_address(pg),
+  scmd-request_bufflen);
+   __free_page(pg);
+   }
scmd-cmd_len = ses-cmd_len;
memcpy(scmd-cmnd, ses-cmnd, sizeof(scmd-cmnd));
scmd-sc_data_direction = ses-data_direction;
@@ -709,7 +728,10 @@ static int scsi_send_eh_cmnd(struct scsi
struct scsi_eh_save ses;
int rtn;
 
-   scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes);
+   if (scsi_eh_prep_cmnd(scmd, ses, cmnd, cmnd_size, sense_bytes,
+ GFP_KERNEL) != SUCCESS)
+   return FAILED;
+
shost-eh_action = done;
 
spin_lock_irqsave(shost-host_lock, flags);
Index: linux-work/include/scsi/scsi_eh.h
===
--- linux-work.orig/include/scsi/scsi_eh.h  2007-11-20 13:36:44.0 
+1100
+++ linux-work/include/scsi/scsi_eh.h   2007-11-20 13:42:49.0 +1100
@@ -81,9 +81,10 @@ struct scsi_eh_save {
struct scatterlist sense_sgl;
 };
 
-extern void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd,
+extern int scsi_eh_prep_cmnd(struct scsi_cmnd *scmd,
struct scsi_eh_save *ses, unsigned char *cmnd,
-   int cmnd_size, unsigned sense_bytes);
+   int cmnd_size, unsigned sense_bytes,
+   gfp_t gfp_mask);
 
 extern void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd,
struct scsi_eh_save *ses);
Index: linux-work/drivers/scsi/NCR5380.c
===
--- linux-work.orig/drivers/scsi/NCR5380.c  2007-11-20 13:46:41.0 
+1100
+++ linux-work/drivers/scsi/NCR5380.c   2007-11-20 13:46:47.0 +1100
@@ -2283,7 +2283,7 @@ static void NCR5380_information_transfer
}
 
if ((cmd-cmnd[0] != REQUEST_SENSE)  
(status_byte(cmd-SCp.Status) == CHECK_CONDITION)) {
-   scsi_eh_prep_cmnd(cmd, 
hostdata-ses, NULL, 0, ~0);
+   scsi_eh_prep_cmnd(cmd, 
hostdata-ses, NULL, 0, ~0, GFP_ATOMIC);
 
dprintk(NDEBUG_AUTOSENSE, 
(scsi%d : performing request sense\n, 

SCSI breakage on non-cache coherent architectures

2007-11-18 Thread Benjamin Herrenschmidt
Hi James !

(Please CC me on replies as I'm not subscribed to linux-scsi)

I've been debugging various issues on the PowerPC 44x embedded
architecture which happens to have non-coherent PCI DMA.

One of the problem I'm hitting is that one really need to enforce
kmalloc alignement to cache lines or bad things will happen (among
others with USB), for some reasons, powerpc failed to do so, I fixed it.

The other one I'm hitting now is that the SCSI layer nowadays embeds the
sense_buffer inside the scsi_cmnd structure without any kind of
alignment whatsoever. I've been hitting irregulary is a crash on SCSI command
completion that seems to be related to corruption of the request
pointer in struct scsi_cmnd and I think it might be the cause.
I'm now trying to setup a proper repro-case.

The sense buffer is something that is written to by the device, thus it
gets cache invalidated when the DMA happens, potentially losing whatever
was sharing the cache line, which includes, among other things, that
request pointer field.

There are other issues as well if any of the fields sharing the cache
line happens to be read while the DMA is in progress, it would
populate the cache with memory content prior to the DMA being completed.

It's fairly important on such architectures not to share cache lines
between objects being DMA'd to/from and the rest of the system. If the
DMA is strictly outgoing, it's generally ok, but not if the DMA is
bidirectional or the device is writing to memory.

I'm not sure what is the best way to fix that. Internally, I've done
some test whacking some cacheline_aligned in the scsi_cmnd data
structure to verify I no longer get random SLAB corruption when using my
USB but that significantly bloats the size of the structure on archs
such as ppc64 that don't need it and have a large cache line size.

Unfortunately, I don't think there's any existing Kconfig symbol or arch
provided #define to tell us that we are on a non-coherent arch afaik
that could be used to make that conditional.

Another option would be to kmalloc the buffer (wasn't it the case before
btw ?) but I suppose some people will scream at the idea due to how the
command pools are done...

What do you suggest ?

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html