Re: [PATCH v2] add bidi support for block pc requests

2007-07-07 Thread Jeff Garzik

James Bottomley wrote:

On Sun, 2007-06-03 at 10:45 +0300, Boaz Harrosh wrote:

Jeff Garzik wrote:

Boaz Harrosh wrote:

FUJITA Tomonori wrote:

From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Thu, 17 May 2007 17:00:21 +0300


Yes, Tomo found it at ata_scsi_slave_config(). Attached below is the way I
fixed it. Now it works with 127.

I think that we can just remove blk_queue_max_phys_segments since the
ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD.


Who should send a patch upstream? (I cc'ed Jeff Garzik)

Boaz

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index dd81fa7..3660f3e 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev)

ata_scsi_sdev_config(sdev);

-   blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD);
-
sdev->manage_start_stop = 1;

I don't mind the patch, but could someone refresh me as to the context?

Is there something wrong with the above code, or is it simply redundant 
to the scsi_host_template settings in each LLDD?


Jeff




Hi Jeff
What happens is that if SCSI-ml sets a higher value than LIBATA_MAX_PRD (=128),
then everything is OK and libata-scsi will only see its LIBATA_MAX_PRD. But what
happens if SCSI-ml sets a lower value? It will then crash on an unexpectedly high
sg count. My first patch was an "if", but Tomo said that it is redundant since
drivers do that already. So I guess it is your call. Can it be removed, or do we
need a: if (sdev->request_queue->max_phys_segments > LIBATA_MAX_PRD)


Ordinarily, in SCSI, phys_segments is the wrong thing for a driver to
set because what a driver wants to see is what comes out the other side
of dma_map_sg() (which is controlled by hw_segments) not what goes in.
However, I could see libata having an interest in the phys_segments if
the physical region descriptors are going to the device via PIO ... is
that what this is for?


LIBATA_MAX_PRD is the maximum number of DMA scatter/gather elements 
permitted by the HBA's DMA engine, for a single ATA command.


The PIO code doesn't really care about segments.

Jeff





Re: [PATCH v2] add bidi support for block pc requests

2007-07-07 Thread James Bottomley
On Sat, 2007-07-07 at 11:27 -0400, Jeff Garzik wrote:
 James Bottomley wrote:
  On Sun, 2007-06-03 at 10:45 +0300, Boaz Harrosh wrote:
  Jeff Garzik wrote:
  Boaz Harrosh wrote:
  FUJITA Tomonori wrote:
  From: Boaz Harrosh [EMAIL PROTECTED]
  Subject: Re: [PATCH v2] add bidi support for block pc requests
  Date: Thu, 17 May 2007 17:00:21 +0300
 
  Yes, Tomo found it at ata_scsi_slave_config(). Attached below is the way I
  fixed it. Now it works with 127.
  I think that we can just remove blk_queue_max_phys_segments since the
  ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD.
 
  Who should send a patch upstream? (I cc'ed Jeff Garzik)
 
  Boaz
 
  diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
  index dd81fa7..3660f3e 100644
  --- a/drivers/ata/libata-scsi.c
  +++ b/drivers/ata/libata-scsi.c
  @@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev)
 
   ata_scsi_sdev_config(sdev);
 
  -blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD);
  -
   sdev->manage_start_stop = 1;
  I don't mind the patch, but could someone refresh me as to the context?
 
  Is there something wrong with the above code, or is it simply redundant 
  to the scsi_host_template settings in each LLDD?
 
Jeff
 
 
 
  Hi Jeff
  What happens is that if SCSI-ml sets a higher value than LIBATA_MAX_PRD (=128),
  then everything is OK and libata-scsi will only see its LIBATA_MAX_PRD. But what
  happens if SCSI-ml sets a lower value? It will then crash on an unexpectedly high
  sg count. My first patch was an "if", but Tomo said that it is redundant since
  drivers do that already. So I guess it is your call. Can it be removed, or do we
  need a: if (sdev->request_queue->max_phys_segments > LIBATA_MAX_PRD)
  
  Ordinarily, in SCSI, phys_segments is the wrong thing for a driver to
  set because what a driver wants to see is what comes out the other side
  of dma_map_sg() (which is controlled by hw_segments) not what goes in.
  However, I could see libata having an interest in the phys_segments if
  the physical region descriptors are going to the device via PIO ... is
  that what this is for?
 
 LIBATA_MAX_PRD is the maximum number of DMA scatter/gather elements 
 permitted by the HBA's DMA engine, for a single ATA command.

Then it's the wrong parameter you're setting:  phys_segments is what you
have going into a dma_map_sg() but hw_segments is what you end up with after
it (mathematically, the mapping never splits segments, so hw_segments <=
phys_segments always, but it's still the wrong way to bound this); so if
you want to limit the SG elements in the HBA, you should set hw_segments,
not phys_segments (hw_segments is what the sg_tablesize parameter of the host
structure corresponds to).
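
A minimal sketch of the distinction, assuming the 2007-era block API; the
host template and values are illustrative, not taken from any patch in this
thread:

#include <linux/blkdev.h>
#include <scsi/scsi.h>
#include <scsi/scsi_host.h>

/* hw_segments bounds the list the HBA sees *after* dma_map_sg(); it is
 * normally derived from the host template's sg_tablesize. */
static struct scsi_host_template example_sht = {
	.name         = "example",
	.sg_tablesize = 128,		/* illustrative HBA S/G table limit */
};

static void example_configure_queue(struct request_queue *q,
				    struct Scsi_Host *shost)
{
	/* Bound the mapped (post-dma_map_sg) list: the HBA's real limit. */
	blk_queue_max_hw_segments(q, shost->sg_tablesize);

	/* phys_segments bounds what goes *into* dma_map_sg(); only code that
	 * walks the unmapped scatterlist (e.g. PIO) cares about this one. */
	blk_queue_max_phys_segments(q, SCSI_MAX_PHYS_SEGMENTS);
}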

However, I suspect this has to do with our iommu problem, doesn't it?
The IOMMU may merge the segments beyond your 64k DMA boundary, so you
need to split them again ... in that case you need phys_segments
bounded, not hw_segments?  I really wish we could get this iommu
parameter problem fixed so we didn't have to have all of this confusing
code.

 The PIO code doesn't really care about segments.

OK ... I just thought it was PIO because the only time you should worry
about phys_segments is if you're not going to do a dma_map_sg() on the
scatterlist.

James




Re: [PATCH v2] add bidi support for block pc requests

2007-07-07 Thread Jeff Garzik

James Bottomley wrote:

On Sat, 2007-07-07 at 11:27 -0400, Jeff Garzik wrote:
LIBATA_MAX_PRD is the maximum number of DMA scatter/gather elements 
permitted by the HBA's DMA engine, for a single ATA command.



Then it's the wrong parameter you're setting:  phys_segments is what you
have going into a dma_map_sg() but hw_segments is what you end up with after
it (mathematically, the mapping never splits segments, so hw_segments <=
phys_segments always, but it's still the wrong way to bound this); so if
you want to limit the SG elements in the HBA, you should set hw_segments,
not phys_segments (hw_segments is what the sg_tablesize parameter of the host
structure corresponds to).


Honestly I'm betting the code is a result of paranoia^H^H^Hconservatism 
on my part:  setting every DMA and segment limit I can find, in the 
hopes that somehow, somewhere, that information will get filtered down 
to the IOMMU and whatnot.  :)


Given what you say above, and the fact that I set sg_tablesize, 
presumably Boaz's patch to remove phys_segment limiting in libata-scsi 
is the right thing to do.  Your knowledge in this area is most likely 
stronger than mine.




However, I suspect this has to do with our iommu problem, doesn't it?
The IOMMU may merge the segments beyond your 64k DMA boundary, so you
need to split them again ... in that case you need phys_segments
bounded, not hw_segments?


This is why I set every segment limitation I could find :)  And then Ben 
tells me IOMMU may ignore all of that anyway, and so I have code to 
split inside libata as well :)


I just know what IDE needs:  S/G ent shouldn't cross a 64k boundary, nor
exceed 64k in size.  The maximum number of S/G ents is actually a guess.
Hard data is tough to come by.  Some DMA engines will cycle through
all of memory until they hit a stop bit.  Others have limits yet to be
discovered.  :)
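
A sketch of how those two constraints translate into queue limits, assuming
the block API of the time; the 64k values follow the description above
(libata's ATA_DMA_BOUNDARY expresses the same boundary), the function itself
is illustrative:

#include <linux/blkdev.h>

/* Each S/G entry must neither cross a 64k boundary nor exceed 64k. */
static void example_set_ide_dma_limits(struct request_queue *q)
{
	blk_queue_segment_boundary(q, 0xffffUL);  /* no entry crosses 64k */
	blk_queue_max_segment_size(q, 0x10000);   /* no entry longer than 64k */
}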


Jeff




Re: [PATCH v2] add bidi support for block pc requests

2007-06-03 Thread Boaz Harrosh
Jeff Garzik wrote:
 Boaz Harrosh wrote:
 FUJITA Tomonori wrote:
 From: Boaz Harrosh [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Thu, 17 May 2007 17:00:21 +0300

 Yes, Tomo found it at ata_scsi_slave_config(). Attached below is the way I
 fixed it. Now it works with 127.
 I think that we can just remove blk_queue_max_phys_segments since the
 ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD.

 Who should send a patch upstream? (I cc'ed Jeff Garzik)

 Boaz

 diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
 index dd81fa7..3660f3e 100644
 --- a/drivers/ata/libata-scsi.c
 +++ b/drivers/ata/libata-scsi.c
 @@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev)

  ata_scsi_sdev_config(sdev);

 -blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD);
 -
  sdev->manage_start_stop = 1;
 
 I don't mind the patch, but could someone refresh me as to the context?
 
 Is there something wrong with the above code, or is it simply redundant 
 to the scsi_host_template settings in each LLDD?
 
   Jeff
 
 
 

Hi Jeff
What happens is that if SCSI-ml sets a higher value than LIBATA_MAX_PRD (=128),
then everything is OK and libata-scsi will only see its LIBATA_MAX_PRD. But what
happens if SCSI-ml sets a lower value? It will then crash on an unexpectedly high
sg count. My first patch was an "if", but Tomo said that it is redundant since
drivers do that already. So I guess it is your call. Can it be removed, or do we
need a: if (sdev->request_queue->max_phys_segments > LIBATA_MAX_PRD)

Boaz





Re: [PATCH v2] add bidi support for block pc requests

2007-06-03 Thread James Bottomley
On Sun, 2007-06-03 at 10:45 +0300, Boaz Harrosh wrote:
 Jeff Garzik wrote:
  Boaz Harrosh wrote:
  FUJITA Tomonori wrote:
  From: Boaz Harrosh [EMAIL PROTECTED]
  Subject: Re: [PATCH v2] add bidi support for block pc requests
  Date: Thu, 17 May 2007 17:00:21 +0300
 
  Yes, Tomo found it at ata_scsi_slave_config(). Attached below is the way I
  fixed it. Now it works with 127.
  I think that we can just remove blk_queue_max_phys_segments since the
  ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD.
 
  Who should send a patch upstream? (I cc'ed Jeff Garzik)
 
  Boaz
 
  diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
  index dd81fa7..3660f3e 100644
  --- a/drivers/ata/libata-scsi.c
  +++ b/drivers/ata/libata-scsi.c
  @@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev)
 
 ata_scsi_sdev_config(sdev);
 
  -  blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD);
  -
 sdev->manage_start_stop = 1;
  
  I don't mind the patch, but could someone refresh me as to the context?
  
  Is there something wrong with the above code, or is it simply redundant 
  to the scsi_host_template settings in each LLDD?
  
  Jeff
  
  
  
 
 Hi Jeff
 What happens is that if SCSI-ml sets a higher value than LIBATA_MAX_PRD (=128),
 then everything is OK and libata-scsi will only see its LIBATA_MAX_PRD. But what
 happens if SCSI-ml sets a lower value? It will then crash on an unexpectedly high
 sg count. My first patch was an "if", but Tomo said that it is redundant since
 drivers do that already. So I guess it is your call. Can it be removed, or do we
 need a: if (sdev->request_queue->max_phys_segments > LIBATA_MAX_PRD)

Ordinarily, in SCSI, phys_segments is the wrong thing for a driver to
set because what a driver wants to see is what comes out the other side
of dma_map_sg() (which is controlled by hw_segments) not what goes in.
However, I could see libata having an interest in the phys_segments if
the physical region descriptors are going to the device via PIO ... is
that what this is for?

James




Re: [PATCH v2] add bidi support for block pc requests

2007-06-01 Thread Jeff Garzik

Boaz Harrosh wrote:

FUJITA Tomonori wrote:

From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Thu, 17 May 2007 17:00:21 +0300


Yes, Tomo found it at ata_scsi_slave_config(). Attached below is the way I
fixed it. Now it works with 127.

I think that we can just remove blk_queue_max_phys_segments since the
ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD.



Who should send a patch upstream? (I cc'ed Jeff Garzik)

Boaz

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index dd81fa7..3660f3e 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev)

ata_scsi_sdev_config(sdev);

-   blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD);
-
sdev->manage_start_stop = 1;


I don't mind the patch, but could someone refresh me as to the context?

Is there something wrong with the above code, or is it simply redundant 
to the scsi_host_template settings in each LLDD?


Jeff





Re: [PATCH v2] add bidi support for block pc requests

2007-05-24 Thread Boaz Harrosh
FUJITA Tomonori wrote:
 FUJITA Tomonori wrote:
 
 One thing that I found is:
 
+#define scsi_resid(cmd) ((cmd)->sg_table->resid)
 
 
 This doesn't work for some drivers (at least ipr) since they set
cmd->resid even with commands without data transfer.
 

James, Tomo.

the last accessor:
+#define scsi_resid(cmd) ((cmd)->resid)

used as an l-value in drivers does not serve our purpose, as seen by the test
implementation of scsi_sg_table. Now clearly this needs an accessor, and it is a
bidi parameter (we need 2 of them).

I would suggest doing a set_ inline accessor and using that in drivers:
+static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
+{
+   cmd->resid = resid;
+}

I would even suggest making all accessors inlines and not macros, just to make sure
they are not used as l-values and modified. Though I have not seen such use in
Tomo's patchset.

example:
+static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd)
+{
+   return cmd->request_bufflen;
+}
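
As a hedged illustration, a hypothetical driver completion path converted to
the proposed accessors (assuming scsi_set_resid() and scsi_bufflen() exist as
sketched above):

#include <scsi/scsi_cmnd.h>

static void example_complete(struct scsi_cmnd *cmd, unsigned int transferred)
{
	/* Before: direct field access, usable (and misusable) as an l-value:
	 *	cmd->resid = cmd->request_bufflen - transferred;
	 * After: intent is explicit and cannot be assigned through. */
	scsi_set_resid(cmd, scsi_bufflen(cmd) - transferred);
}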

Boaz


Re: [PATCH v2] add bidi support for block pc requests

2007-05-24 Thread FUJITA Tomonori
From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Thu, 24 May 2007 19:37:06 +0300

 FUJITA Tomonori wrote:
  FUJITA Tomonori wrote:
  
  One thing that I found is:
  
  +#define scsi_resid(cmd) ((cmd)->sg_table->resid)
  
  
  This doesn't work for some drivers (at least ipr) since they set
  cmd->resid even with commands without data transfer.
  
 
 James, Tomo.
 
 the last accessor:
 +#define scsi_resid(cmd) ((cmd)->resid)
 
 used as an l-value in drivers does not serve our purpose, as seen by the test
 implementation of scsi_sg_table. Now clearly this needs an accessor, and it is a
 bidi parameter (we need 2 of them).

I thought that it would be better to fix several drivers (less than 10).


Re: [PATCH v2] add bidi support for block pc requests

2007-05-24 Thread Boaz Harrosh
FUJITA Tomonori wrote:
 From: Boaz Harrosh [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Thu, 24 May 2007 19:37:06 +0300
 
 FUJITA Tomonori wrote:
 FUJITA Tomonori wrote:
 One thing that I found is:

 +#define scsi_resid(cmd) ((cmd)->sg_table->resid)


 This doesn't work for some drivers (at least ipr) since they set
 cmd->resid even with commands without data transfer.

 James, Tomo.

 the last accessor:
 +#define scsi_resid(cmd) ((cmd)->resid)

 used as an l-value in drivers does not serve our purpose, as seen by the test
 implementation of scsi_sg_table. Now clearly this needs an accessor, and it is a
 bidi parameter (we need 2 of them).
 
 I thought that it would be better to fix several drivers (less than 10).

I prefer inlines.

One - the programmer cannot make mistakes. Why give him the freedom to do something
he must not do?

Two - if all/most drivers are doing:
if (scsi_sgl(cmd))
	scsi_resid(cmd) = 0;

Then will it not be better to do the if() inside the API?
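
A sketch of what doing the if() inside the API could look like, assuming the
sg_table layout discussed earlier in this thread; scsi_sgl() and the field
names follow the usage above, the body is illustrative:

/* Tolerate commands without a data transfer: with no scatterlist there is
 * no sg_table header to hold resid, so the residual is meaningless and is
 * silently dropped instead of dereferencing a NULL sg_table. */
static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid)
{
	if (scsi_sgl(cmd))
		cmd->sg_table->resid = resid;
}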

Boaz



Re: [PATCH v2] add bidi support for block pc requests

2007-05-17 Thread FUJITA Tomonori
From: Jens Axboe [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Thu, 17 May 2007 07:48:13 +0200

 On Thu, May 17 2007, FUJITA Tomonori wrote:
  From: Jens Axboe [EMAIL PROTECTED]
  Subject: Re: [PATCH v2] add bidi support for block pc requests
  Date: Wed, 16 May 2007 19:53:22 +0200
  
   On Wed, May 16 2007, Boaz Harrosh wrote:
Boaz Harrosh wrote:
 James Bottomley wrote:
 
 There's actually a fourth option you haven't considered:

 Roll all the required sglist definitions (request_bufflen,
 request_buffer, use_sg and sglist_len) into the sgtable pools.

 This is a great idea. Let me see if I understand what you mean.
 ...
 ...

Hi Dear James, list.
I have worked on the proposed solution (if anyone is interested, see the
url below)

Now it works and all but I hit some problems.
What happens is that on 64-bit architectures, well on x86_64,
the scatterlist structure is 32 bytes, which means that
we can only fit exactly 128 of them in a page. But together with
the scsi_sg_table header we are down to 127. Also if we want
good alignment/packing of other sized pools we want:
static struct scsi_host_sg_pool scsi_sg_pools[] = {
SP(7),
SP(15),
SP(31),
SP(63),
SP(127)
};

now there are 2 issues with this:
1. Even if I do blk_queue_max_phys_segments(q, 127); I still get
   requests for use_sg=128 which will crash the kernel.
   
   That sounds like a serious issue, it should definitely not happen. Stuff
   like that would bite other drivers as well, are you absolutely sure that
   is happening? Power-of-2 bug in your code, or in the SCSI code?
  
  Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg?
  
  
If anyone wants to see I have done 2 versions of this work. One on top
of Jens's sg-chaining work (ver 5). And one on top of Tomo's cleanup
git. both can be found here:
  http://www.bhalevy.com/open-osd/download/scsi_sg_table/
   
   +#define scsi_for_each_sg(cmd, sg, nseg, __i)   \
   +   for (__i = 0, sg = scsi_sglist(cmd); __i < (nseg); __i++, (sg)++)
   
   Hmm?
  
  When for_each_sg is ready, we convert scsi_for_each_sg to use
  for_each_sg. I finished the cleanups on more than 40 drivers on the
  top of your patches.
 
 But this is from the patch that is based on my sglist branch, so it
 looked somewhat odd.

Oops, I didn't see the patch based on the sglist branch. Yeah, it's
odd though it isn't used, I guess.


Re: [PATCH v2] add bidi support for block pc requests

2007-05-17 Thread Boaz Harrosh
FUJITA Tomonori wrote:
 From: Jens Axboe [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Thu, 17 May 2007 07:48:13 +0200
 
 On Thu, May 17 2007, FUJITA Tomonori wrote:
 From: Jens Axboe [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Wed, 16 May 2007 19:53:22 +0200

 On Wed, May 16 2007, Boaz Harrosh wrote:
 now there are 2 issues with this:
 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get
requests for use_sg=128 which will crash the kernel.
 That sounds like a serious issue, it should definitely not happen. Stuff
 like that would bite other drivers as well, are you absolutely sure that
 is happening? Power-of-2 bug in your code, or in the SCSI code?
 Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg?

These are regular fs (ext3) requests during bootup. The machine will not
boot. (Usually from the read-ahead code.)
Don't believe me? Look at the second patch, over Tomo's cleanup.
If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I
did in code:
blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
I suppose someone is looking at a different definition. Or there is
another call I need to do for this to work.



 If anyone wants to see I have done 2 versions of this work. One on top
 of Jens's sg-chaining work (ver 5). And one on top of Tomo's cleanup
 git. both can be found here:
   http://www.bhalevy.com/open-osd/download/scsi_sg_table/
 +#define scsi_for_each_sg(cmd, sg, nseg, __i)   \
 +   for (__i = 0, sg = scsi_sglist(cmd); __i < (nseg); __i++, (sg)++)

Yes, leftovers from Tomo's branch. Sorry for the noise. This is in no way
finished work, just experimenting. Anyway, in this git tree no one is
using this macro. (git-pull of Linus's tree, apply ver5, and this patch.)
This branch will work fine because I have not touched the numbers yet,
just messed with the allocations.
 Hmm?
 When for_each_sg is ready, we convert scsi_for_each_sg to use
 for_each_sg. I finished the cleanups on more than 40 drivers on the
 top of your patches.
 But this is from the patch that is based on my sglist branch, so it
 looked somewhat odd.
 
 Oops, I didn't see the patch based on the sglist branch. Yeah, it's
 odd though it isn't used, I guess.
Boaz


Re: [PATCH v2] add bidi support for block pc requests

2007-05-17 Thread Boaz Harrosh
Jens Axboe wrote:
 On Wed, May 16 2007, James Bottomley wrote:
 On Wed, 2007-05-16 at 19:53 +0200, Jens Axboe wrote:
 The 1-page thing isn't a restriction as such, it's just an optimization.
 The scatterlist allocated is purely a kernel entity, so you could do 4
 contig pages and larger ios that way, if higher order allocations were
 reliable.

 But you are right in that we need to tweak the sg pool size so that it
 ends up being a nice size, and not something that either spans a little
 bit into a second page or doesn't fill a page nicely. On my x86-64 here,
 a 128 segment sg table is exactly one page (looking at slabinfo). It
 depends on the allocator whether that is just right, or just a little
 too much due to management information.
 Actually, if you look at the slab allocation algorithm (particularly
 calculate_slab_order()) you'll find it's not as simplistic as you're
 assuming ... what it actually does is try to allocate  1 item in n
 pages to reduce the leftovers.
 
 I'm not assuming anything, I was just being wary of having elements
 that are exactly page sized if that would cause a bit of spill into a
 second page. Don't tell me that PAGE_SIZE+10 (or whatever it might be)
 would ever be an optimal allocation size.
 
 Additionally, remember that turning on redzoning, which seems to be
 quite popular nowadays, actually blows out the slab size calculations
 anyway.
 
 Debugging will always throw these things out the window, we can't and
 should not optimize for that. That goes for slab, and for lots of other
 things.
 
 The bottom line is that it's better for us just to do exactly what we
 need and let the allocation algorithms figure out how to do it
 efficiently rather than trying to second guess them.
 
 Partly true, it's also silly to just hardcode power-of-2 numbers without
 ever bothering to look at what that results in (or even if it fits
 normal use patterns).
 
 We can easily be flexible, so it seems silly not to at least do a bit of
 background research.
 
The thing is that now everything fits like a glove. On i386/32-bit arches
the scatterlist struct is 16 bytes, so 256 fit in a page; on x86_64/64-bit
arches it is 32 bytes, and 128 fit exactly in a page. If we do any code
that throws this off it will be a performance regression. Call it
beginner's luck, call it someone spent a long night handcrafting it this
way. Just that I think the current system is perfect and we should not
touch it. There are other options for bidi. (just my $0.02)

Boaz



Re: [PATCH v2] add bidi support for block pc requests

2007-05-17 Thread FUJITA Tomonori
From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Thu, 17 May 2007 11:49:37 +0300

 FUJITA Tomonori wrote:
  From: Jens Axboe [EMAIL PROTECTED]
  Subject: Re: [PATCH v2] add bidi support for block pc requests
  Date: Thu, 17 May 2007 07:48:13 +0200
  
  On Thu, May 17 2007, FUJITA Tomonori wrote:
  From: Jens Axboe [EMAIL PROTECTED]
  Subject: Re: [PATCH v2] add bidi support for block pc requests
  Date: Wed, 16 May 2007 19:53:22 +0200
 
  On Wed, May 16 2007, Boaz Harrosh wrote:
  now there are 2 issues with this:
  1. Even if I do blk_queue_max_phys_segments(q, 127); I still get
 requests for use_sg=128 which will crash the kernel.
  That sounds like a serious issue, it should definitely not happen. Stuff
  like that would bite other drivers as well, are you absolutely sure that
  is happening? Power-of-2 bug in your code, or in the SCSI code?
  Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg?
 
 These are regular fs (ext3) requests during bootup. The machine will not
 boot. (Usually from the read-ahead code.)
 Don't believe me? Look at the second patch, over Tomo's cleanup.
 If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I
 did in code:
   blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
 I suppose someone is looking at a different definition. Or there is
 another call I need to do for this to work.

I modified your patch a bit (sgtable allocation) and set
SCSI_MAX_SG_SEGMENTS to 127. My ppc64 box seems to work (with
iscsi_tcp and ipr drivers).

iscsi_tcp sets sg_tablesize to 255, but blk_queue_max_phys_segments
seems to work.

One thing that I found is:

+#define scsi_resid(cmd) ((cmd)->sg_table->resid)


This doesn't work for some drivers (at least ipr) since they set
cmd->resid even with commands without data transfer.

This patch is against my cleanup tree.

Boaz, btw, don't worry about scsi_tgt_lib.c on
scsi_alloc/free_sgtable. You can break it. I need to fix it for bidi anyway.


diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 70454b4..d8fb9c4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -35,33 +35,34 @@ #define SG_MEMPOOL_SIZE 2
 
 struct scsi_host_sg_pool {
size_t  size;
-   char*name; 
+   char*name;
struct kmem_cache   *slab;
mempool_t   *pool;
 };
 
-#if (SCSI_MAX_PHYS_SEGMENTS < 32)
-#error SCSI_MAX_PHYS_SEGMENTS is too small
-#endif
+/*
+ * Should fit within a single page.
+ */
+
+enum { SCSI_MAX_SG_SEGMENTS = 127 };
+
+enum { SCSI_MAX_SG_SEGMENTS_CALC =
+   ((PAGE_SIZE - sizeof(struct scsi_sg_table)) /
+   sizeof(struct scatterlist)) };
+
+#define SG_MEMPOOL_NR ARRAY_SIZE(scsi_sg_pools)
 
-#define SP(x) { x, "sgpool-" #x } 
+/*#define SG_MEMPOOL_NR (ARRAY_SIZE(scsi_sg_pools) - \
+   (SCSI_MAX_SG_SEGMENTS < 254))*/
+
+#define SP(x) { x, "sgpool-" #x }
 static struct scsi_host_sg_pool scsi_sg_pools[] = {
-   SP(8),
-   SP(16),
-   SP(32),
-#if (SCSI_MAX_PHYS_SEGMENTS > 32)
-   SP(64),
-#if (SCSI_MAX_PHYS_SEGMENTS > 64)
-   SP(128),
-#if (SCSI_MAX_PHYS_SEGMENTS > 128)
-   SP(256),
-#if (SCSI_MAX_PHYS_SEGMENTS > 256)
-#error SCSI_MAX_PHYS_SEGMENTS is too large
-#endif
-#endif
-#endif
-#endif
-}; 
+   SP(7),
+   SP(15),
+   SP(31),
+   SP(63),
+   SP(SCSI_MAX_SG_SEGMENTS)
+};
 #undef SP
 
 static void scsi_run_queue(struct request_queue *q);
@@ -701,60 +702,38 @@ static struct scsi_cmnd *scsi_end_reques
return NULL;
 }
 
-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+static void scsi_free_sgtable(struct scsi_sg_table *sgt)
 {
-   struct scsi_host_sg_pool *sgp;
-   struct scatterlist *sgl;
-
-   BUG_ON(!cmd->use_sg);
-
-   switch (cmd->use_sg) {
-   case 1 ... 8:
-   cmd->sglist_len = 0;
-   break;
-   case 9 ... 16:
-   cmd->sglist_len = 1;
-   break;
-   case 17 ... 32:
-   cmd->sglist_len = 2;
-   break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 32)
-   case 33 ... 64:
-   cmd->sglist_len = 3;
-   break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 64)
-   case 65 ... 128:
-   cmd->sglist_len = 4;
-   break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 128)
-   case 129 ... 256:
-   cmd->sglist_len = 5;
-   break;
-#endif
-#endif
-#endif
-   default:
-   return NULL;
-   }
+   printk("%s %d %d\n", __FUNCTION__, sgt->this_sg_count, SG_MEMPOOL_NR);
+   BUG_ON(sgt->this_sg_count >= SG_MEMPOOL_NR);
 
-   sgp = scsi_sg_pools + cmd->sglist_len;
-   sgl = mempool_alloc(sgp->pool, gfp_mask);
-   return sgl;
+   mempool_free(sgt, scsi_sg_pools[sgt->this_sg_count].pool);
 }
 
-EXPORT_SYMBOL(scsi_alloc_sgtable);
-
-void scsi_free_sgtable(struct scatterlist *sgl, int index)
+static struct scsi_sg_table

Re: [PATCH v2] add bidi support for block pc requests

2007-05-17 Thread FUJITA Tomonori
From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Thu, 17 May 2007 11:49:37 +0300

 FUJITA Tomonori wrote:
  From: Jens Axboe [EMAIL PROTECTED]
  Subject: Re: [PATCH v2] add bidi support for block pc requests
  Date: Thu, 17 May 2007 07:48:13 +0200
  
  On Thu, May 17 2007, FUJITA Tomonori wrote:
  From: Jens Axboe [EMAIL PROTECTED]
  Subject: Re: [PATCH v2] add bidi support for block pc requests
  Date: Wed, 16 May 2007 19:53:22 +0200
 
  On Wed, May 16 2007, Boaz Harrosh wrote:
  now there are 2 issues with this:
  1. Even if I do blk_queue_max_phys_segments(q, 127); I still get
 requests for use_sg=128 which will crash the kernel.
  That sounds like a serious issue, it should definitely not happen. Stuff
  like that would bite other drivers as well, are you absolutely sure that
  is happening? Power-of-2 bug in your code, or in the SCSI code?
  Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg?
 
 These are regular fs (ext3) requests during bootup. The machine will not
 boot. (Usually from the read-ahead code.)
 Don't believe me? Look at the second patch, over Tomo's cleanup.
 If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I
 did in code:
   blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
 I suppose someone is looking at a different definition. Or there is
 another call I need to do for this to work.

ata_scsi_slave_config?


Re: [PATCH v2] add bidi support for block pc requests

2007-05-17 Thread James Bottomley
On Thu, 2007-05-17 at 11:49 +0300, Boaz Harrosh wrote:
 These are regular fs (ext3) requests during bootup. The machine will not
 boot. (Usually from the read-ahead code.)
 Don't believe me? Look at the second patch, over Tomo's cleanup.
 If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I
 did in code:
   blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
 I suppose someone is looking at a different definition. Or there is
 another call I need to do for this to work.

It would really help us if you showed the actual code for what you did
and where ... if this is wrong, we have bigger problems than quibbling
about bidirectional slab sizes.  The correct way to adjust this limit
artificially to 127 is:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1f5a07b..4a27841 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1576,7 +1576,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
return NULL;
 
blk_queue_max_hw_segments(q, shost->sg_tablesize);
-   blk_queue_max_phys_segments(q, SCSI_MAX_PHYS_SEGMENTS);
+   blk_queue_max_phys_segments(q, 127);
blk_queue_max_sectors(q, shost->max_sectors);
blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
blk_queue_segment_boundary(q, shost->dma_boundary);

(It doesn't alter the allocation pools or anything else, just limits the
max phys segments of the queue).  The way to check that this limit is
being honoured is:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1f5a07b..ae42e4d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1000,6 +1000,7 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
 * kmapping pages)
 */
cmd->use_sg = req->nr_phys_segments;
+   WARN_ON(req->nr_phys_segments > 127);
 
/*
 * If sg table allocation fails, requeue request later.

James




Re: [PATCH v2] add bidi support for block pc requests

2007-05-17 Thread Boaz Harrosh
James Bottomley wrote:
 On Thu, 2007-05-17 at 11:49 +0300, Boaz Harrosh wrote:
 These are regular fs (ext3) requests during bootup. The machine will not
 boot. (Usually from the read-ahead code.)
 Don't believe me? Look at the second patch, over Tomo's cleanup.
 If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I
 did in code:
  blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
 I suppose someone is looking at a different definition. Or there is
 another call I need to do for this to work.
 
 It would really help us if you showed the actual code for what you did
 and where ... if this is wrong, we have bigger problems than quibbling
 about bidirectional slab sizes.  The correct way to adjust this limit
 artificially to 127 is:
 
 diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
 index 1f5a07b..4a27841 100644
 --- a/drivers/scsi/scsi_lib.c
 +++ b/drivers/scsi/scsi_lib.c
 @@ -1576,7 +1576,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
   return NULL;
  
   blk_queue_max_hw_segments(q, shost->sg_tablesize);
 - blk_queue_max_phys_segments(q, SCSI_MAX_PHYS_SEGMENTS);
 + blk_queue_max_phys_segments(q, 127);
   blk_queue_max_sectors(q, shost->max_sectors);
   blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
   blk_queue_segment_boundary(q, shost->dma_boundary);
 
 (It doesn't alter the allocation pools or anything else, just limits the
 max phys segments of the queue).  The way to check that this limit is
 being honoured is:
 
 diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
 index 1f5a07b..ae42e4d 100644
 --- a/drivers/scsi/scsi_lib.c
 +++ b/drivers/scsi/scsi_lib.c
 @@ -1000,6 +1000,7 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
* kmapping pages)
*/
   cmd->use_sg = req->nr_phys_segments;
 + WARN_ON(req->nr_phys_segments > 127);
  
   /*
* If sg table allocation fails, requeue request later.
 
 James
 
 
Yes, Tomo found it at ata_scsi_slave_config(). Attached below is the way I
fixed it. Now it works with 127.

(By the way Tomo, a printk like you did in scsi_init_io and
scsi_free_sgtable, 2 for every IO, or even 1 for every IO, will make
a SCSI-booting PC like mine almost unusable. Think of printk going
to a log file doing a printk...)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index dd81fa7..de8c796 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -800,7 +800,9 @@ int ata_scsi_slave_config(struct scsi_device *sdev)

ata_scsi_sdev_config(sdev);

-   blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD);
+   if (sdev->request_queue->max_phys_segments > LIBATA_MAX_PRD)
+   blk_queue_max_phys_segments(sdev->request_queue,
+  LIBATA_MAX_PRD);

sdev->manage_start_stop = 1;




Re: [PATCH v2] add bidi support for block pc requests

2007-05-17 Thread FUJITA Tomonori
From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Thu, 17 May 2007 17:00:21 +0300

 James Bottomley wrote:
  On Thu, 2007-05-17 at 11:49 +0300, Boaz Harrosh wrote:
  These are regular fs (ext3) requests during bootup. The machine will not
  boot. (Usually from the read-ahead code.)
  Don't believe me? Look at the second patch, over Tomo's cleanup.
  If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I
  did in code:
 blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
  I suppose someone is looking at a different definition. Or there is
  another call I need to do for this to work.
  
  It would really help us if you showed the actual code for what you did
  and where ... if this is wrong, we have bigger problems that quibbling
  about bidirectional slab sizes.  The correct way to adjust this limit
  artificially to 127 is:
  
  diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
  index 1f5a07b..4a27841 100644
  --- a/drivers/scsi/scsi_lib.c
  +++ b/drivers/scsi/scsi_lib.c
  @@ -1576,7 +1576,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
  return NULL;
   
  blk_queue_max_hw_segments(q, shost->sg_tablesize);
  -   blk_queue_max_phys_segments(q, SCSI_MAX_PHYS_SEGMENTS);
  +   blk_queue_max_phys_segments(q, 127);
  blk_queue_max_sectors(q, shost->max_sectors);
  blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
  blk_queue_segment_boundary(q, shost->dma_boundary);
  
  (It doesn't alter the allocation pools or anything else, just limits the
  max phys segments of the queue).  The way to check that this limit is
  being honoured is:
  
  diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
  index 1f5a07b..ae42e4d 100644
  --- a/drivers/scsi/scsi_lib.c
  +++ b/drivers/scsi/scsi_lib.c
  @@ -1000,6 +1000,7 @@ static int scsi_init_io(struct scsi_cmnd *cmd)
   * kmapping pages)
   */
  cmd->use_sg = req->nr_phys_segments;
  +   WARN_ON(req->nr_phys_segments > 127);
   
  /*
   * If sg table allocation fails, requeue request later.
  
  James
  
  
 Yes, Tomo found it at ata_scsi_slave_config(). Attached below is the way I
 fixed it. Now it works with 127.

I think that we can just remove blk_queue_max_phys_segments since the
ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD.


 (By the way Tomo, a printk like you did in scsi_init_io and
 scsi_free_sgtable, 2 for every IO, or even 1 for every IO, will make
 a SCSI-booting PC like mine almost unusable. Think of printk going
 to a log file doing a printk...)

Oops, I forgot to remove it since my SCSI booting pSeries box works
with that. :)


Re: [PATCH v2] add bidi support for block pc requests

2007-05-17 Thread Boaz Harrosh
FUJITA Tomonori wrote:
 From: Boaz Harrosh [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Thu, 17 May 2007 17:00:21 +0300
 
 Yes, Tomo found it at ata_scsi_slave_config(). Attached below is the way I
 fixed it. Now it works with 127.
 
 I think that we can just remove blk_queue_max_phys_segments since the
 ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD.
 

Who should send a patch upstream? (I cc'ed Jeff Garzik)

Boaz

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index dd81fa7..3660f3e 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev)

ata_scsi_sdev_config(sdev);

-   blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD);
-
sdev->manage_start_stop = 1;

if (dev)



Re: [PATCH v2] add bidi support for block pc requests

2007-05-17 Thread Benny Halevy
FUJITA Tomonori wrote:
 From: Boaz Harrosh [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Thu, 17 May 2007 11:49:37 +0300
 
 FUJITA Tomonori wrote:
 From: Jens Axboe [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Thu, 17 May 2007 07:48:13 +0200

 On Thu, May 17 2007, FUJITA Tomonori wrote:
 From: Jens Axboe [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Wed, 16 May 2007 19:53:22 +0200

 On Wed, May 16 2007, Boaz Harrosh wrote:
 now there are 2 issues with this:
 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get
requests for use_sg=128 which will crash the kernel.
 That sounds like a serious issue, it should definitely not happen. Stuff
 like that would bite other drivers as well, are you absolutely sure that
 is happening? Power-of-2 bug in your code, or in the SCSI code?
 Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg?
 These are regular fs (ext3) requests during bootup. The machine will not
 boot. (Usually from the read-ahead code.)
 Don't believe me? Look at the second patch, over Tomo's cleanup.
 If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I
 did in code:
  blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
 I suppose someone is looking at a different definition. Or there is
 another call I need to do for this to work.
 
 I modified your patch a bit (sgtable allocation) and set
 SCSI_MAX_SG_SEGMENTS to 127. My ppc64 box seems to work (with
 iscsi_tcp and ipr drivers).
 
 iscsi_tcp sets sg_tablesize to 255, but blk_queue_max_phys_segments
 seems to work.
 
 One thing that I found is:
 
 +#define scsi_resid(cmd) ((cmd)->sg_table->resid)
 
 
 This doesn't work for some drivers (at least ipr) since they set
 cmd->resid even with commands without data transfer.

Hmm, since we need a residual count also for the bidi_in transfer,
this problem is another reason for having the scsi_cmd_buff in struct
scsi_cmnd, allocating another one from a pool for the bidi case,
and pointing to the sglist in both cases rather than having an sg_table
header allocated along with the sg list.
Alternatively, having a pool for the no-data case might be another
possible solution, though it seems a bit odd to me.

snip

 -void scsi_free_sgtable(struct scatterlist *sgl, int index)
 +static struct scsi_sg_table *scsi_alloc_sgtable(int nseg, gfp_t gfp_mask)
  {
   struct scsi_host_sg_pool *sgp;
 + struct scsi_sg_table *sgt;
 + unsigned int idx;
  
 - BUG_ON(index >= SG_MEMPOOL_NR);
 + for (idx = 0; idx < SG_MEMPOOL_NR; idx++)
 + if (scsi_sg_pools[idx].size >= nseg)
 + goto found;

Tomo, I prefer the loop to be based on a calculation, as follows, rather
than scanning the scsi_sg_pools table, in order to minimize memory access
(and thrashing of the cpu data cache - each scsi_host_sg_pool takes a cache
line on x86_64)

+   for (i = 0, size = 8; i < SG_MEMPOOL_NR-1; i++, size <<= 1)
+   if (size > nents)
+   return i;
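
Fleshed out as a self-contained sketch, assuming pool sizes 7, 15, 31, 63,
127 as in the patch; the helper name is hypothetical:

#include <stdio.h>

#define SG_MEMPOOL_NR 5		/* pools of 7, 15, 31, 63, 127 entries */

/* Pool i holds (8 << i) - 1 entries, so nents fits in pool i exactly when
 * nents < (8 << i), i.e. when size > nents below; no table scan needed. */
static int scsi_sgtable_index(unsigned int nents)
{
	int i;
	unsigned int size;

	for (i = 0, size = 8; i < SG_MEMPOOL_NR - 1; i++, size <<= 1)
		if (size > nents)
			return i;
	return SG_MEMPOOL_NR - 1;	/* largest pool: 127 entries */
}

int main(void)
{
	/* expect: 0 1 3 4 */
	printf("%d %d %d %d\n", scsi_sgtable_index(7), scsi_sgtable_index(8),
	       scsi_sgtable_index(63), scsi_sgtable_index(127));
	return 0;
}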

Benny


Re: [PATCH v2] add bidi support for block pc requests

2007-05-16 Thread Boaz Harrosh
Boaz Harrosh wrote:
 James Bottomley wrote:
 
 There's actually a fourth option you haven't considered:

 Roll all the required sglist definitions (request_bufflen,
 request_buffer, use_sg and sglist_len) into the sgtable pools.

 This is a great idea. Let me see if I understand what you mean.
 ...
 ...

Hi Dear James, list.
I have worked on the proposed solution (if anyone is interested, see the
url below)

Now it works and all but I hit some problems.
What happens is that on 64-bit architectures, well on x86_64,
the scatterlist structure is 32 bytes, which means that
we can only fit exactly 128 of them in a page. But together with
the scsi_sg_table header we are down to 127. Also, if we want
good alignment/packing of the other sized pools, we want:
static struct scsi_host_sg_pool scsi_sg_pools[] = {
SP(7),
SP(15),
SP(31),
SP(63),
SP(127)
};
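
The arithmetic behind those numbers, written out (assuming 4k pages and the
32-byte x86_64 scatterlist cited above; the header size is a stand-in):

#include <assert.h>

#define EX_PAGE_SIZE 4096
#define EX_SG_SIZE   32		/* sizeof(struct scatterlist) on x86_64 */
#define EX_HDR_SIZE  32		/* hypothetical scsi_sg_table header */

int main(void)
{
	/* A bare page holds exactly 128 scatterlist entries ... */
	assert(EX_PAGE_SIZE / EX_SG_SIZE == 128);
	/* ... but carving out a header drops the largest pool to 127. */
	assert((EX_PAGE_SIZE - EX_HDR_SIZE) / EX_SG_SIZE == 127);
	return 0;
}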

now there are 2 issues with this:
1. Even if I do blk_queue_max_phys_segments(q, 127); I still get
   requests for use_sg=128 which will crash the kernel.

2. If I do SPs of 7,15,31,63,128 or even 8,16,32,64,128 it will
   boot and work, but clearly it does not fit in one page. So either
   my sata drivers and iSCSI, which I test, are not sensitive to
   scatterlists fitting in one page, or the kernel gives me 2 contiguous pages?

I do not see a way out of this problem. I think that even with Jens's
chaining of sg_tables, 128 is a magic number that we don't want to cross

It leaves me with option 3 on the bidi front. I think it would be best
to just allocate another global mem_pool of scsi_data_buffer(s), just like
we do for scsi_io_context. The uni scsi_data_buffer will be embedded in
struct scsi_cmnd and the bidi one will be allocated from this pool.
I am open to any suggestions.

If anyone wants to see I have done 2 versions of this work. One on top
of Jens's sg-chaining work (ver 5). And one on top of Tomo's cleanup
git. both can be found here:
  http://www.bhalevy.com/open-osd/download/scsi_sg_table/

Boaz



Re: [PATCH v2] add bidi support for block pc requests

2007-05-16 Thread Jens Axboe
On Wed, May 16 2007, Boaz Harrosh wrote:
 Boaz Harrosh wrote:
  James Bottomley wrote:
  
  There's actually a fourth option you haven't considered:
 
  Roll all the required sglist definitions (request_bufflen,
  request_buffer, use_sg and sglist_len) into the sgtable pools.
 
   This is a great idea. Let me see if I understand what you mean.
  ...
  ...
 
 Hi Dear James, list.
 I have worked on the proposed solution (if anyone is interested, see the
 url below)
 
 Now it works and all but I hit some problems.
 What happens is that on 64-bit architectures, well on x86_64,
 the scatterlist structure is 32 bytes, which means that
 we can only fit exactly 128 of them in a page. But together with
 the scsi_sg_table header we are down to 127. Also if we want
 good alignment/packing of other sized pools we want:
 static struct scsi_host_sg_pool scsi_sg_pools[] = {
   SP(7),
   SP(15),
   SP(31),
   SP(63),
   SP(127)
 };
 
 now there are 2 issues with this:
 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get
requests for use_sg=128 which will crash the kernel.

That sounds like a serious issue, it should definitely not happen. Stuff
like that would bite other drivers as well, are you absolutely sure that
is happening? Power-of-2 bug in your code, or in the SCSI code?

 2. If I do SPs of 7,15,31,63,128 or even 8,16,32,64,128 it will
    boot and work, but clearly it does not fit in one page. So either
    my sata drivers and iSCSI, which I test, are not sensitive to
    scatterlists fitting in one page, or the kernel gives me 2 contiguous pages?

The 1-page thing isn't a restriction as such, it's just an optimization.
The scatterlist allocated is purely a kernel entity, so you could do 4
contig pages and larger ios that way, if higher order allocations were
reliable.

But you are right in that we need to tweak the sg pool size so that it
ends up being a nice size, and not something that either spans a little
bit into a second page or doesn't fill a page nicely. On my x86-64 here,
a 128 segment sg table is exactly one page (looking at slabinfo). It
depends on the allocator whether that is just right, or just a little
too much due to management information.

 I do not see a way out of this problem. I think that even with Jens's
 chaining of sg_tables, 128 is a magic number that we don't want to cross

Label me skeptical of your findings so far :-)

 It leaves me with option 3 on the bidi front. I think it would be best
 to just allocate another global mem_pool of scsi_data_buffer(s), just like
 we do for scsi_io_context. The uni scsi_data_buffer will be embedded in
 struct scsi_cmnd and the bidi one will be allocated from this pool.
 I am open to any suggestions.

Two statements on this affair:

- Any sg table size should work, provided that we can safely and
  reliably allocate it. We stay with 1 page max for that reason alone.

- The max sg number has no magic beyond staying within a single page for
  allocation reasons.

The scatter table really only exists as a temporary way to store and
set up the transfer, until it's mapped to a driver-private sg structure. That
is why the phys segments limit is purely a driver property; it has
nothing to do with the hardware or the hardware limits.
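
A sketch of that lifecycle in a hypothetical LLDD of the era (names and
structure are illustrative; the dma_map_sg() step is the point):

#include <linux/dma-mapping.h>
#include <scsi/scsi_cmnd.h>

/* The block-layer scatterlist (bounded by phys_segments) is only a staging
 * area; the HBA consumes what dma_map_sg() produces (bounded by
 * hw_segments, i.e. the host's sg_tablesize). */
static int example_build_sgl(struct device *dev, struct scsi_cmnd *cmd)
{
	struct scatterlist *sg = cmd->request_buffer;
	int i, n_mapped;

	n_mapped = dma_map_sg(dev, sg, cmd->use_sg, cmd->sc_data_direction);
	if (!n_mapped)
		return -ENOMEM;

	for (i = 0; i < n_mapped; i++, sg++) {
		/* Program sg_dma_address(sg) / sg_dma_len(sg) into the
		 * HBA's private descriptor table (omitted). */
	}
	return 0;
}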

 If anyone wants to see I have done 2 versions of this work. One on top
 of Jens's sg-chaining work (ver 5). And one on top of Tomo's cleanup
 git. both can be found here:
   http://www.bhalevy.com/open-osd/download/scsi_sg_table/

+#define scsi_for_each_sg(cmd, sg, nseg, __i)   \
+   for (__i = 0, sg = scsi_sglist(cmd); __i < (nseg); __i++, (sg)++)

Hmm?

-- 
Jens Axboe



Re: [PATCH v2] add bidi support for block pc requests

2007-05-16 Thread James Bottomley
On Wed, 2007-05-16 at 19:53 +0200, Jens Axboe wrote:
 The 1-page thing isn't a restriction as such, it's just an optimization.
 The scatterlist allocated is purely a kernel entity, so you could do 4
 contig pages and larger ios that way, if higher order allocations were
 reliable.
 
 But you are right in that we need to tweak the sg pool size so that it
 ends up being a nice size, and not something that either spans a little
 bit into a second page or doesn't fill a page nicely. On my x86-64 here,
 a 128 segment sg table is exactly one page (looking at slabinfo). It
 depends on the allocator whether that is just right, or just a little
 too much due to management information.

Actually, if you look at the slab allocation algorithm (particularly
calculate_slab_order()) you'll find it's not as simplistic as you're
assuming ... what it actually does is try to allocate > 1 item in n
pages to reduce the leftovers.

Additionally, remember that turning on redzoning, which seems to be
quite popular nowadays, actually blows out the slab size calculations
anyway.

The bottom line is that it's better for us just to do exactly what we
need and let the allocation algorithms figure out how to do it
efficiently rather than trying to second guess them.

James




Re: [PATCH v2] add bidi support for block pc requests

2007-05-16 Thread Jens Axboe
On Wed, May 16 2007, James Bottomley wrote:
 On Wed, 2007-05-16 at 19:53 +0200, Jens Axboe wrote:
  The 1-page thing isn't a restriction as such, it's just an optimization.
  The scatterlist allocated is purely a kernel entity, so you could do 4
  contig pages and larger ios that way, if higher order allocations were
  reliable.
  
  But you are right in that we need to tweak the sg pool size so that it
  ends up being a nice size, and not something that either spans a little
  bit into a second page or doesn't fill a page nicely. On my x86-64 here,
  a 128 segment sg table is exactly one page (looking at slabinfo). It
  depends on the allocator whether that is just right, or just a little
  too much due to management information.
 
 Actually, if you look at the slab allocation algorithm (particularly
 calculate_slab_order()) you'll find it's not as simplistic as you're
  assuming ... what it actually does is try to allocate > 1 item in n
 pages to reduce the leftovers.

I'm not assuming anything, I was just being wary of having elements
that are exactly page sized if that would cause a bit of spill into a
second page. Don't tell me that PAGE_SIZE+10 (or whatever it might be)
would ever be an optimal allocation size.

 Additionally, remember that turning on redzoning, which seems to be
 quite popular nowadays, actually blows out the slab size calculations
 anyway.

Debugging will always throw these things out the window, we can't and
should not optimize for that. That goes for slab, and for lots of other
things.

 The bottom line is that it's better for us just to do exactly what we
 need and let the allocation algorithms figure out how to do it
 efficiently rather than trying to second guess them.

Partly true, it's also silly to just hardcode power-of-2 numbers without
ever bothering to look at what that results in (or even if it fits
normal use patterns).

We can easily be flexible, so it seems silly not to at least do a bit of
background research.

-- 
Jens Axboe



Re: [PATCH v2] add bidi support for block pc requests

2007-05-16 Thread FUJITA Tomonori
From: Jens Axboe [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Wed, 16 May 2007 19:53:22 +0200

 On Wed, May 16 2007, Boaz Harrosh wrote:
  Boaz Harrosh wrote:
   James Bottomley wrote:
   
   There's actually a fourth option you haven't considered:
  
   Roll all the required sglist definitions (request_bufflen,
   request_buffer, use_sg and sglist_len) into the sgtable pools.
  
    This is a great idea. Let me see if I understand what you mean.
   ...
   ...
  
  Hi Dear James, list.
  I have worked on the proposed solution (if anyone is interested, see the
  url below)
  
  Now it works and all but I hit some problems.
  What happens is that on 64-bit architectures, well on x86_64,
  the scatterlist structure is 32 bytes, which means that
  we can only fit exactly 128 of them in a page. But together with
  the scsi_sg_table header we are down to 127. Also if we want
  good alignment/packing of other sized pools we want:
  static struct scsi_host_sg_pool scsi_sg_pools[] = {
  SP(7),
  SP(15),
  SP(31),
  SP(63),
  SP(127)
  };
  
  now there are 2 issues with this:
  1. Even if I do blk_queue_max_phys_segments(q, 127); I still get
 requests for use_sg=128 which will crash the kernel.
 
 That sounds like a serious issue, it should definitely not happen. Stuff
 like that would bite other drivers as well, are you absolutely sure that
 is happening? Power-of-2 bug in your code, or in the SCSI code?

Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg?


  If anyone wants to see I have done 2 versions of this work. One on top
  of Jens's sg-chaining work (ver 5). And one on top of Tomo's cleanup
  git. both can be found here:
http://www.bhalevy.com/open-osd/download/scsi_sg_table/
 
 +#define scsi_for_each_sg(cmd, sg, nseg, __i)   \
 +   for (__i = 0, sg = scsi_sglist(cmd); __i < (nseg); __i++, (sg)++)
 
 Hmm?

When for_each_sg is ready, we convert scsi_for_each_sg to use
for_each_sg. I finished the cleanups on more than 40 drivers on the
top of your patches.


Re: [PATCH v2] add bidi support for block pc requests

2007-05-16 Thread Jens Axboe
On Thu, May 17 2007, FUJITA Tomonori wrote:
 From: Jens Axboe [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Wed, 16 May 2007 19:53:22 +0200
 
  On Wed, May 16 2007, Boaz Harrosh wrote:
   Boaz Harrosh wrote:
James Bottomley wrote:

There's actually a fourth option you haven't considered:
   
Roll all the required sglist definitions (request_bufflen,
request_buffer, use_sg and sglist_len) into the sgtable pools.
   
This is a grate Idea. Let me see if I understand what you mean.
...
...
   
   Hi Dear James, list.
   I have worked on proposed solution (If anyone is interested see
   url blow)
   
   Now it works and all but I hit some problems.
   What happens is that in 64 bit architectures, well in x86_64,
   the sizeof scatterlist structure is 32 bytes which means that
   we can only fit exactly 128 of them in a page. But together with
   the scsi_sg_table header we are down to 127. Also if we want
   good alignment/packing of other sized pools we want:
   static struct scsi_host_sg_pool scsi_sg_pools[] = {
 SP(7),
 SP(15),
 SP(31),
 SP(63),
 SP(127)
   };
   
   now there are 2 issues with this:
   1. Even if I do blk_queue_max_phys_segments(q, 127); I still get
  requests for use_sg=128 which will crash the kernel.
  
  That sounds like a serious issue, it should definitely not happen. Stuff
  like that would bite other drivers as well, are you absolutely sure that
  is happening? Power-of-2 bug in your code, or in the SCSI code?
 
 Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg?
 
 
   If anyone wants to see I have done 2 versions of this work. One on top
   of Jens's sg-chaining work (ver 5). And one on top of Tomo's cleanup
   git. both can be found here:
 http://www.bhalevy.com/open-osd/download/scsi_sg_table/
  
  +#define scsi_for_each_sg(cmd, sg, nseg, __i)   \
  +   for (__i = 0, sg = scsi_sglist(cmd); __i  (nseg); __i++, (sg)++)
  
  Hmm?
 
 When for_each_sg is ready, we convert scsi_for_each_sg to use
 for_each_sg. I finished the cleanups on more than 40 drivers on the
 top of your patches.

But this is from the patch that is based on my sglist branch, so it
looked somewhat odd.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-05-11 Thread James Bottomley
On Thu, 2007-05-10 at 11:10 -0400, Douglas Gilbert wrote:
  +#define scsi_resid(cmd) ((cmd)-resid)
 
 I have defined resid in the past as a signed (32 bit)
 integer following the CAM spec. The cases are:
- resid=0 : initiator's DMA engine got (or sent?) the
number of bytes it expected
- resid0 : dxfer_len-resid bytes received, less than
expected
- resid0 : more bytes received (or could have been)
than indicated by dxfer_len
 
 Some definitions of resid make it unsigned which rules out
 the less common resid0 case. Can this case happen? Does it
 happen in practice? Naturally no more than dxfer_len should
 be placed in the scatter gather list.

Attempted overrun is usually, and correctly in my opinion, treated as a
fatal error in most drivers, so for them the resid  0 case can never
happen.

 SPI and SAS do not convey dxfer_len (or data direction) to
 a target in their transport frames. This means that the
 relevant field in the cdb (e.g. transfer length) dictates
 how much data a target sends back to an initiator in the
 case of a read like operation. So that opens up the
 possibility that dxfer_len and the cdb disagree.

Well, actually the drivers do (or those few that pay attention).
dxfer_len is used to program the SG lists into the device DMA engine, so
drivers can retrieve the actuals from the DMA engine at the end of the
transfer and set resid accordingly.

 FCP does convey dxfer_len and data_direction to a target.
 So a FCP target can detect a mismatch between this and the
 cdb and send resid information (either under or over) in its
 response frame back to the initiator. Alternatively it
 can treat the over case as an error (fcp3r04.pdf section
 9.4.2).
 iSCSI and SRP?
 
 The resid0 case may become more common if a driver or
 application does not properly take account of protection
 information being sent together with the requested
 data.
 
 So should we keep resid signed?

I don't really see a compelling reason to alter the driver behaviour (at
least for those where overrun is fatal).  However, since resid is signed
in the current structure, it makes sense to propagate that.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-05-10 Thread FUJITA Tomonori
From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Wed, 09 May 2007 19:54:32 +0300

 James Bottomley wrote:
  Actually, the first order of business is to use accessors on the command
  pointers in the drivers to free them from the internal layout of the
  structure (and where it is allocated).
  
 Thanks! I totally second that. Let me look into my old patches and come
 up with all the needed accessors. I hope 3-5 will be enough.
 I will send some suggestions tomorrow.
  
  No, that's why you do the accessors.  Convert all of the common drivers
  to accessors on the current structure, then throw the switch to convert
  to the new structure (whatever form is finally settled on).  Then any
  unconverted drivers break and people fix the ones they care about.
 
 Last time I was able to compile 97% of drivers and convert by 
 search-and-replace
 the rest. Not a huge deal.

We need to remove the non-use-sg code in the drivers and convert
them. So it's a bit more complicated than search-and-replace.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-05-10 Thread Boaz Harrosh
FUJITA Tomonori wrote:
 From: FUJITA Tomonori [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Thu, 10 May 2007 15:53:22 +0900
 
 From: Boaz Harrosh [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Wed, 09 May 2007 19:54:32 +0300

 James Bottomley wrote:
 Actually, the first order of business is to use accessors on the command
 pointers in the drivers to free them from the internal layout of the
 structure (and where it is allocated).

 Thanks! I totally second that. Let me look into my old patches and come
 up with all the needed accessors. I hope 3-5 will be enough.
 I will send some suggestions tomorrow.
 No, that's why you do the accessors.  Convert all of the common drivers
 to accessors on the current structure, then throw the switch to convert
 to the new structure (whatever form is finally settled on).  Then any
 unconverted drivers break and people fix the ones they care about.
 Last time I was able to compile 97% of drivers and convert by 
 search-and-replace
 the rest. Not a huge deal.
 We need to remove the non-use-sg code in the drivers and convert
 them. So it's a bit more complicated than search-and-replace.
 
 Here's a patch to convert ibmvscsi to use helper functions (though it
 needs more testings).
 
 ---
 ---
 diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
 index d6948d0..799f204 100644
 --- a/include/scsi/scsi_cmnd.h
 +++ b/include/scsi/scsi_cmnd.h
 @@ -138,4 +138,17 @@ extern void scsi_kunmap_atomic_sg(void *
  extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
  extern void scsi_free_sgtable(struct scatterlist *, int);
  
 +extern int scsi_dma_map(struct device *dev, struct scsi_cmnd *cmd);
 +extern void scsi_dma_unmap(struct device *dev, struct scsi_cmnd *cmd);
 +
 +/* moved to scatterlist.h after chaining sg */
 +#define sg_next(sg) ((sg) + 1)
 +
 +#define scsi_for_each_sg(cmd, nseg, i)   
 \
 + for (i = 0, sg = (cmd)-request_buffer; i  nseg;   \
 + sg = sg_next(sg), i++)  \
 +

Better we do like Jens's patch
+#define for_each_sg(sglist, sg, nr, __i)   \
+   for (__i = 0, sg = (sglist); __i  (nr); __i++, sg = sg_next(sg))

I think that we should wait for Jens pending patchset and do this work on top
of his, then use his sg macros directly. This way the cleaners can also be
watchfully for any drivers that might brake with big IO sent to them.

 +#define scsi_sg_count(cmd) ((cmd)-use_sg)
 +#define scsi_bufferlen(cmd) ((cmd)-request_bufflen)
 +
  #endif /* _SCSI_SCSI_CMND_H */

Above helpers look good. However I am missing 2 of them:

1.
+#define scsi_sgl(cmd) ((struct scatterlist*)(cmd)-request_buffer)

This is for drivers like iSCSI that do not do any dma mapping, as dma
is done at the lower level in the NICs. And lots of other places that just
transfer the sglist around.

2.
+#define scsi_resid(cmd) ((cmd)-resid)

Boaz


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-05-10 Thread FUJITA Tomonori
From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Thu, 10 May 2007 15:37:48 +0300

  +/* moved to scatterlist.h after chaining sg */
  +#define sg_next(sg) ((sg) + 1)
  +
  +#define scsi_for_each_sg(cmd, nseg, i) 
  \
  +   for (i = 0, sg = (cmd)-request_buffer; i  nseg;   \
  +   sg = sg_next(sg), i++)  \
  +
 
 Better we do like Jens's patch
 +#define for_each_sg(sglist, sg, nr, __i) \
 + for (__i = 0, sg = (sglist); __i  (nr); __i++, sg = sg_next(sg))
 
 I think that we should wait for Jens pending patchset and do this work on top
 of his, then use his sg macros directly. This way the cleaners can also be
 watchfully for any drivers that might brake with big IO sent to them.

Yeah, we can use sg list's macro something like:

#define scsi_for_each_sg(cmd, sg, nseg, __i)\
for_each_sg((cmd)-request_buffer, sg, nseg, __i)   

Seems that Jens converted lots of LLDs to use for_each_sg. If
for_each_sg goes first, we can just replase for_each_sg (but after
all, we need to touch LLDs to remove the non-use-sg path). On the
other hand, if we go first, there is no conversion for for_each_sg in
LLDs.

Jens, do you want the sg list stuff done before our cleanups?
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-05-10 Thread Boaz Harrosh
Douglas Gilbert wrote:
 Boaz Harrosh wrote:
 FUJITA Tomonori wrote:
 From: FUJITA Tomonori [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Thu, 10 May 2007 15:53:22 +0900

 From: Boaz Harrosh [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Wed, 09 May 2007 19:54:32 +0300

 James Bottomley wrote:
 Actually, the first order of business is to use accessors on the command
 pointers in the drivers to free them from the internal layout of the
 structure (and where it is allocated).

 Thanks! I totally second that. Let me look into my old patches and come
 up with all the needed accessors. I hope 3-5 will be enough.
 I will send some suggestions tomorrow.
 No, that's why you do the accessors.  Convert all of the common drivers
 to accessors on the current structure, then throw the switch to convert
 to the new structure (whatever form is finally settled on).  Then any
 unconverted drivers break and people fix the ones they care about.
 Last time I was able to compile 97% of drivers and convert by 
 search-and-replace
 the rest. Not a huge deal.
 We need to remove the non-use-sg code in the drivers and convert
 them. So it's a bit more complicated than search-and-replace.
 Here's a patch to convert ibmvscsi to use helper functions (though it
 needs more testings).

 ---
 ---
 diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
 index d6948d0..799f204 100644
 --- a/include/scsi/scsi_cmnd.h
 +++ b/include/scsi/scsi_cmnd.h
 @@ -138,4 +138,17 @@ extern void scsi_kunmap_atomic_sg(void *
  extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
  extern void scsi_free_sgtable(struct scatterlist *, int);
  
 +extern int scsi_dma_map(struct device *dev, struct scsi_cmnd *cmd);
 +extern void scsi_dma_unmap(struct device *dev, struct scsi_cmnd *cmd);
 +
 +/* moved to scatterlist.h after chaining sg */
 +#define sg_next(sg) ((sg) + 1)
 +
 +#define scsi_for_each_sg(cmd, nseg, i) 
 \
 +   for (i = 0, sg = (cmd)-request_buffer; i  nseg;   \
 +   sg = sg_next(sg), i++)  \
 +
 Better we do like Jens's patch
 +#define for_each_sg(sglist, sg, nr, __i)\
 +for (__i = 0, sg = (sglist); __i  (nr); __i++, sg = sg_next(sg))

 I think that we should wait for Jens pending patchset and do this work on top
 of his, then use his sg macros directly. This way the cleaners can also be
 watchfully for any drivers that might brake with big IO sent to them.

 +#define scsi_sg_count(cmd) ((cmd)-use_sg)
 +#define scsi_bufferlen(cmd) ((cmd)-request_bufflen)
 +
  #endif /* _SCSI_SCSI_CMND_H */
 Above helpers look good. However I am missing 2 of them:

 1.
 +#define scsi_sgl(cmd) ((struct scatterlist*)(cmd)-request_buffer)

 This is for drivers like iSCSI that do not do any dma mapping, as dma
 is done at the lower level in the NICs. And lots of other places that just
 transfer the sglist around.

 2.
 +#define scsi_resid(cmd) ((cmd)-resid)
 
 I have defined resid in the past as a signed (32 bit)
 integer following the CAM spec. The cases are:
- resid=0 : initiator's DMA engine got (or sent?) the
number of bytes it expected
- resid0 : dxfer_len-resid bytes received, less than
expected
- resid0 : more bytes received (or could have been)
than indicated by dxfer_len
 
 Some definitions of resid make it unsigned which rules out
 the less common resid0 case. Can this case happen? Does it
 happen in practice? Naturally no more than dxfer_len should
 be placed in the scatter gather list.
 
 SPI and SAS do not convey dxfer_len (or data direction) to
 a target in their transport frames. This means that the
 relevant field in the cdb (e.g. transfer length) dictates
 how much data a target sends back to an initiator in the
 case of a read like operation. So that opens up the
 possibility that dxfer_len and the cdb disagree.
 
 FCP does convey dxfer_len and data_direction to a target.
 So a FCP target can detect a mismatch between this and the
 cdb and send resid information (either under or over) in its
 response frame back to the initiator. Alternatively it
 can treat the over case as an error (fcp3r04.pdf section
 9.4.2).
 iSCSI and SRP?
 
 The resid0 case may become more common if a driver or
 application does not properly take account of protection
 information being sent together with the requested
 data.
 
 So should we keep resid signed?
 
 Doug Gilbert
You are probably right just that I was afraid of this: req-data_len = sc-resid
and I saw in iSCSI :

iscsi_scsi_cmd_rsp()
...

if (res_count  0  res_count = sc-request_bufflen)
sc-resid = res_count;
else
sc-result = (DID_BAD_TARGET  16) | rhdr-cmd_status;

So I'm confused. It looks like the struct scsi_cmnd member resid is only the
one case of 0 = target_resid

Re: [PATCH v2] add bidi support for block pc requests

2007-05-09 Thread Boaz Harrosh
James Bottomley wrote:
 I think you'll find that kzalloc comes directly out of a slab for this
 size of allocation anyway ... you mean you want to see a dedicated pool
 for this specific allocation?
Yes, As you said below so we can always send IO for forward progress
of freeing memory. My test machine is a Linux cluster in front of a
pNFS over OSD. The HPC cluster is diskless. It will reach this
situation very fast.

 There's another problem in that it destroys our forward progress
 guarantee.  There's always a single reserve command for every HBA so
 that forward progress for freeing memory can always be made in the
 system even if the command slab is out and we have to reclaim memory
 through a HBA with no outstanding commands.  Allocating two commands per
 bidirectional request hoses that guarantee ... it could be fixed up by
 increasing the reserve pool to 2, but that's adding further unwanted
 complexity ...
 
Thanks for catching it! I was afraid of that. If we stick with this solution
in the interim until we do what you suggested below, we will need to put one
more for bidi. It should not be a complicated pool thing, just a reserved one
for the bidi case.

 
 There's actually a fourth option you haven't considered:
 
 Roll all the required sglist definitions (request_bufflen,
 request_buffer, use_sg and sglist_len) into the sgtable pools.
 
 We're getting very close to the point where someone gets to sweep
 through the drivers eliminating the now superfluous non-sg path in the
 queuecommand.  When that happens the only cases become no transfer or SG
 backed commands.  At this point we can do a consolidation of the struct
 scsi_cmnd fields.  This does represent the ideal time to sweep the sg
 list handling fields into the sgtable and simply have a single pointer
 to struct sgtable in the scsi_cmnd (== NULL is the signal for a no
 transfer command).
 
This is a grate Idea. Let me see if I understand what you mean.
1. An sgtable is a single allocation with an sgtable header type
   at the begining and a veriable size array of struct scatterlist.
   something like:
   struct sgtable {
struct sgtable_header {
unsigned sg_count, sglist_len, length;
struct sgtable* next; //for Jens's big io
} hdr;
struct scatterlist sglist[];
   }
   Slabs are put up for above sgtable of different sizes as
   done today. (Should they be sized on different ARCHs to
   align on page boundaries?)

2. The way we can do this in stages: Meaning put up code that has
   both sets of API, Transfer drivers one-by-one to new API, deprecate
   old API for a kernel cycle or two. Than submit last piece of
   code that removes the old API.
   It can be done. We just need to copy sgtable_header fields
   to the old fields, and let them stick around for a while.

3. The second bidi sgtable will hang on request-next_rq-special.

 James
 
 
If everyone agrees on something like above. I can do it right away.
It's a solution I wouldn't even dream of. Thanks

Boaz
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-05-09 Thread FUJITA Tomonori
From: James Bottomley [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Tue, 08 May 2007 15:01:37 -0500

 Roll all the required sglist definitions (request_bufflen,
 request_buffer, use_sg and sglist_len) into the sgtable pools.
 
 We're getting very close to the point where someone gets to sweep
 through the drivers eliminating the now superfluous non-sg path in the
 queuecommand.

The non-use-sg case is still alive?
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-05-09 Thread FUJITA Tomonori
From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Wed, 09 May 2007 10:46:34 +0300

  Roll all the required sglist definitions (request_bufflen,
  request_buffer, use_sg and sglist_len) into the sgtable pools.
  
  We're getting very close to the point where someone gets to sweep
  through the drivers eliminating the now superfluous non-sg path in the
  queuecommand.  When that happens the only cases become no transfer or SG
  backed commands.  At this point we can do a consolidation of the struct
  scsi_cmnd fields.  This does represent the ideal time to sweep the sg
  list handling fields into the sgtable and simply have a single pointer
  to struct sgtable in the scsi_cmnd (== NULL is the signal for a no
  transfer command).
  
 This is a grate Idea. Let me see if I understand what you mean.
 1. An sgtable is a single allocation with an sgtable header type
at the begining and a veriable size array of struct scatterlist.
something like:
struct sgtable {
   struct sgtable_header {
   unsigned sg_count, sglist_len, length;
   struct sgtable* next; //for Jens's big io
   } hdr;
   struct scatterlist sglist[];
}

Can we have more simple sgtable?

struct sgtable {
unsigned use_sg;
unsigned length;
unsigned sglist_len;
struct scatterlist sglist[0];
};


Then we could do something like this:

struct scsi_host_sgtable_pool {
size_t size;
char *name;
struct kmem_cache *slab;
mempool_t *pool;
};

int __init scsi_init_queue(void)
{
for (i = 0; i  SG_MEMPOOL_NR; i++) {
struct scsi_host_sgtable_pool *sgtbp = scsi_sgtable_pools + i;
int size = sizeof(struct sgtable) + sgp-size * sizeof(struct 
scatterlist);

sgp-slab = kmem_cache_create(sgp-name, size, 0,
SLAB_HWCACHE_ALIGN, NULL, NULL);
if (!sgp-slab) {
printk(KERN_ERR SCSI: can't init sg slab %s\n,
sgp-name);
}

sgp-pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE,
 sgp-slab);


Jens' chaining sg lists adds sg-next so we don't need
sgtable-next. We can just add __use_sg to struct sgtable.


 2. The way we can do this in stages: Meaning put up code that has
both sets of API, Transfer drivers one-by-one to new API, deprecate
old API for a kernel cycle or two. Than submit last piece of
code that removes the old API.
It can be done. We just need to copy sgtable_header fields
to the old fields, and let them stick around for a while.

That's not bad, but can we convert all the LLDs all at once?

The changes to scsi mid-layer are almost done. Probabaly, you can just
add sgtable stuff (and scsi_dma_map/scsi_dma_unmap helper functions
that Christoph proposed long ago) to my previous patch. It can be done
for several hours. So you have enough time for the LLDs' changes.


 3. The second bidi sgtable will hang on request-next_rq-special.

I think so.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-05-09 Thread Boaz Harrosh
FUJITA Tomonori wrote:
 From: Boaz Harrosh [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Wed, 09 May 2007 10:46:34 +0300
 
 Roll all the required sglist definitions (request_bufflen,
 request_buffer, use_sg and sglist_len) into the sgtable pools.

 We're getting very close to the point where someone gets to sweep
 through the drivers eliminating the now superfluous non-sg path in the
 queuecommand.  When that happens the only cases become no transfer or SG
 backed commands.  At this point we can do a consolidation of the struct
 scsi_cmnd fields.  This does represent the ideal time to sweep the sg
 list handling fields into the sgtable and simply have a single pointer
 to struct sgtable in the scsi_cmnd (== NULL is the signal for a no
 transfer command).

 This is a grate Idea. Let me see if I understand what you mean.
 1. An sgtable is a single allocation with an sgtable header type
at the begining and a veriable size array of struct scatterlist.
something like:
struct sgtable {
  struct sgtable_header {
  unsigned sg_count, sglist_len, length;
  struct sgtable* next; //for Jens's big io
  } hdr;
  struct scatterlist sglist[];
}
 
 Can we have more simple sgtable?
 
 struct sgtable {
   unsigned use_sg;
   unsigned length;
   unsigned sglist_len;
   struct scatterlist sglist[0];
 };
 
Yes sure. It was just an example.
One comment, use_sg = sg_count.
By the way I have forgotten some fields so it should be:

struct sgtable {
unsigned short sg_count;
unsigned short sg_pool; /* note that this is the pool index and not the 
actual count */
unsigned length;
unsigned resid;
struct scatterlist sglist[0];
};

resid/length together with request-data_len can be optimized, this is the 
current system.

 
 Then we could do something like this:
 
 struct scsi_host_sgtable_pool {
   size_t size;
   char *name;
   struct kmem_cache *slab;
   mempool_t *pool;
 };
 
 int __init scsi_init_queue(void)
 {
   for (i = 0; i  SG_MEMPOOL_NR; i++) {
   struct scsi_host_sgtable_pool *sgtbp = scsi_sgtable_pools + i;
   int size = sizeof(struct sgtable) + sgp-size * sizeof(struct 
 scatterlist);
 
   sgp-slab = kmem_cache_create(sgp-name, size, 0,
   SLAB_HWCACHE_ALIGN, NULL, NULL);
   if (!sgp-slab) {
   printk(KERN_ERR SCSI: can't init sg slab %s\n,
   sgp-name);
   }
 
   sgp-pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE,
sgp-slab);
 
 
I think we can do a better job here by fitting exactly the number of scatterlist
entries that will take up full pages including the header. This is sizes
dependent and can be compile-time calculated. For example in x86_64, with 
header,
145 scatterlist will fit in a page so this is kind of magic number for this 
arch.

could someone explain why we need scatterlist-max-count a base-2 number?

 Jens' chaining sg lists adds sg-next so we don't need
 sgtable-next. We can just add __use_sg to struct sgtable.
 
Yes but it uses the last struct scatterlist for the -next, this way it is 
saved.
On the other hand it wastes a pointer in small IOs. So I guess this is Jens's
call.
If the -next in both cases will point to a struct sgtable above than we 
don't need
__use_sg since sg_pool(sglist_len) holds the pool this came from. If not 
__use_sg
must be added to struct sgtable.

It looks like proposed change races with Jens's work so it's better be done 
after
his. It could be nice if he incorporates some of the constrains from here before
hand.

 
 2. The way we can do this in stages: Meaning put up code that has
both sets of API, Transfer drivers one-by-one to new API, deprecate
old API for a kernel cycle or two. Than submit last piece of
code that removes the old API.
It can be done. We just need to copy sgtable_header fields
to the old fields, and let them stick around for a while.
 
 That's not bad, but can we convert all the LLDs all at once?
 
 The changes to scsi mid-layer are almost done. Probabaly, you can just
 add sgtable stuff (and scsi_dma_map/scsi_dma_unmap helper functions
 that Christoph proposed long ago) to my previous patch. It can be done
 for several hours. So you have enough time for the LLDs' changes.
I am here to do any work needed. I will wait to see what is decided.

I already have a 2.6.20 tree with all llds converted to things like
scsi_uni(cmd)-sg where scsi_uni() was pointing to:
struct scsi_data_buffer {
unsigned short sg_count;
unsigned short sg_pool; /* note that this is the pool index and not the 
actual count */
unsigned length;
unsigned resid;
struct scatterlist *sg;
};
as part of my old bidi work. So a simple search and replace can be done

Re: [PATCH v2] add bidi support for block pc requests

2007-05-09 Thread FUJITA Tomonori
From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Wed, 09 May 2007 16:58:24 +0300

 FUJITA Tomonori wrote:
  From: Boaz Harrosh [EMAIL PROTECTED]
  Subject: Re: [PATCH v2] add bidi support for block pc requests
  Date: Wed, 09 May 2007 10:46:34 +0300
  
  Roll all the required sglist definitions (request_bufflen,
  request_buffer, use_sg and sglist_len) into the sgtable pools.
 
  We're getting very close to the point where someone gets to sweep
  through the drivers eliminating the now superfluous non-sg path in the
  queuecommand.  When that happens the only cases become no transfer or SG
  backed commands.  At this point we can do a consolidation of the struct
  scsi_cmnd fields.  This does represent the ideal time to sweep the sg
  list handling fields into the sgtable and simply have a single pointer
  to struct sgtable in the scsi_cmnd (== NULL is the signal for a no
  transfer command).
 
  This is a grate Idea. Let me see if I understand what you mean.
  1. An sgtable is a single allocation with an sgtable header type
 at the begining and a veriable size array of struct scatterlist.
 something like:
 struct sgtable {
 struct sgtable_header {
 unsigned sg_count, sglist_len, length;
 struct sgtable* next; //for Jens's big io
 } hdr;
 struct scatterlist sglist[];
 }
  
  Can we have more simple sgtable?
  
  struct sgtable {
  unsigned use_sg;
  unsigned length;
  unsigned sglist_len;
  struct scatterlist sglist[0];
  };
  
 Yes sure. It was just an example.
 One comment, use_sg = sg_count.
 By the way I have forgotten some fields so it should be:
 
 struct sgtable {
   unsigned short sg_count;
   unsigned short sg_pool; /* note that this is the pool index and not the 
 actual count */
   unsigned length;
   unsigned resid;
   struct scatterlist sglist[0];
 };
 
 resid/length together with request-data_len can be optimized, this is the 
 current system.
 
  
  Then we could do something like this:
  
  struct scsi_host_sgtable_pool {
  size_t size;
  char *name;
  struct kmem_cache *slab;
  mempool_t *pool;
  };
  
  int __init scsi_init_queue(void)
  {
  for (i = 0; i  SG_MEMPOOL_NR; i++) {
  struct scsi_host_sgtable_pool *sgtbp = scsi_sgtable_pools + i;
  int size = sizeof(struct sgtable) + sgp-size * sizeof(struct 
  scatterlist);
  
  sgp-slab = kmem_cache_create(sgp-name, size, 0,
  SLAB_HWCACHE_ALIGN, NULL, NULL);
  if (!sgp-slab) {
  printk(KERN_ERR SCSI: can't init sg slab %s\n,
  sgp-name);
  }
  
  sgp-pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE,
   sgp-slab);
  
  
 I think we can do a better job here by fitting exactly the number of 
 scatterlist
 entries that will take up full pages including the header. This is sizes
 dependent and can be compile-time calculated. For example in x86_64, with 
 header,
 145 scatterlist will fit in a page so this is kind of magic number for this 
 arch.
 
 could someone explain why we need scatterlist-max-count a base-2 number?

To let the slab allocater to do better job?

We can improve these issues later without disturbing LLDs. No need to
do this right now.


  Jens' chaining sg lists adds sg-next so we don't need
  sgtable-next. We can just add __use_sg to struct sgtable.
  
 Yes but it uses the last struct scatterlist for the -next, this way
 it is saved.  On the other hand it wastes a pointer in small IOs. So
 I guess this is Jens's call.  If the -next in both cases will
 point to a struct sgtable above

I think that some non scsi stuff also need chaining sg. so sg needs a
chaining scheme; sg-next, crypto layer's hack (not to add sg-next),
etc. Instead of using sg's chaining, having stable-next is wrong.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-05-09 Thread Boaz Harrosh
James Bottomley wrote:
 Actually, the first order of business is to use accessors on the command
 pointers in the drivers to free them from the internal layout of the
 structure (and where it is allocated).
 
Thanks! I totally second that. Let me look into my old patches and come
up with all the needed accessors. I hope 3-5 will be enough.
I will send some suggestions tomorrow.
 
 No, that's why you do the accessors.  Convert all of the common drivers
 to accessors on the current structure, then throw the switch to convert
 to the new structure (whatever form is finally settled on).  Then any
 unconverted drivers break and people fix the ones they care about.

Last time I was able to compile 97% of drivers and convert by search-and-replace
the rest. Not a huge deal.

 
 James
 
 

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-05-08 Thread Boaz Harrosh
FUJITA Tomonori wrote:
 Here is an updated version of the patch to add bidi support to block
 pc requests. Bugs spotted by Benny were fixed.
 
 This patch can be applied cleanly to the scsi-misc git tree and is on
 the top of the following patch to add linked request support:
 
 http://marc.info/?l=linux-scsim=117835587615642w=2
 
 ---
 From: FUJITA Tomonori [EMAIL PROTECTED]
 Date: Mon, 7 May 2007 16:42:24 +0900
 Subject: [PATCH] add bidi support to scsi pc commands
 
 This adds bidi support to block pc requests.
 
 A bidi request uses req-next_rq pointer for an in request.
 
 This patch introduces a new structure, scsi_data_buffer to hold the
 data buffer information. To avoid make scsi_cmnd structure fatter, the
 scsi mid-layer uses cmnd-request-next_rq-special pointer for
 a scsi_data_buffer structure. LLDs don't touch the second request
 (req-next_rq) so it's safe to use req-special.
 
 scsi_blk_pc_done() always completes the whole command so
 scsi_end_request() simply completes the bidi chunk too.
 
 A helper function, scsi_bidi_data_buffer() is for LLDs to access to
 the scsi_data_buffer structure easily.
 
 Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
 ---
 @@ -685,6 +696,14 @@ static struct scsi_cmnd *scsi_end_request(struct 
 scsi_cmnd *cmd, int uptodate,
   }
   }
  
 + /*
 +  * a REQ_BLOCK_PC command is always completed fully so just do
 +  * end the bidi chunk.
 +  */
 + if (sdb)
 + end_that_request_chunk(req-next_rq, uptodate,
 +sdb-request_bufflen);
 +

sdb-request_bufflen was zeroed in scsi_release_buffers which is called before
scsi_end_request.

  static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
  {
   BUG_ON(!blk_pc_request(cmd-request));
 @@ -1090,10 +1159,22 @@ static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
  static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request 
 *req)
  {
   struct scsi_cmnd *cmd;
 + struct scsi_data_buffer *sdb = NULL;
 +
 + if (blk_bidi_rq(req)) {
 + sdb = kzalloc(sizeof(*sdb), GFP_ATOMIC);
 + if (unlikely(!sdb))
 + return BLKPREP_DEFER;
 + }
  
   cmd = scsi_get_cmd_from_req(sdev, req);
 - if (unlikely(!cmd))
 + if (unlikely(!cmd)) {
 + kfree(sdb);
   return BLKPREP_DEFER;
 + }
 +
 + if (sdb)
 + req-next_rq-special = sdb;
  
   /*
* BLOCK_PC requests may transfer data, in which case they must

Before I get to my main concern here I have one comment. in 
scsi_get_cmd_from_req()
there is a code path in which a scsi_cmnd is taken from special and is not newly
allocated. It is best to move bidi allocation to scsi_get_cmd_from_req() and 
allocate
new sdb only if we get a new Command. (See my attached patch for an example)

OK! My main concern is with kzalloc(sizeof(*sdb), GFP_ATOMIC);
All IO allocations should come from slabs in case of a memory starved system.
There are 3 possible solutions I see:
1. Use struct scsi_cmnd for bidi_read and not a special scsi_data_buffer.
   Attached is above solution. It is by far the simplest of the three.
   So simple that even Jens's BigIO patch can almost apply at scsi_lib.c. (and 
vise versa)
   What's hanged on the request-next_rq-special is a second scsi_cmnd.
   The command is taken from regular command pool. This solution, though
   wasteful of some memory is very stable.

2. Force the users of BIDI to allocate scsi_data_buffer(s)
   Users will allocate a slab for scsi_data_buffers and hang them on 
nex_rq-special before
   hand. Than free them together with second request. This approach can be very 
stable, But
   it is bad because it is a layering violation. When block and SCSI layers 
decide to change
   the wiring of bidi. Users will have to change with them.

3. Let SCSI-ml manage a slab for scsi_data_buff's
   Have a flag to  scsi_setup_command_freelist() or a new API. When a device is 
flagged
   bidi-able we could 1-Allocate bigger slots to have extra memory for SDBs 
together
   with the command itself. or 2-allocate yet another slab for SDBs.

The 3rd approach is clean, and I can submit a patch for it. But I think it is 
not currently
necessary. The first solution (See attached patch) we can all live with, I 
hope. Since for
me it gives me the stability I need. And for the rest of the kernel it is the 
minimum
change, layered on existing building blocks.

If any one wants to try it out I put up the usual patchset that exercise this 
approach here.
http://www.bhalevy.com/open-osd/download/double_rq_bidi

Thanks
Boaz

From ac8f0d3df711c5d01a269fde6a4ecce3000c3f68 Mon Sep 17 00:00:00 2001
From: Boaz Harrosh [EMAIL PROTECTED]
Date: Tue, 8 May 2007 14:04:46 +0300
Subject: [PATCH] scsi-ml bidi support

At the block level bidi request uses req-next_rq pointer for a second
bidi_read request.
At Scsi-midlayer a second scsi_cmnd structure is used for the sglists
of 

Re: [PATCH v2] add bidi support for block pc requests

2007-05-08 Thread James Bottomley
On Tue, 2007-05-08 at 21:53 +0300, Boaz Harrosh wrote:
 Before I get to my main concern here I have one comment. in 
 scsi_get_cmd_from_req()
 there is a code path in which a scsi_cmnd is taken from special and is not 
 newly
 allocated. It is best to move bidi allocation to scsi_get_cmd_from_req() and 
 allocate
 new sdb only if we get a new Command. (See my attached patch for an example)
 
 OK! My main concern is with kzalloc(sizeof(*sdb), GFP_ATOMIC);
 All IO allocations should come from slabs in case of a memory starved system.

I think you'll find that kzalloc comes directly out of a slab for this
size of allocation anyway ... you mean you want to see a dedicated pool
for this specific allocation?

 There are 3 possible solutions I see:
 1. Use struct scsi_cmnd for bidi_read and not a special scsi_data_buffer.
Attached is above solution. It is by far the simplest of the three.
So simple that even Jens's BigIO patch can almost apply at scsi_lib.c. 
 (and vise versa)
What's hanged on the request-next_rq-special is a second scsi_cmnd.
The command is taken from regular command pool. This solution, though
wasteful of some memory is very stable.

There's another problem in that it destroys our forward progress
guarantee.  There's always a single reserve command for every HBA so
that forward progress for freeing memory can always be made in the
system even if the command slab is out and we have to reclaim memory
through a HBA with no outstanding commands.  Allocating two commands per
bidirectional request hoses that guarantee ... it could be fixed up by
increasing the reserve pool to 2, but that's adding further unwanted
complexity ...

 2. Force the users of BIDI to allocate scsi_data_buffer(s)
Users will allocate a slab for scsi_data_buffers and hang them on 
 nex_rq-special before
hand. Than free them together with second request. This approach can be 
 very stable, But
it is bad because it is a layering violation. When block and SCSI layers 
 decide to change
the wiring of bidi. Users will have to change with them.

I Agree.

 3. Let SCSI-ml manage a slab for scsi_data_buff's
Have a flag to  scsi_setup_command_freelist() or a new API. When a device 
 is flagged
bidi-able we could 1-Allocate bigger slots to have extra memory for SDBs 
 together
with the command itself. or 2-allocate yet another slab for SDBs.

if you're worried about allocations, doing a single allocate is better

 The 3rd approach is clean, and I can submit a patch for it. But I think it is 
 not currently
 necessary. The first solution (See attached patch) we can all live with, I 
 hope. Since for
 me it gives me the stability I need. And for the rest of the kernel it is the 
 minimum
 change, layered on existing building blocks.


There's actually a fourth option you haven't considered:

Roll all the required sglist definitions (request_bufflen,
request_buffer, use_sg and sglist_len) into the sgtable pools.

We're getting very close to the point where someone gets to sweep
through the drivers eliminating the now superfluous non-sg path in the
queuecommand.  When that happens the only cases become no transfer or SG
backed commands.  At this point we can do a consolidation of the struct
scsi_cmnd fields.  This does represent the ideal time to sweep the sg
list handling fields into the sgtable and simply have a single pointer
to struct sgtable in the scsi_cmnd (== NULL is the signal for a no
transfer command).

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] add bidi support for block pc requests

2007-05-07 Thread FUJITA Tomonori
Here is an updated version of the patch to add bidi support to block
pc requests. Bugs spotted by Benny were fixed.

This patch can be applied cleanly to the scsi-misc git tree and is on
the top of the following patch to add linked request support:

http://marc.info/?l=linux-scsim=117835587615642w=2

---
From: FUJITA Tomonori [EMAIL PROTECTED]
Date: Mon, 7 May 2007 16:42:24 +0900
Subject: [PATCH] add bidi support to scsi pc commands

This adds bidi support to block pc requests.

A bidi request uses req-next_rq pointer for an in request.

This patch introduces a new structure, scsi_data_buffer to hold the
data buffer information. To avoid make scsi_cmnd structure fatter, the
scsi mid-layer uses cmnd-request-next_rq-special pointer for
a scsi_data_buffer structure. LLDs don't touch the second request
(req-next_rq) so it's safe to use req-special.

scsi_blk_pc_done() always completes the whole command so
scsi_end_request() simply completes the bidi chunk too.

A helper function, scsi_bidi_data_buffer() is for LLDs to access to
the scsi_data_buffer structure easily.

Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
---
 drivers/scsi/scsi_lib.c  |  128 +++--
 include/scsi/scsi_cmnd.h |   14 +
 2 files changed, 125 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 61fbcdc..ba874a6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -66,6 +66,12 @@ static struct scsi_host_sg_pool scsi_sg_pools[] = {
 
 static void scsi_run_queue(struct request_queue *q);
 
+struct scsi_data_buffer *scsi_bidi_data_buffer(struct scsi_cmnd *cmd)
+{
+   return blk_bidi_rq(cmd-request) ? cmd-request-next_rq-special : 
NULL;
+}
+EXPORT_SYMBOL(scsi_bidi_data_buffer);
+
 /*
  * Function:   scsi_unprep_request()
  *
@@ -81,10 +87,14 @@ static void scsi_run_queue(struct request_queue *q);
 static void scsi_unprep_request(struct request *req)
 {
struct scsi_cmnd *cmd = req-special;
+   struct scsi_data_buffer *sdb = scsi_bidi_data_buffer(cmd);
 
req-cmd_flags = ~REQ_DONTPREP;
req-special = NULL;
-
+   if (sdb) {
+   kfree(sdb);
+   req-next_rq-special = NULL;
+   }
scsi_put_command(cmd);
 }
 
@@ -657,6 +667,7 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd 
*cmd, int uptodate,
request_queue_t *q = cmd-device-request_queue;
struct request *req = cmd-request;
unsigned long flags;
+   struct scsi_data_buffer *sdb = scsi_bidi_data_buffer(cmd);
 
/*
 * If there are blocks left over at the end, set up the command
@@ -685,6 +696,14 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd 
*cmd, int uptodate,
}
}
 
+   /*
+* a REQ_BLOCK_PC command is always completed fully so just do
+* end the bidi chunk.
+*/
+   if (sdb)
+   end_that_request_chunk(req-next_rq, uptodate,
+  sdb-request_bufflen);
+
add_disk_randomness(req-rq_disk);
 
spin_lock_irqsave(q-queue_lock, flags);
@@ -701,34 +720,35 @@ static struct scsi_cmnd *scsi_end_request(struct 
scsi_cmnd *cmd, int uptodate,
return NULL;
 }
 
-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+static struct scatterlist *do_scsi_alloc_sgtable(unsigned short use_sg,
+unsigned short *sglist_len,
+gfp_t gfp_mask)
 {
struct scsi_host_sg_pool *sgp;
-   struct scatterlist *sgl;
 
-   BUG_ON(!cmd-use_sg);
+   BUG_ON(!use_sg);
 
-   switch (cmd-use_sg) {
+   switch (use_sg) {
case 1 ... 8:
-   cmd-sglist_len = 0;
+   *sglist_len = 0;
break;
case 9 ... 16:
-   cmd-sglist_len = 1;
+   *sglist_len = 1;
break;
case 17 ... 32:
-   cmd-sglist_len = 2;
+   *sglist_len = 2;
break;
 #if (SCSI_MAX_PHYS_SEGMENTS  32)
case 33 ... 64:
-   cmd-sglist_len = 3;
+   *sglist_len = 3;
break;
 #if (SCSI_MAX_PHYS_SEGMENTS  64)
case 65 ... 128:
-   cmd-sglist_len = 4;
+   *sglist_len = 4;
break;
 #if (SCSI_MAX_PHYS_SEGMENTS   128)
case 129 ... 256:
-   cmd-sglist_len = 5;
+   *sglist_len = 5;
break;
 #endif
 #endif
@@ -737,11 +757,14 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd 
*cmd, gfp_t gfp_mask)
return NULL;
}
 
-   sgp = scsi_sg_pools + cmd-sglist_len;
-   sgl = mempool_alloc(sgp-pool, gfp_mask);
-   return sgl;
+   sgp = scsi_sg_pools + *sglist_len;
+   return mempool_alloc(sgp-pool, gfp_mask);
 }
 
+struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd,