Re: [PATCH, RFC] MAINTAINERS: update OSD entries

2017-05-03 Thread Benny Halevy
On Wed, May 3, 2017 at 1:08 PM, Boaz Harrosh <o...@electrozaur.com> wrote:
>
> On 05/02/2017 02:33 PM, Jeff Layton wrote:
> > On Tue, 2017-05-02 at 09:57 +0200, Christoph Hellwig wrote:
> >> The open-osd domain doesn't exist anymore, and mails to the list lead
> >> to really annoying bounced that repeat every day.
> >>
> >> Also the primarydata address for Benny bounces, and while I have a new
> >> one for him he doesn't seem to be maintaining the OSD code any more.
> >>
> >> Which beggs the question:  should we really leave the Supported status
> >> in MAINTAINERS given that the code is barely maintained?
> >>
> >> Signed-off-by: Christoph Hellwig <h...@lst.de>
> >> ---
> >>  MAINTAINERS | 4 
> >>  1 file changed, 4 deletions(-)
> >>
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index 1bb06c5f7716..28dd83a1d9e2 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -9418,10 +9418,6 @@ F:drivers/net/wireless/intersil/orinoco/
> >>
> >>  OSD LIBRARY and FILESYSTEM
> >>  M:  Boaz Harrosh <o...@electrozaur.com>
> >> -M:  Benny Halevy <bhal...@primarydata.com>
> >> -L:  osd-...@open-osd.org
> >> -W:  http://open-osd.org
> >> -T:  git git://git.open-osd.org/open-osd.git
> >>  S:  Maintained
> >>  F:  drivers/scsi/osd/
> >>  F:  include/scsi/osd_*
> >
> > Hah, you beat me to it! I was going to spin up a patch for this today.
> >
> > Acked-by: Jeff Layton <jlay...@redhat.com>
>
> Acked-by: Boaz Harrosh <o...@electrozaur.com>

Acked-by: Benny Halevy <bhal...@gmail.com>

>
> >
>


Re: [osd-dev] [patch] [SCSI] libosd: remover duplicate __bitwise annotation

2013-04-17 Thread Benny Halevy
On 2013-04-17 10:19, Dan Carpenter wrote:
 __be32 is already a __bitwise type so we don't need the second __bitwise
 here.  It causes a Sparse error:
 include/scsi/osd_protocol.h:110:26: error: invalid modifier
 
 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

ACK

 
 diff --git a/include/scsi/osd_protocol.h b/include/scsi/osd_protocol.h
 index a6026da..25ac628 100644
 --- a/include/scsi/osd_protocol.h
 +++ b/include/scsi/osd_protocol.h
 @@ -107,7 +107,7 @@ enum osd_attributes_mode {
   *   int exponent: 04;
   *   }
   */
 -typedef __be32 __bitwise osd_cdb_offset;
 +typedef __be32 osd_cdb_offset;
  
  enum {
   OSD_OFFSET_UNUSED = 0x,
 ___
 osd-dev mailing list
 osd-...@open-osd.org
 http://mailman.open-osd.org/mailman/listinfo/osd-dev
 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] [SCSI] libosd: check for kzalloc() failure

2013-01-30 Thread Benny Halevy
On Wed, Jan 30, 2013 at 10:57 AM, walter harms wha...@bfs.de wrote:


 Am 30.01.2013 09:27, schrieb Dan Carpenter:
 On Wed, Jan 30, 2013 at 09:15:43AM +0100, walter harms wrote:


 Am 30.01.2013 08:06, schrieb Dan Carpenter:
 There wasn't any error handling for this kzalloc().

 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

 diff --git a/drivers/scsi/osd/osd_initiator.c 
 b/drivers/scsi/osd/osd_initiator.c
 index c06b8e5..d8293f2 100644
 --- a/drivers/scsi/osd/osd_initiator.c
 +++ b/drivers/scsi/osd/osd_initiator.c
 @@ -144,6 +144,10 @@ static int _osd_get_print_system_info(struct osd_dev 
 *od,
 odi-osdname_len = get_attrs[a].len;
 /* Avoid NULL for memcmp optimization 0-length is good enough */
 odi-osdname = kzalloc(odi-osdname_len + 1, GFP_KERNEL);
 +   if (!odi-osdname) {
 +   ret = -ENOMEM;
 +   goto out;
 +   }
 if (odi-osdname_len)
 memcpy(odi-osdname, get_attrs[a].val_ptr, odi-osdname_len);
 OSD_INFO(OSD_NAME   [%s]\n, odi-osdname);
 --

 this looks like strdup() ?


 Maybe?  It's a funny thing going on with the NUL terminator and I
 don't understand what the comment is about.

 It appears that normally get_attrs[a].val_ptr is a NUL terminated
 string but get_attrs[a].len does not count the terminator.

 Odd.

 i have no clue what the programmer was thinking. if i read this correct
 osdname is u8 *osdname; so a simple strdup() or strndup() would be ok
 the comment seems to indicate that get_attrs[a].val_ptr could be NULL
 but where is the check ?
 Perhaps they are not using ascii here ? then a memdup(get_attrs[a].len)
 would be better.

There are subtle differences between kstrdup or kmemdup and this implementation.

kmemdup is problematic as get_attrs[a].val_ptr does not contain the
NUL terminator

In the following case:
if get_attrs[a].len == 0
then get_attrs[a].val_ptr == NULL

The end result should be
odi-osdname_len = 0
odi-osdname = kzalloc(1, GFP_KERNEL);

while with kstrdup, odi-osdname will end up being NULL
as it's input arg s is NULL.

Benny


 re,
  wh


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] [SCSI] libosd: check for kzalloc() failure

2013-01-30 Thread Benny Halevy
On Wed, Jan 30, 2013 at 9:06 AM, Dan Carpenter dan.carpen...@oracle.com wrote:
 There wasn't any error handling for this kzalloc().

 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

 diff --git a/drivers/scsi/osd/osd_initiator.c 
 b/drivers/scsi/osd/osd_initiator.c
 index c06b8e5..d8293f2 100644
 --- a/drivers/scsi/osd/osd_initiator.c
 +++ b/drivers/scsi/osd/osd_initiator.c
 @@ -144,6 +144,10 @@ static int _osd_get_print_system_info(struct osd_dev *od,
 odi-osdname_len = get_attrs[a].len;
 /* Avoid NULL for memcmp optimization 0-length is good enough */
 odi-osdname = kzalloc(odi-osdname_len + 1, GFP_KERNEL);
 +   if (!odi-osdname) {
 +   ret = -ENOMEM;
 +   goto out;
 +   }

good catch!

Benny

 if (odi-osdname_len)
 memcpy(odi-osdname, get_attrs[a].val_ptr, odi-osdname_len);
 OSD_INFO(OSD_NAME   [%s]\n, odi-osdname);
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] [SCSI] libosd: check for kzalloc() failure

2013-01-30 Thread Benny Halevy
On Wed, Jan 30, 2013 at 3:00 PM, walter harms wha...@bfs.de wrote:


 Am 30.01.2013 10:51, schrieb Benny Halevy:
 On Wed, Jan 30, 2013 at 10:57 AM, walter harms wha...@bfs.de wrote:


 Am 30.01.2013 09:27, schrieb Dan Carpenter:
 On Wed, Jan 30, 2013 at 09:15:43AM +0100, walter harms wrote:


 Am 30.01.2013 08:06, schrieb Dan Carpenter:
 There wasn't any error handling for this kzalloc().

 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

 diff --git a/drivers/scsi/osd/osd_initiator.c 
 b/drivers/scsi/osd/osd_initiator.c
 index c06b8e5..d8293f2 100644
 --- a/drivers/scsi/osd/osd_initiator.c
 +++ b/drivers/scsi/osd/osd_initiator.c
 @@ -144,6 +144,10 @@ static int _osd_get_print_system_info(struct 
 osd_dev *od,
 odi-osdname_len = get_attrs[a].len;
 /* Avoid NULL for memcmp optimization 0-length is good enough */
 odi-osdname = kzalloc(odi-osdname_len + 1, GFP_KERNEL);
 +   if (!odi-osdname) {
 +   ret = -ENOMEM;
 +   goto out;
 +   }
 if (odi-osdname_len)
 memcpy(odi-osdname, get_attrs[a].val_ptr, odi-osdname_len);
 OSD_INFO(OSD_NAME   [%s]\n, odi-osdname);
 --

 this looks like strdup() ?


 Maybe?  It's a funny thing going on with the NUL terminator and I
 don't understand what the comment is about.

 It appears that normally get_attrs[a].val_ptr is a NUL terminated
 string but get_attrs[a].len does not count the terminator.

 Odd.

 i have no clue what the programmer was thinking. if i read this correct
 osdname is u8 *osdname; so a simple strdup() or strndup() would be ok
 the comment seems to indicate that get_attrs[a].val_ptr could be NULL
 but where is the check ?
 Perhaps they are not using ascii here ? then a memdup(get_attrs[a].len)
 would be better.

 There are subtle differences between kstrdup or kmemdup and this 
 implementation.

 kmemdup is problematic as get_attrs[a].val_ptr does not contain the
 NUL terminator

 ok, i understand - but can we assume that we are talking ascii here ?


No, it can be anything.  UTF-8 is more likely but not guaranteed either.

 In the following case:
 if get_attrs[a].len == 0
 then get_attrs[a].val_ptr == NULL

 The end result should be
 odi-osdname_len = 0
 odi-osdname = kzalloc(1, GFP_KERNEL);

 while with kstrdup, odi-osdname will end up being NULL
 as it's input arg s is NULL.


 and you want the argument to be  ?
 May i ask why ? kfree() can handle NULL and kprintf() (-family) also.

It was Boaz' decision at the time to simplify internal tests
like _the_same_or_null in drivers/scsi/osd/osd_uld.c
that doesn't check for NULL

Nothing spectacular :)

Benny


 re,
  wh


 Benny


 re,
  wh




--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Revert IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs

2008-02-26 Thread Benny Halevy
Pete, the subject says PATCH 1/2 but I didn't see any follow-up message
for PATCH 2/2. Just wondering :)

Benny

On Feb. 26, 2008, 10:27 -0800, Pete Wyckoff [EMAIL PROTECTED] wrote:
 This reverts commit a3cd7d9070be417a21905c997ee32d756d999b38.
 
 The original commit breaks iSER reliably, making it complain:
 
 iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11
 
 The FMR cleanup thread runs ib_fmr_batch_release() as dirty
 entries build up.  This commit causes clean but used FMR
 entries also to be purged.  During that process, another thread
 can see that there are no free FMRs and fail, even though
 there should always have been enough available.
 
 Signed-off-by: Pete Wyckoff [EMAIL PROTECTED]
 ---
  drivers/infiniband/core/fmr_pool.c |   21 ++---
  1 files changed, 6 insertions(+), 15 deletions(-)
 
 diff --git a/drivers/infiniband/core/fmr_pool.c 
 b/drivers/infiniband/core/fmr_pool.c
 index 7f00347..4044fdf 100644
 --- a/drivers/infiniband/core/fmr_pool.c
 +++ b/drivers/infiniband/core/fmr_pool.c
 @@ -139,7 +139,7 @@ static inline struct ib_pool_fmr 
 *ib_fmr_cache_lookup(struct ib_fmr_pool *pool,
  static void ib_fmr_batch_release(struct ib_fmr_pool *pool)
  {
   int ret;
 - struct ib_pool_fmr *fmr, *next;
 + struct ib_pool_fmr *fmr;
   LIST_HEAD(unmap_list);
   LIST_HEAD(fmr_list);
  
 @@ -158,20 +158,6 @@ static void ib_fmr_batch_release(struct ib_fmr_pool 
 *pool)
  #endif
   }
  
 - /*
 -  * The free_list may hold FMRs that have been put there
 -  * because they haven't reached the max_remap count.
 -  * Invalidate their mapping as well.
 -  */
 - list_for_each_entry_safe(fmr, next, pool-free_list, list) {
 - if (fmr-remap_count == 0)
 - continue;
 - hlist_del_init(fmr-cache_node);
 - fmr-remap_count = 0;
 - list_add_tail(fmr-fmr-list, fmr_list);
 - list_move(fmr-list, unmap_list);
 - }
 -
   list_splice(pool-dirty_list, unmap_list);
   INIT_LIST_HEAD(pool-dirty_list);
   pool-dirty_len = 0;
 @@ -384,6 +370,11 @@ void ib_destroy_fmr_pool(struct ib_fmr_pool *pool)
  
   i = 0;
   list_for_each_entry_safe(fmr, tmp, pool-free_list, list) {
 + if (fmr-remap_count) {
 + INIT_LIST_HEAD(fmr_list);
 + list_add_tail(fmr-fmr-list, fmr_list);
 + ib_unmap_fmr(fmr_list);
 + }
   ib_dealloc_fmr(fmr-fmr);
   list_del(fmr-list);
   kfree(fmr);

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Revert IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs

2008-02-26 Thread Benny Halevy
Diabolical ;-)
Thanks for the pointer!

Benny

On Feb. 26, 2008, 11:39 -0800, Matthew Wilcox [EMAIL PROTECTED] wrote:
 On Tue, Feb 26, 2008 at 11:23:01AM -0800, Benny Halevy wrote:
 Pete, the subject says PATCH 1/2 but I didn't see any follow-up message
 for PATCH 2/2. Just wondering :)
 
 I think the problem's on your end ... I got it and so did marc:
 http://marc.info/?l=linux-scsim=120405067313933w=2
 

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_error: Fix language abuse.

2008-02-11 Thread Benny Halevy
Seriously, can't you just add a disclaimer to the README file?

In http://lkml.org/lkml/2008/2/9/29, Luben Tuikov made an interesting
point that in many cases illegal refers to a valid value that violates
the specification, so the term invalid may be technically incorrect.

Benny

On Feb. 11, 2008, 18:07 +0200, linux-os (Dick Johnson) [EMAIL PROTECTED] 
wrote:
 On Fri, 8 Feb 2008, Mark Hounschell wrote:
 
 linux-os (Dick Johnson) wrote:
 The correct word should be invalid, in spite of
 the fact that the SCSI committee used invalid syntax.

 Alan is right. There is nothing illegal in the kernel
 and if there is, it must be removed as soon as it
 is discovered!

 il·le·gal  (-lgl)
 adj.
 1. Prohibited by law.
 *2. Prohibited by official rules: an illegal pass in football.
 *3. Unacceptable to or not performable by a computer: an illegal operation.

 Mark
 
 Many early computer programmers including myself, while writing
 error messages in early software, did not understand that computer
 programmers do not make law so we called bad operations illegal.
 
 Once you are called to testify in a court of law, about some
 message your code wrote to the screen just before a factory
 burned down, you start to be much more careful about the syntax.
 
 I advise that, regardless of the rewrite of dictionaries and,
 in fact, the rewrite of history, whenever possible we use
 the correct message syntax. Dictionaries pick up common usage
 in spite of the fact that it is wrong. See irregardless and
 other abortions which now exist in the dictionary.
 
 
 Cheers,
 Dick Johnson
 Penguin : Linux version 2.6.22.1 on an i686 machine (5588.28 BogoMips).
 My book : http://www.AbominableFirebug.com/
 _
 
 
 The information transmitted in this message is confidential and may be 
 privileged.  Any review, retransmission, dissemination, or other use of this 
 information by persons or entities other than the intended recipient is 
 prohibited.  If you are not the intended recipient, please notify Analogic 
 Corporation immediately - by replying to this message or by sending an email 
 to [EMAIL PROTECTED] - and destroy all copies of this information, including 
 any attachments, without reading or disclosing them.
 
 Thank you.
 -
 To unsubscribe from this list: send the line unsubscribe linux-scsi in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-06 Thread Benny Halevy
On Feb. 06, 2008, 14:16 +0200, Bart Van Assche [EMAIL PROTECTED] wrote:
 On Feb 5, 2008 6:01 PM, Erez Zilber [EMAIL PROTECTED] wrote:
 Using such large values for FirstBurstLength will give you poor
 performance numbers for WRITE commands (with iSER). FirstBurstLength
 means how much data should you send as unsolicited data (i.e. without
 RDMA). It means that your WRITE commands were sent without RDMA.
 
 Sorry, but I'm afraid you got this wrong. When the iSER transport is
 used instead of TCP, all data is sent via RDMA, including unsolicited
 data. If you have look at the iSER implementation in the Linux kernel
 (source files under drivers/infiniband/ulp/iser), you will see that
 all data is transferred via RDMA and not via TCP/IP.

Regardless of what the current implementation is, the behavior you (Bart)
describe seems to disagree with http://www.ietf.org/rfc/rfc5046.txt.

Benny

 
 Bart Van Assche.
 -

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remove use_sg_chaining

2008-01-21 Thread Benny Halevy
On Jan. 21, 2008, 11:31 +0200, Jens Axboe [EMAIL PROTECTED] wrote:
 On Mon, Jan 21 2008, Boaz Harrosh wrote:
 On Sun, Jan 20 2008 at 22:59 +0200, James Bottomley [EMAIL PROTECTED] 
 wrote:
 On Sun, 2008-01-20 at 21:01 +0100, Jens Axboe wrote:
 On Sun, Jan 20 2008, Jens Axboe wrote:
 On Sun, Jan 20 2008, Boaz Harrosh wrote:
 On Sun, Jan 20 2008 at 21:29 +0200, Jens Axboe [EMAIL PROTECTED] wrote:
 On Sun, Jan 20 2008, James Bottomley wrote:
 On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote:
 On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL 
 PROTECTED] wrote:
 this patch depends on the sg branch of the block tree

 James

 ---
 From: James Bottomley [EMAIL PROTECTED]
 Date: Tue, 15 Jan 2008 11:11:46 -0600
 Subject: remove use_sg_chaining

 With the sg table code, every SCSI driver is now either chain capable
 or broken, so there's no need to have a check in the host template.

 Also tidy up the code by moving the scatterlist size defines into the
 SCSI includes and permit the last entry of the scatterlist pools not
 to be a power of two.
 ---
 I have a theoretical problem that BUGed me from the beginning.

 Could it happen that a memory critical IO, (that is needed to free
 memory), be collected into an sg-chained large IO, and the allocation 
 of the multiple sg-pool-allocations fail, thous dead locking on
 out-of-memory? Is there a mechanism in place that will split large 
 IO's 
 into smaller chunks in the event of out-of-memory condition in 
 prep_fn?

 Is it possible to call blk_rq_map_sg() with less then what is present
 at request to only map the starting portion?
 Obviously, that's why I was worrying about mempool size and default
 blocks a while ago.

 However, the deadlock only occurs if the device is swap or backing a
 filesystem with memory mapped files.  The use cases for this are really
 tapes and other entities that need huge buffers.  That's why we're
 keeping the system sector size at 1024 unless you alter it through 
 sysfs
 (here gun, there foot ...)
 Alternatively (and much safer, imho), we allow blk_rq_map_sg() return
 smaller than nr_phys_segments and just ensure that the request is
 continued nicely through the normal 'request if residual' logic.

 Thats a grate Idea. I will Q it on my todo list. Thanks
 ok good, thanks :-)
 btw, the above is full of typos, my apologies. it should read requeue
 if residual, but I guess you already guessed as much.
 Something like ...

 It looks to me like it would make sense to have something like a
 BLKPREP_SGALLOCFAIL return so the block layer can do this for us ...
 Alternatively, we'll have to find a way of adjusting the sector count as
 it goes into the ULD prep functions.

 James
 By luck this is no problem because it happens exactly before the ULD
 actually prepares the command. sd and sr are already doing these
 adjustments based on bufflen. For BLOCK_PC we will need to fail with
 perhaps a new BLKPREP_SGALLOCFAIL, like you said, and let the
 initiator take care of it.
 
 Right, the scsi_init_io() takes care of it and adjusts the buflen as
 needed, no need to pass this error back. As far as I'm concerned,
 blocking for BLOCK_PC requests should be fine (is anyone using these for
 swap?).
 

It could help the OSD I/O module which will produce BLOCK_PC bidi CDBs to get
feedback in the form of ENOMEM so to throttle down its I/O coalescing sizes
and generate smaller I/Os in face of memory pressure and then gradually throttle
up when on success.  If the requests are held back and blocked at the queue
I'm concerned that could hurt performance under memory pressure by not filling
up the pipeline as much as we can.

Benny
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] use dynamically allocated sense buffer

2008-01-16 Thread Benny Halevy
On Jan. 15, 2008, 17:20 +0200, James Bottomley [EMAIL PROTECTED] wrote:
 On Tue, 2008-01-15 at 18:23 +0900, FUJITA Tomonori wrote:
 This is the second version of

 http://marc.info/?l=linux-scsim=119933628210006w=2

 I gave up once, but I found that the performance loss is negligible
 (within 1%) by using kmem_cache_alloc instead of mempool.

 I use scsi_debug with fake_rw=1 and disktest (DIO reads with 8
 threads) again:

 scsi-misc (slub) | 486.9 MB/s  IOPS 124652.9/s
 dynamic sense buf (slub) | 483.2 MB/s  IOPS 123704.1/s

 scsi-misc (slab) | 467.0 MB/s  IOPS 119544.3/s
 dynamic sense buf (slab) | 468.7 MB/s  IOPS 119986.0/s

 The results are the averages of three runs with a server using two
 dual-core 1.60 GHz Xeon processors with DDR2 memory.


 I doubt think that someone will complain about the performance
 regression due to this patch. In addition, unlike scsi_debug, the real
 LLDs allocate the own data structure per scsi_cmnd so the performance
 differences would be smaller (and with the real hard disk overheads).

 Here's the full results:

 http://www.kernel.org/pub/linux/kernel/people/tomo/sense/results.txt
 
 Heh, that's one of those good news, bad news things.  Certainly good
 news for you.  The bad news for the rest of us is that you just
 implicated mempool in a performance problem  and since they're the core
 of the SCSI scatterlist allocations and sit at the heart of the critical
 path in SCSI, we have a potential performance issue in the whole of
 SCSI.

Looking at mempool's code this is peculiar as what seems to be its
critical path for alloc and free looks pretty harmless and lightweight.
Maybe an extra memory barrier, spin_{,un}lock_* and two extra function call
(one of them can be eliminated BTW if the order of arguments to the
mempool_{alloc,free}_t functions were the same as for kmem_cache_{alloc,free}).

Benny

 
 James

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] block layer varlen-cdb

2007-11-01 Thread Benny Halevy
On Nov. 01, 2007, 20:40 +0200, Matthew Wilcox [EMAIL PROTECTED] wrote:
 On Thu, Nov 01, 2007 at 08:05:06PM +0200, Boaz Harrosh wrote:
 @@ -287,8 +287,13 @@ struct request {
  /*
   * when request is used as a packet command carrier
   */
 -unsigned int cmd_len;
 +unsigned short cmd_len;
 +unsigned short varlen_cdb_len;  /* length of varlen_cdb buffer */
  unsigned char cmd[BLK_MAX_CDB];
 +unsigned char *varlen_cdb;  /* an optional variable-length cdb.
 + * points to a user buffer that must be
 + * valid until end of request
 + */
  
 
 Try this instead:
 
   unsigned int cmd_len;
 - unsigned char cmd[BLK_MAX_CDB];
 + unsigned char _cmd[BLK_MAX_CDB];
 + unsigned char *cmd;
 
 Then initialise cmd to the address of _cmd.  If you need to override it
 later (ie patch 3), you can.
 

I agree this is probably the cleanest implementation but when Boaz and I
initially discussed this approach he convinced me that LL block devices assume
that req-cmd_len = BLK_MAX_CDB and it is unsafe at the moment to expose them
potentially larger commands.

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] aic7xxx: Enable 16-bit CDBs

2007-10-22 Thread Benny Halevy
On Oct. 22, 2007, 21:35 +0200, Matthew Wilcox [EMAIL PROTECTED] wrote:
 On Mon, Oct 22, 2007 at 09:10:49PM +0200, Boaz Harrosh wrote:
 I'm about to finish an RFC patchset for the extended commands.
 I have implemented a more aggressive approach than the one
 I've been sending for the last year.
 (Matthew I have an extra 8-bytes save to scsi_cmnd on 
 64bit and 12 bytes for 32bit. Guess how? ;))
 
 Well ... the command has to be stored somewhere.  If it's an additional
 kmalloc, that's a loss.  If it's in the request, that's a loss too ...
 let's see where you're keeping it ;-)
 
I love spoiling Boaz's surprise :)

The gist of it is scsi_cmnd-cmnd pointing at req-cmd
or at req-varlen_cmd as appropriate.

Most users can't tell the difference except if they
tried using sizeof(scsi_cmnd-cmnd) in which case they
can use MAX_COMMAND_SIZE instead.

Benny

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 27/32] scsi_data_buffer

2007-10-18 Thread Benny Halevy
On Oct. 18, 2007, 2:47 +0200, Matthew Wilcox [EMAIL PROTECTED] wrote:
 On Wed, Oct 17, 2007 at 08:21:15PM +0200, Boaz Harrosh wrote:
   - Group all IO members of scsi_cmnd into a scsi_data_buffer
 structure.
 
 +struct scsi_data_buffer {
 +unsigned length;
 +int resid;
 +unsigned short sg_count;
 +unsigned short alloc_sg_count;
 +struct scatterlist* sglist;
 +};
 
 This has exactly the problems I thought it would have.  Due to alignment
 rules, it grows the scsi_cmnd from 368 to 376 bytes on x86-64.
 It remains at 272 bytes on i386 though, which is some consolation.
 By porting the patch I had for shrinking the scsi_cmnd, I can get it
 down to 352 bytes, but not as far as the 344 bytes I had it at before.
 
 The problem is the padding at the *end* of scsi_data_buffer (er, after I
 rearrange it in the patch below).  There's nothing we can realistically
 put in it, given that we don't want to expand scsi_data_buffer's size on
 32-bit machines.  I have a fix ... but I don't think you'll like it.
 I certainly don't.  But it does get us back down to 344 bytes.
 
 Updated patch below.  I'm fully expecting the 'result' shenanigan to get
 it NACKed, but I'd like to see if it inspires anyone else to a more
 creative way of saving this space.


yeah. The sglist pointer shuffle makes sense and so are the field type
changes and coalescing, but the union holding the deprecated fields
of scsi_data_buff is going away.

#pragma pack(4) before struct scsi_cmnd
(followed by #pragma pack()) should do the trick 
by saving member padding bytes. As a general rule I don't like pragma's
but this one works, is easy to understand, and is pretty standard.

 
 ---
 
 Thanks to acme's pahole utility, I found some places where we can save
 a lot of bytes in scsi_cmnd, just by rearranging struct elements and
 reducing the size of some elements.  We go from 272 to 260 bytes on x86
 and from 368 to 344 bytes on x86-64.
 
  - eh_eflags had a 4-byte hole after it on 64-bit.  In fact, this has
value 0 or 1, so reduce it to an unsigned char, and put it with the
other chars in scsi_cmnd.  Saves 8 bytes on 64-bit.
  - sc_data_direction has a value from 0-3, so make it an unsigned char
rather than an enum.  Saves at least 4 bytes.
  - Putting 'tag' with the other char elements saves another 4 bytes
  - Moving 'result' into the union with the deprecated members saves 8
bytes on 64-bit.
 
 diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
 index 047ffe6..2b10779 100644
 --- a/include/scsi/scsi_cmnd.h
 +++ b/include/scsi/scsi_cmnd.h
 @@ -13,22 +13,21 @@ struct Scsi_Host;
  struct scsi_device;
  
  struct scsi_data_buffer {
 + struct scatterlist* sglist;
   unsigned length;
   int resid;
   unsigned short sg_count;
   unsigned short alloc_sg_count;
 - struct scatterlist* sglist;
  };
  
  /* embedded in scsi_cmnd */
  struct scsi_pointer {
   char *ptr;  /* data pointer */
 - int this_residual;  /* left in this buffer */
   struct scatterlist *buffer; /* which buffer */
 + dma_addr_t dma_handle;
 + int this_residual;  /* left in this buffer */
   int buffers_residual;   /* how many buffers left */
  
 -dma_addr_t dma_handle;
 -
   volatile int Status;
   volatile int Message;
   volatile int have_data_in;
 @@ -40,7 +39,6 @@ struct scsi_cmnd {
   struct scsi_device *device;
   struct list_head list;  /* scsi_cmnd participates in queue lists */
   struct list_head eh_entry; /* entry for the host eh_cmd_q */
 - int eh_eflags;  /* Used by error handlr */
  
   /*
* A SCSI Command is assigned a nonzero serial_number before passed
 @@ -64,7 +62,9 @@ struct scsi_cmnd {
   int timeout_per_command;
  
   unsigned char cmd_len;
 - enum dma_data_direction sc_data_direction;
 + unsigned char eh_eflags;/* Used by error handler */
 + unsigned char sc_data_direction;/* enum dma_data_direction */
 + unsigned char tag;  /* SCSI-II queued command tag */
  
   /* These elements define the operation we are about to perform */
  #define MAX_COMMAND_SIZE 16
 @@ -109,10 +109,6 @@ struct scsi_cmnd {
* obtained by scsi_malloc is guaranteed
* to be at an address  16Mb). */
  
 - int result; /* Status code from lower level driver */
 -
 - unsigned char tag;  /* SCSI-II queued command tag */
 -
   union {
   struct scsi_data_buffer sdb;
   /*
 @@ -121,11 +117,12 @@ struct scsi_cmnd {
*of struct scsi_data_buffer members.
*/
   struct {
 + void __deprecated *request_buffer;
   unsigned __deprecated request_bufflen;
   int __deprecated resid;
   unsigned short __deprecated use_sg;
 

Re: [PATCH 27/32] scsi_data_buffer

2007-10-18 Thread Benny Halevy
On Oct. 18, 2007, 10:06 +0200, Matthew Wilcox [EMAIL PROTECTED] wrote:
 On Thu, Oct 18, 2007 at 08:59:58AM +0200, Benny Halevy wrote:
 yeah. The sglist pointer shuffle makes sense and so are the field type
 changes and coalescing, but the union holding the deprecated fields
 of scsi_data_buff is going away.
 
 Indeed.  We could always do ...
 
   union {
   struct scsi_data_pointer;
   struct {
   char _padding_[sizeof(struct scsi_data_pointer)];
   int result;
   }
   }
 
 But that might set a new record of depths of obfuscation.

agreed :)

 
 #pragma pack(4) before struct scsi_cmnd
 (followed by #pragma pack()) should do the trick 
 by saving member padding bytes. As a general rule I don't like pragma's
 but this one works, is easy to understand, and is pretty standard.
 
 But it might well have the effect of pessimising accesses to the struct,
 and accesses to scsi_cmnd are pretty hot.  I've seen gcc revert to byte
 loads and stores to packed structs in the past.  Maybe it's got better
 now.
 

that's why I suggested pack(4) and not pack(1), assuming 4-byte alignment
is still as efficient as 8-byte alignment on most common 64 bit architectures.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86-64: pci-gart iommu sg chaining zeroes wrong sg.

2007-09-30 Thread Benny Halevy
On Sep 27, 2007, 18:46 +0200, FUJITA Tomonori [EMAIL PROTECTED] wrote:
 On Fri, 28 Sep 2007 01:38:27 +0900
 FUJITA Tomonori [EMAIL PROTECTED] wrote:
 
 This patch is for Jens' block tree (sg chaining branch).

 I don't have the hardware but this looks like a bug.

 ---
 From: FUJITA Tomonori [EMAIL PROTECTED]
 Subject: [PATCH] x86-64: pci-gart iommu sg chaining zeroes a wrong sg's 
 dma_length

 Needs to zero the end of the list.

 Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
 ---
  arch/x86_64/kernel/pci-gart.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)

 diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
 index 27b7db4..a4151a7 100644
 --- a/arch/x86_64/kernel/pci-gart.c
 +++ b/arch/x86_64/kernel/pci-gart.c
 @@ -425,9 +425,10 @@ int gart_map_sg(struct device *dev, struct scatterlist 
 *sg, int nents, int dir)
  if (dma_map_cont(start_sg, i - start, sgmap, pages, need)  0)
  goto error;
  out++;
 +sgmap = sg_next(sgmap);
  flush_gart();
  if (out  nents)
 -ps-dma_length = 0;
 +sgmap-dma_length = 0;
  return out;
 
 Sorry, it should be:
 
 
 diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
 index 27b7db4..cfcc84e 100644
 --- a/arch/x86_64/kernel/pci-gart.c
 +++ b/arch/x86_64/kernel/pci-gart.c
 @@ -426,8 +426,10 @@ int gart_map_sg(struct device *dev, struct scatterlist 
 *sg, int nents, int dir)
   goto error;
   out++;
   flush_gart();
 - if (out  nents)
 - ps-dma_length = 0;
 + if (out  nents) {
 + sgmap = sg_next(sgmap);
 + sgmap-dma_length = 0;
 + }

looks correct to me.
ps points at the previous scanned sg entry while you want to zero out
dma_length at the entry immediately following the last entry mapped
(if (out  nents))

the original code before 62296749bd421904dace1e6b0fc3c4538aac7111 was:
-   if (out  nents) 
-   sg[out].dma_length = 0; 

Benny

   return out;
  
  error:
 -
 To unsubscribe from this list: send the line unsubscribe linux-scsi in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH ver2 5/5] arm: fas216 Use scsi_eh API for REQUEST_SENSE invocation

2007-09-12 Thread Benny Halevy

Russell King wrote:

On Mon, Sep 10, 2007 at 10:39:11PM +0300, Boaz Harrosh wrote:
  

  - Use new scsi_eh_prep/restor_cmnd() for synchronous
REQUEST_SENSE invocation.

Signed-off-by: Boaz Harrosh [EMAIL PROTECTED]
---
 drivers/scsi/arm/fas216.c |   16 +++-
 drivers/scsi/arm/fas216.h |3 +++
 2 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/arm/fas216.c b/drivers/scsi/arm/fas216.c
index fb5f202..a715632 100644
--- a/drivers/scsi/arm/fas216.c
+++ b/drivers/scsi/arm/fas216.c
@@ -2018,6 +2018,7 @@ static void fas216_rq_sns_done(FAS216_Info *info, struct 
scsi_cmnd *SCpnt,
 * the upper layers to process.  This would have been set
 * correctly by fas216_std_done.
 */
+   scsi_eh_restore_cmnd(SCpnt, info-ses);
SCpnt-scsi_done(SCpnt);
 }
 
@@ -2103,23 +2104,12 @@ request_sense:

if (SCpnt-cmnd[0] == REQUEST_SENSE)
goto done;
 
+	scsi_eh_prep_cmnd(SCpnt, info-ses, NULL, 0, ~0);

fas216_log_target(info, LOG_CONNECT, SCpnt-device-id,
  requesting sense);
-   memset(SCpnt-cmnd, 0, sizeof (SCpnt-cmnd));
-   SCpnt-cmnd[0] = REQUEST_SENSE;
-   SCpnt-cmnd[1] = SCpnt-device-lun  5;
-   SCpnt-cmnd[4] = sizeof(SCpnt-sense_buffer);
-   SCpnt-cmd_len = COMMAND_SIZE(SCpnt-cmnd[0]);
-   SCpnt-SCp.buffer = NULL;
-   SCpnt-SCp.buffers_residual = 0;
-   SCpnt-SCp.ptr = (char *)SCpnt-sense_buffer;
-   SCpnt-SCp.this_residual = sizeof(SCpnt-sense_buffer);
-   SCpnt-SCp.phase = sizeof(SCpnt-sense_buffer);
+   init_SCp(SCpnt);
SCpnt-SCp.Message = 0;
SCpnt-SCp.Status = 0;
-   SCpnt-request_bufflen = sizeof(SCpnt-sense_buffer);
-   SCpnt-sc_data_direction = DMA_FROM_DEVICE;
-   SCpnt-use_sg = 0;
SCpnt-tag = 0;
SCpnt-host_scribble = (void *)fas216_rq_sns_done;



So where do we end up setting up the request sense command?

  

In scsi_eh_prep_cmnd(), when (sense_bytes != 0), called here:

+   scsi_eh_prep_cmnd(SCpnt, info-ses, NULL, 0, ~0);

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sg: increase sglist_len of the sg_scatter_hold structure

2007-08-09 Thread Benny Halevy
Mike Christie wrote:
 Mike Christie wrote:
 For drivers like sg and st, do mean the the sg list that is passed to 
 functions like scsi_execute_async? If we kill that argument, and instead 
 have sg.c and other scsi_execute_async callers just call blk helpers 
 like blk_rq_map_user then we would not have to worry about drivers like 
 sg needing to know about chaining right? I mean sg.c would not every 
 interact with a scatterlist. It would just interact with a request and 
 the blk helpers map data for it.
 
 There should be a return there.
 
   The scatterlist that sg and st interact
 with is bogus. It gets thrown away in scsi_execute_async and is only 
 used for book keeping.
 
 I mean currently the scatterlist that sg and st use is bogus and gets 
 thrown away. If we convert sg and st to use blk_rq_map_user then those 
 drivers will not have to interact with a scatterlist at all.

I'm not familiar with the dire details but this sounds like a good idea.

Benny

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 7/8] sd.c and sr.c move to scsi_sgtable implementation

2007-07-29 Thread Benny Halevy
FUJITA Tomonori wrote:
 From: Boaz Harrosh [EMAIL PROTECTED]
 Subject: [RFC 7/8] sd.c and sr.c move to scsi_sgtable implementation
 Date: Thu, 05 Jul 2007 16:44:04 +0300
 
   - sd and sr would adjust IO size to align on device's block
 size. So code needs to change once we move to scsi_sgtable
 implementation. (Not compatible with un-converted drivers)
   - Convert code to use scsi_for_each_sg
   - Use Data accessors wherever is appropriate.

  Signed-off-by: Boaz Harrosh [EMAIL PROTECTED]
 ---
  drivers/scsi/sd.c |   10 --
  drivers/scsi/sr.c |   27 ---
  2 files changed, 16 insertions(+), 21 deletions(-)

 diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
 index 448d316..459fd23 100644
 --- a/drivers/scsi/sd.c
 +++ b/drivers/scsi/sd.c
 @@ -338,7 +338,7 @@ static int sd_init_command(struct scsi_cmnd * SCpnt)
  struct request *rq = SCpnt-request;
  struct gendisk *disk = rq-rq_disk;
  sector_t block = rq-sector;
 -unsigned int this_count = SCpnt-request_bufflen  9;
 +unsigned int this_count = scsi_bufflen(SCpnt)  9;
  unsigned int timeout = sdp-timeout;
  
  SCSI_LOG_HLQUEUE(1, scmd_printk(KERN_INFO, SCpnt,
 @@ -418,9 +418,6 @@ static int sd_init_command(struct scsi_cmnd * SCpnt)
  } else if (rq_data_dir(rq) == READ) {
  SCpnt-cmnd[0] = READ_6;
  SCpnt-sc_data_direction = DMA_FROM_DEVICE;
 -} else {
 -scmd_printk(KERN_ERR, SCpnt, Unknown command %x\n, 
 rq-cmd_flags);
 -return 0;
  }
 
 Why?

This seems to be dead code as rq_data_dir can only be either READ or WRITE.
The else case could be a BUG() but I don't see why it should be handled
gracefully.

 
 
  SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt,
 @@ -480,7 +477,8 @@ static int sd_init_command(struct scsi_cmnd * SCpnt)
  SCpnt-cmnd[4] = (unsigned char) this_count;
  SCpnt-cmnd[5] = 0;
  }
 -SCpnt-request_bufflen = this_count * sdp-sector_size;
 +BUG_ON(!SCpnt-sgtable);
 +SCpnt-sgtable-length = this_count * sdp-sector_size;
  
  /*
   * We shouldn't disconnect in the middle of a sector, so with a dumb
 @@ -892,7 +890,7 @@ static struct block_device_operations sd_fops = {
  static void sd_rw_intr(struct scsi_cmnd * SCpnt)
  {
  int result = SCpnt-result;
 -unsigned int xfer_size = SCpnt-request_bufflen;
 +unsigned int xfer_size = scsi_bufflen(SCpnt);
  unsigned int good_bytes = result ? 0 : xfer_size;
  u64 start_lba = SCpnt-request-sector;
  u64 bad_lba;
 diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
 index f9a52af..ed61ca9 100644
 --- a/drivers/scsi/sr.c
 +++ b/drivers/scsi/sr.c
 @@ -218,7 +218,7 @@ int sr_media_change(struct cdrom_device_info *cdi, int 
 slot)
  static void rw_intr(struct scsi_cmnd * SCpnt)
  {
  int result = SCpnt-result;
 -int this_count = SCpnt-request_bufflen;
 +int this_count = scsi_bufflen(SCpnt);
  int good_bytes = (result == 0 ? this_count : 0);
  int block_sectors = 0;
  long error_sector;
 @@ -345,23 +345,20 @@ static int sr_init_command(struct scsi_cmnd * SCpnt)
  } else if (rq_data_dir(SCpnt-request) == READ) {
  SCpnt-cmnd[0] = READ_10;
  SCpnt-sc_data_direction = DMA_FROM_DEVICE;
 -} else {
 -blk_dump_rq_flags(SCpnt-request, Unknown sr command);
 -return 0;
  }
 
 ditto.

ditto :)

 -
 To unsubscribe from this list: send the line unsubscribe linux-scsi in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining

2007-07-25 Thread Benny Halevy
James Bottomley wrote:
 On Tue, 2007-07-24 at 17:01 +0300, Benny Halevy wrote:
 FUJITA Tomonori wrote:
 I should have said that, was the approach to use separate buffer for
 sglists instead of putting the sglists and the parameters in one
 buffer completely rejected?
 I think that James should be asked this question.
 My understanding was that he preferred allocating the sgtable
 header along with the scatterlist array.
 
 All I really cared about was insulating the drivers from future changes
 in this area.  It strikes me that for chained sglist implementations,
 this can all become a block layer responsibility, since more than SCSI
 will want to make use of it.

agreed.

 
 Just remember, though that whatever is picked has to work in both memory
 constrained embedded systems as well as high end clusters.  It seems to
 me (but this isn't a mandate) that a single tunable sg element chunk
 size will accomplish this the best (as in get rid of the entire SCSI
 sglist sizing machinery) .

maybe :)
I'm not as familiar as you are with all the different uses of linux.
IMO, having a tunable is worse for the administrator and I'd prefer
an automatic mechanism that will dynamically adapt the allocation
size(s) to the actual use, very much like the one you have today.

 
 However, I'm perfectly happy to go with whatever the empirical evidence
 says is best .. and hopefully, now we don't have to pick this once and
 for all time ... we can alter it if whatever is chosen proves to be
 suboptimal.

I agree.  This isn't a catholic marriage :)
We'll run some performance experiments comparing the sgtable chaining
implementation vs. a scsi_data_buff implementation pointing
at a possibly chained sglist and let's see if we can measure
any difference.  We'll send results as soon as we have them.

 
 There are pro's and con's either way.  In my opinion separating
 the headers is better for mapping buffers that have a power of 2
 #pages (which seems to be the typical case) since when you're
 losing one entry in the sgtable for the header you'd waste a lot
 more when you just cross the bucket boundary. E.g. for 64 pages
 you need to allocate from the 64 to 127 bucket rather than the
 33 to 64 bucket).  Separated, one sgtable header structure
 can just be embedded in struct scsi_cmnd for uni-directional transfers
 (wasting some space when transferring no data, but saving space and
 cycles in the common case vs. allocating it from a separate memory pool)
 and the one for bidi read buffers can be allocated separately just for
 bidi commands.
 
 This is all opinion ... could someone actually run some performance
 tests?
 
 James
 
 

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Enable 16-bit CDBs for aic7xxx/aic79xxx

2007-07-24 Thread Benny Halevy
Just wondering, have you tried testing it with our patches to support
long cdb?  If not, it would be great if you could try doing that.
A snapshot of them is in 
http://www.bhalevy.com/open-osd/download/sgtable_bidi_varlen/

Benny

Hannes Reinecke wrote:
 Hannes Reinecke wrote:
 Hi James,

 this patch enables 16-bit CDBs for aic7xxx and aic79xx. aic7xxx actuallys
 supports up to 32-bit CDBs, so it might be that aic79xx does that, too.
 But this would include some more hacking, so this is way easier.

 Yeah, grand. That should read '16-byte CDBs', of course.
 
 But at least I've been consistent :-)
 New patch attached.
 
 Cheers,
 
 Hannes
 
 
 
 
 Enable 16-byte CDBs for aic7xxx/aix79xx
 
 The patch enables support for 16-byte CDBs in aic7xxx and aic79xx.
 aic7xxx can actually support up to 32-byte CDBs, should they ever see
 the light of day.
 
 Signed-off-by: Hannes Reinecke [EMAIL PROTECTED]
 
 diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c 
 b/drivers/scsi/aic7xxx/aic79xx_osm.c
 index 286ab83..8502085 100644
 --- a/drivers/scsi/aic7xxx/aic79xx_osm.c
 +++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
 @@ -1089,6 +1089,7 @@ ahd_linux_register_host(struct ahd_softc *ahd, struct 
 scsi_host_template *templa
   host-max_id = (ahd-features  AHD_WIDE) ? 16 : 8;
   host-max_lun = AHD_NUM_LUNS;
   host-max_channel = 0;
 + host-max_cmd_len = MAX_CDB_LEN;
   host-sg_tablesize = AHD_NSEG;
   ahd_lock(ahd, s);
   ahd_set_unit(ahd, ahd_linux_unit++);
 diff --git a/drivers/scsi/aic7xxx/aic7xxx_osm.c 
 b/drivers/scsi/aic7xxx/aic7xxx_osm.c
 index 1803ab6..a6b3071 100644
 --- a/drivers/scsi/aic7xxx/aic7xxx_osm.c
 +++ b/drivers/scsi/aic7xxx/aic7xxx_osm.c
 @@ -1047,6 +1047,7 @@ ahc_linux_register_host(struct ahc_softc *ahc, struct 
 scsi_host_template *templa
   host-max_id = (ahc-features  AHC_WIDE) ? 16 : 8;
   host-max_lun = AHC_NUM_LUNS;
   host-max_channel = (ahc-features  AHC_TWIN) ? 1 : 0;
 + host-max_cmd_len = 32;
   host-sg_tablesize = AHC_NSEG;
   ahc_lock(ahc, s);
   ahc_set_unit(ahc, ahc_linux_unit++);

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining

2007-07-24 Thread Benny Halevy
FUJITA Tomonori wrote:
 I should have said that, was the approach to use separate buffer for
 sglists instead of putting the sglists and the parameters in one
 buffer completely rejected?

I think that James should be asked this question.
My understanding was that he preferred allocating the sgtable
header along with the scatterlist array.

There are pro's and con's either way.  In my opinion separating
the headers is better for mapping buffers that have a power of 2
#pages (which seems to be the typical case) since when you're
losing one entry in the sgtable for the header you'd waste a lot
more when you just cross the bucket boundary. E.g. for 64 pages
you need to allocate from the 64 to 127 bucket rather than the
33 to 64 bucket).  Separated, one sgtable header structure
can just be embedded in struct scsi_cmnd for uni-directional transfers
(wasting some space when transferring no data, but saving space and
cycles in the common case vs. allocating it from a separate memory pool)
and the one for bidi read buffers can be allocated separately just for
bidi commands.

Benny
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 4/8] scsi-ml: scsi_sgtable implementation

2007-07-18 Thread Benny Halevy
Jens Axboe wrote:
 On Wed, Jul 18 2007, Boaz Harrosh wrote:
 Jens Axboe wrote:
 On Wed, Jul 18 2007, Boaz Harrosh wrote:
 FUJITA Tomonori wrote:
 From: Mike Christie [EMAIL PROTECTED]
 Subject: Re: [RFC 4/8] scsi-ml: scsi_sgtable implementation
 Date: Thu, 12 Jul 2007 14:09:44 -0500

 Boaz Harrosh wrote:
 +/*
 + * Should fit within a single page.
 + */
 +enum { SCSI_MAX_SG_SEGMENTS =
 +   ((PAGE_SIZE - sizeof(struct scsi_sgtable)) /
 +   sizeof(struct scatterlist)) };
 +
 +enum { SG_MEMPOOL_NR =
 +   (SCSI_MAX_SG_SEGMENTS = 7) +
 +   (SCSI_MAX_SG_SEGMENTS = 15) +
 +   (SCSI_MAX_SG_SEGMENTS = 31) +
 +   (SCSI_MAX_SG_SEGMENTS = 63) +
 +   (SCSI_MAX_SG_SEGMENTS = 127) +
 +   (SCSI_MAX_SG_SEGMENTS = 255) +
 +   (SCSI_MAX_SG_SEGMENTS = 511)
 +};
  
 What does SCSI_MAX_SG_SEGMENTS end up being on x86 now? On x86_64 or 
 some other arch, we were going over a page when doing 
 SCSI_MAX_PHYS_SEGMENTS of 256 right?
 Seems that 170 with x86 and 127 with x86_64.

 with scsi_sgtable we get one less than now

 Arch  | SCSI_MAX_SG_SEGMENTS =  | sizeof(struct 
 scatterlist)
 --|-|---
 x86_64| 127 |32
 i386 CONFIG_HIGHMEM64G=y  | 204 |20
 i386 other| 255 |16

 What's nice about this code is that now finally it is
 automatically calculated in compile time. Arch people
 don't have the headache did I break SCSI-ml?. 
 For example observe the current bug with i386 
 CONFIG_HIGHMEM64G=y.

 The same should be done with BIO's. Than ARCHs with big
 pages can gain even more.

 What happened to Jens's scatter list chaining and how does this relate 
 to it then?
 With Jens' sglist, we can set SCSI_MAX_SG_SEGMENTS to whatever we
 want. We can remove the above code.

 We need to push this and Jens' sglist together in one merge window, I
 think.
 No Tomo the above does not go away. What goes away is maybe:
 It does go away, since we can just set it to some safe value and use
 chaining to get us where we want.
 In my patches SCSI_MAX_PHYS_SEGMENTS has went away it does not exist
 anymore.
 
 Sure, I could just kill it as well. The point is that it's a parallel
 development, there's nothing in your patch that helps the sg chaining
 whatsoever. The only complex thing in the SCSI layer for sg chaining,
 is chaining when allocating and walking that chain on freeing. That's
 it!

It seems like having the pool index in the sgtable structure simplifies
the implementation a bit for allocation and freeing of linked sgtables.
Boaz will send an example tomorrow (hopefully) showing how the merged
code looks like.

 
 The new SCSI_MAX_SG_SEGMENTS (your name by the way) is right there
 and is calculated to maximize allocation in one page.
 
 Yes, it's still a good idea to make sure you get good packing in the
 page, it's of course cheaper to have less chaining. But it's not
 critical in the same way as before, when it could impact IO layout and
 performance.
 
 (I guess the right name is SCSI_MAX_PHYS_SEGMENTS_IN_A_PAGE)
 which will be needed in both your patches and mine.
 
 That would be better name.
 
blk_queue_max_hw_segments(q, shost-sg_tablesize);
 -  blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
blk_queue_max_sectors(q, shost-max_sectors);

 The reporting above is not needed and can be what ever block layer
 considers safe/optimized.
 
 You still need to set it, you can't just ignore it. Whether you redefine
 SCSI_MAX_SG_SEGMENTS or use (unsigned short) -1 doesn't really matter a
 whole lot.
 
 I'm working on a convergence patches that will do scsi_sg_pools cleanup
 which is common to both our patches, than scsi_sgtable, and than 
 sg-chaining on top of that. I hope it gets accepted. 
 The sg-chaining is much much simpler over scsi_sgtables.
 Sorry, I don't follow this paragraph at all. What is the scsi_sgtables
 change you are referring to? And how does it make sg chaining so much
 simpler?

 I guess my problem is that I don't know what problem this scsi_sgtables
 you refer to is fixing?

 scsi_sgtable is a solution proposed by James Bottomley where all I/O
 members of struct scsi_cmnd and the resid member, which need to be
 duplicated for bidirectional transfers, can be allocated together with
 the sg-list they are pointing to. This way when bidi comes, the all
 structure can be duplicated with minimal change to code, and with no
 extra baggage when bidi is not used.  This was the all motivation for
 the data accessors and cleanup, swiping through the entire scsi tree.
 So when implementation changes drivers do not change with them. Now
 meanwhile moving over drivers code we (well Tomo mostly) removed the
 old !use_sg code path, and also abstracted the 2 major hot spots of
 above usages with scsi_dma_{un,}map, and the scsi_for_each_sg.
 Actually that one was changed from the original definition to match
 

Re: [PATCH v2] add bidi support for block pc requests

2007-05-17 Thread Benny Halevy
FUJITA Tomonori wrote:
 From: Boaz Harrosh [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Thu, 17 May 2007 11:49:37 +0300
 
 FUJITA Tomonori wrote:
 From: Jens Axboe [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Thu, 17 May 2007 07:48:13 +0200

 On Thu, May 17 2007, FUJITA Tomonori wrote:
 From: Jens Axboe [EMAIL PROTECTED]
 Subject: Re: [PATCH v2] add bidi support for block pc requests
 Date: Wed, 16 May 2007 19:53:22 +0200

 On Wed, May 16 2007, Boaz Harrosh wrote:
 now there are 2 issues with this:
 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get
requests for use_sg=128 which will crash the kernel.
 That sounds like a serious issue, it should definitely not happen. Stuff
 like that would bite other drivers as well, are you absolutely sure that
 is happening? Power-of-2 bug in your code, or in the SCSI code?
 Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg?
 These are regular fs (ext3) requests during bootup. The machine will not
 boot. (Usually from the read ahead code)
 Don't believe me look at the second patch Over Tomo's cleanup.
 If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I
 did in code:
  blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);
 I suppose someone is looking at a different definition. Or there is
 another call I need to do for this to work.
 
 I modified your patch a bit (sgtable allocation) and set
 SCSI_MAX_SG_SEGMENTS to 127. My ppc64 box seems to work (with
 iscsi_tcp and ipr drivers).
 
 iscsi_tcp sets sg_tablesize to 255, but blk_queue_max_phys_segments
 seems to work.
 
 One thing that I found is:
 
 +#define scsi_resid(cmd) ((cmd)-sg_table-resid)
 
 
 This doesn't work for some drivers (at least ipr) since they set
 cmd-resid even with commands without data transfer.

Hmm, since we need a residual count also for the bidi_in transfer
this problem is another reason for having the scsi_cmd_buff in struct
scsi_cmnd, allocate another one from a pool for the bidi case,
and point to the sglist in both cases rather than having a sg_table
header allocated along with the sg list.
Alternatively, having a pool for the no-data case might be another
possible solution, though it seems a bit odd to me.

snip

 -void scsi_free_sgtable(struct scatterlist *sgl, int index)
 +static struct scsi_sg_table *scsi_alloc_sgtable(int nseg, gfp_t gfp_mask)
  {
   struct scsi_host_sg_pool *sgp;
 + struct scsi_sg_table *sgt;
 + unsigned int idx;
  
 - BUG_ON(index = SG_MEMPOOL_NR);
 + for (idx = 0; idx  SG_MEMPOOL_NR; idx++)
 + if (scsi_sg_pools[idx].size = nseg)
 + goto found;

Tomo, I prefer the loop to be based on calculation as follows rather
than scanning the scsi_sg_pools table in order to minimize memory access
(and thrashing of the cpu data cache - each scsi_host_sg_pool takes a cache
row on x86_64)

+   for (i = 0, size = 8; i  SG_MEMPOOL_NR-1; i++, size = 1)
+   if (size  nents)
+   return i;

Benny
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/19] add data buffer accessors

2007-05-14 Thread Benny Halevy
FUJITA Tomonori wrote:
 +#define scsi_for_each_sg(cmd, sg, nseg, __i) \
 + for (__i = 0, sg = scsi_sglist(cmd); __i  (nseg); __i++, (sg)++)
 +

This feels like a layering violation, why not use for_each_sg()?

+#define scsi_for_each_sg(cmd, sg, nseg, __i)   \
for_each_sg(scsi_sglist(cmd), (sg), (nseg), (__i))  \

That said, I'm not sure that scsi_for_each_sg() is worth abstracting
since the caller can just as well do for_each_sg() directly
as sketched above...

Benny
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/19] add data buffer accessors

2007-05-14 Thread Benny Halevy
FUJITA Tomonori wrote:
 From: Benny Halevy [EMAIL PROTECTED]
 Subject: Re: [PATCH 1/19] add data buffer accessors
 Date: Mon, 14 May 2007 10:57:08 +0300
 
 FUJITA Tomonori wrote:
 +#define scsi_for_each_sg(cmd, sg, nseg, __i)   \
 +   for (__i = 0, sg = scsi_sglist(cmd); __i  (nseg); __i++, (sg)++)
 +
 This feels like a layering violation, why not use for_each_sg()?

 +#define scsi_for_each_sg(cmd, sg, nseg, __i)\
  for_each_sg(scsi_sglist(cmd), (sg), (nseg), (__i))  \
 
 As I said before, when for_each_sg is ready, we'll convert
 scsi_for_each_sg to use for_each_sg.

thanks. works for me.

 
 
 That said, I'm not sure that scsi_for_each_sg() is worth abstracting
 since the caller can just as well do for_each_sg() directly
 as sketched above...
 
 I'm not sure why you think it's a layering violation.

I'd like to think of struct scatterlist as an abstract data type
with its own traversal method that hides its internals.
Not a layer per-se but more of an abstraction...

 
 With scsi_for_each_sg(), many drivers don't need scsi_sglist().

Sure, just my two cents...

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] bidi support

2007-05-07 Thread Benny Halevy
Tomo,

Thanks for quickly crafting this patchset.
Please see my comments in-line below.

Putting aside the controversial design issues,
we need to carefully compare our patches against yours
to make sure no functional issues have been overlooked.

Benny

FUJITA Tomonori wrote:
 From: FUJITA Tomonori [EMAIL PROTECTED]
 Subject: [PATCH 0/2] bidi support
 Date: Sat, 05 May 2007 18:02:29 +0900
 
 This patchset add bidi support for block pc requests.
 
 Oh, this patchset is against Linus' tree.
 
 The first patch (the block layer changes) can be applied against Jens'
 tree.
 
 The second patch (the scsi mid-layer changes) has one minor reject
 against the scsi-misc tree. The following patch is against the
 scsi-misc.
 
 ---
 From 6a8c5375f1f6dbd574107920dd0a613527bd556b Mon Sep 17 00:00:00 2001
 From: FUJITA Tomonori [EMAIL PROTECTED]
 Date: Sat, 5 May 2007 18:11:42 +0900
 Subject: [PATCH] add bidi support for block pc requests
 
 This adds bidi support for block pc requests.
 
 A bidi request uses req-next_rq pointer for an in request.
 
 This patch introduces a new structure, scsi_data_buffer to hold the
 data buffer information. To avoid make scsi_cmnd structure fatter, the
 scsi mid-layer uses cmnd-request-next_rq-special pointer for
 a scsi_data_buffer structure. LLDs don't touch the second request
 (req-next_rq) so it's safe to use req-special.
 
 scsi_blk_pc_done() always completes the whole command so
 scsi_end_request() simply completes the bidi chunk too.
 
 A helper function, scsi_bidi_data_buffer() is for LLDs to access to
 the scsi_data_buffer structure easily.
 
 Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED]
 ---
  drivers/scsi/scsi_lib.c  |  120 +++--
  include/scsi/scsi_cmnd.h |   14 +
  2 files changed, 118 insertions(+), 16 deletions(-)
 
 diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
 index be8e655..8f7873a 100644
 --- a/drivers/scsi/scsi_lib.c
 +++ b/drivers/scsi/scsi_lib.c
 @@ -66,6 +66,12 @@ #undef SP
  
  static void scsi_run_queue(struct request_queue *q);
  
 +struct scsi_data_buffer *scsi_bidi_data_buffer(struct scsi_cmnd *cmd)
 +{
 + return blk_bidi_rq(cmd-request) ? cmd-request-next_rq-special : 
 NULL;
 +}
 +EXPORT_SYMBOL(scsi_bidi_data_buffer);
 +
  /*
   * Function: scsi_unprep_request()
   *
 @@ -85,6 +91,7 @@ static void scsi_unprep_request(struct r
   req-cmd_flags = ~REQ_DONTPREP;
   req-special = NULL;
  
 + kfree(scsi_bidi_data_buffer(cmd));
   scsi_put_command(cmd);
  }
  
 @@ -657,6 +664,7 @@ static struct scsi_cmnd *scsi_end_reques
   request_queue_t *q = cmd-device-request_queue;
   struct request *req = cmd-request;
   unsigned long flags;
 + struct scsi_data_buffer *sdb = scsi_bidi_data_buffer(cmd);
  
   /*
* If there are blocks left over at the end, set up the command
 @@ -685,6 +693,14 @@ static struct scsi_cmnd *scsi_end_reques
   }
   }
  
 + /*
 +  * a REQ_BLOCK_PC command is always completed fully so just do
 +  * end the bidi chunk.
 +  */
 + if (sdb)
 + end_that_request_chunk(req-next_rq, uptodate,
 +sdb-request_bufflen);

I think I agree you shouldn't call end_that_request_last(req-next_rq)
so sdb and req-next_rq should be freed here, no?

 +
   add_disk_randomness(req-rq_disk);
  
   spin_lock_irqsave(q-queue_lock, flags);
 @@ -701,34 +717,35 @@ static struct scsi_cmnd *scsi_end_reques
   return NULL;
  }
  
 -struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
 +static struct scatterlist *do_scsi_alloc_sgtable(unsigned short use_sg,
 +  unsigned short *sglist_len,
 +  gfp_t gfp_mask)
  {
   struct scsi_host_sg_pool *sgp;
 - struct scatterlist *sgl;
  
 - BUG_ON(!cmd-use_sg);
 + BUG_ON(!use_sg);
  
 - switch (cmd-use_sg) {
 + switch (use_sg) {
   case 1 ... 8:
 - cmd-sglist_len = 0;
 + *sglist_len = 0;
   break;
   case 9 ... 16:
 - cmd-sglist_len = 1;
 + *sglist_len = 1;
   break;
   case 17 ... 32:
 - cmd-sglist_len = 2;
 + *sglist_len = 2;
   break;
  #if (SCSI_MAX_PHYS_SEGMENTS  32)
   case 33 ... 64:
 - cmd-sglist_len = 3;
 + *sglist_len = 3;
   break;
  #if (SCSI_MAX_PHYS_SEGMENTS  64)
   case 65 ... 128:
 - cmd-sglist_len = 4;
 + *sglist_len = 4;
   break;
  #if (SCSI_MAX_PHYS_SEGMENTS   128)
   case 129 ... 256:
 - cmd-sglist_len = 5;
 + *sglist_len = 5;
   break;
  #endif
  #endif
 @@ -737,11 +754,14 @@ #endif
   return NULL;
   }
  
 - sgp = scsi_sg_pools + cmd-sglist_len;
 - sgl = mempool_alloc(sgp-pool, gfp_mask);
 - return sgl;
 + sgp = 

Re: [PATCH 4/4] bidi support: bidirectional request

2007-04-30 Thread Benny Halevy
Jens Axboe wrote:
 On Sun, Apr 29 2007, James Bottomley wrote:
 On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote:
 FUJITA Tomonori wrote:
 From: Boaz Harrosh [EMAIL PROTECTED]
 Subject: [PATCH 4/4] bidi support: bidirectional request
 Date: Sun, 15 Apr 2007 20:33:28 +0300

 diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
 index 645d24b..16a02ee 100644
 --- a/include/linux/blkdev.h
 +++ b/include/linux/blkdev.h
 @@ -322,6 +322,7 @@ struct request {
  void *end_io_data;

  struct request_io_part uni;
 +struct request_io_part bidi_read;
  };
 Would be more straightforward to have:

 struct request_io_part in;
 struct request_io_part out;

 Yes I wish I could do that. For bidi supporting drivers this is the most 
 logical.
 But for the 99.9% of uni-directional drivers, calling rq_uni(), and being 
 some what on
 the hotish paths, this means we will need a pointer to a uni 
 request_io_part.
 This is bad because:
 1st- There is no defined stage in a request life where to definitely set 
 that pointer,
  specially in the preparation stages.
 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now 
 this is a
  very bad spot already, and I have a short term fix for it in the 
 SCSI-bidi patches
  (not sent yet) but a more long term solution is needed. Once such 
 hacks are
  cleaned up we can do what you say. This is exactly why I use the 
 access functions
  rq_uni/rq_io/rq_in/rq_out and not open code access.
 I'm still not really convinced about this approach.  The primary job of
 the block layer is to manage and merge READ and WRITE requests.  It
 serves a beautiful secondary function of queueing for arbitrary requests
 it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or
 indeed any non REQ_TYPE_FS).

 bidirectional requests fall into the latter category (there's nothing
 really we can do to merge them ... they're just transported by the block
 layer).  The only unusual feature is that they carry two bios.  I think
 the drivers that actually support bidirectional will be a rarity, so it
 might even be advisable to add it to the queue capability (refuse
 bidirectional requests at the top rather than perturbing all the drivers
 to process them).

 So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI?  That will
 remove it from the standard path and put it on the special command type
 path where we can process it specially.  Additionally, if you take this
 approach, you can probably simply chain the second bio through
 req-special as an additional request in the stream.  The only thing
 that would then need modification would be the dequeue of the block
 driver (it would have to dequeue both requests and prepare them) and
 that needs to be done only for drivers handling bidirectional requests.
 
 I agree, I'm really not crazy about shuffling the entire request setup
 around just for something as exotic as bidirection commands. How about
 just keeping it simple - have a second request linked off the first one
 for the second data phase? So keep it completely seperate, not just
 overload -special for 2nd bio list.
 
 So basically just add a struct request pointer, so you can do rq =
 rq-next_rq or something for the next data phase. I bet this would be a
 LOT less invasive as well, and we can get by with a few helpers to
 support it.
 
 And it should definitely be a request type.
 

I'm a bit confused since what you both suggest is very similar to what we've
proposed back in October 2006 and the impression we got was that it will be
better to support bidirectional block requests natively (yet to be honest,
James, you wanted a linked request all along).

Before we go on that route again, how do you see the support for bidi
at the scsi mid-layer done?  Again, we prefer to support that officially
using two struct scsi_cmnd_buff instances in struct scsi_cmnd and not as
a one-off feature, using special-purpose state and logic (e.g. a linked
struct scsi_cmd for the bidi_read sg list).

I'm attaching the patch we sent back then for your reference.
(for some reason I couldn't find the original post in the
any linux-scsi archives)

Regards,

Benny
Support for scsi variable length CDBs and bidirectional commands
  
Signed-off-by: Boaz Harrosh bharrosh@panasas.com
Signed-off-by: Benny Halevy bhalevy@panasas.com

 //depot/pub/linux/block/scsi_ioctl.c#1 - /home/bharrosh/p4.local/pub/linux/block/scsi_ioctl.c 
diff -Nup /tmp/tmp.12171.0 /home/bharrosh/p4.local/pub/linux/block/scsi_ioctl.c -L a/block/scsi_ioctl.c -L b/block/scsi_ioctl.c
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -32,10 +32,13 @@
 #include scsi/scsi_ioctl.h
 #include scsi/scsi_cmnd.h
 
-/* Command group 3 is reserved and should never be used.  */
+/*
+ * Command group 3 is used by variable length CDBs with
+ * opcode VARLEN_CDB.
+ */
 const unsigned char scsi_command_size[8] =
 {
-	6, 10, 10, 12,
+	6, 10, 10, 16,
 	16, 12, 10, 10
 };
 
 //depot/pub/linux

Re: [PATCH 4/4] bidi support: bidirectional request

2007-04-30 Thread Benny Halevy
Jens Axboe wrote:
 On Mon, Apr 30 2007, Douglas Gilbert wrote:
 Jens Axboe wrote:
 On Mon, Apr 30 2007, Benny Halevy wrote:
 Jens Axboe wrote:
 On Sun, Apr 29 2007, James Bottomley wrote:
 On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote:
 FUJITA Tomonori wrote:
 From: Boaz Harrosh [EMAIL PROTECTED]
 Subject: [PATCH 4/4] bidi support: bidirectional request
 Date: Sun, 15 Apr 2007 20:33:28 +0300

 diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
 index 645d24b..16a02ee 100644
 --- a/include/linux/blkdev.h
 +++ b/include/linux/blkdev.h
 @@ -322,6 +322,7 @@ struct request {
  void *end_io_data;

  struct request_io_part uni;
 +struct request_io_part bidi_read;
  };
 Would be more straightforward to have:

 struct request_io_part in;
 struct request_io_part out;

 Yes I wish I could do that. For bidi supporting drivers this is the 
 most logical.
 But for the 99.9% of uni-directional drivers, calling rq_uni(), and 
 being some what on
 the hotish paths, this means we will need a pointer to a uni 
 request_io_part.
 This is bad because:
 1st- There is no defined stage in a request life where to definitely 
 set that pointer,
  specially in the preparation stages.
 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. 
 Now this is a
  very bad spot already, and I have a short term fix for it in the 
 SCSI-bidi patches
  (not sent yet) but a more long term solution is needed. Once such 
 hacks are
  cleaned up we can do what you say. This is exactly why I use the 
 access functions
  rq_uni/rq_io/rq_in/rq_out and not open code access.
 I'm still not really convinced about this approach.  The primary job of
 the block layer is to manage and merge READ and WRITE requests.  It
 serves a beautiful secondary function of queueing for arbitrary requests
 it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or
 indeed any non REQ_TYPE_FS).

 bidirectional requests fall into the latter category (there's nothing
 really we can do to merge them ... they're just transported by the block
 layer).  The only unusual feature is that they carry two bios.  I think
 the drivers that actually support bidirectional will be a rarity, so it
 might even be advisable to add it to the queue capability (refuse
 bidirectional requests at the top rather than perturbing all the drivers
 to process them).

 So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI?  That will
 remove it from the standard path and put it on the special command type
 path where we can process it specially.  Additionally, if you take this
 approach, you can probably simply chain the second bio through
 req-special as an additional request in the stream.  The only thing
 that would then need modification would be the dequeue of the block
 driver (it would have to dequeue both requests and prepare them) and
 that needs to be done only for drivers handling bidirectional requests.
 I agree, I'm really not crazy about shuffling the entire request setup
 around just for something as exotic as bidirection commands. How about
 just keeping it simple - have a second request linked off the first one
 for the second data phase? So keep it completely seperate, not just
 overload -special for 2nd bio list.

 So basically just add a struct request pointer, so you can do rq =
 rq-next_rq or something for the next data phase. I bet this would be a
 LOT less invasive as well, and we can get by with a few helpers to
 support it.

 And it should definitely be a request type.

 I'm a bit confused since what you both suggest is very similar to what 
 we've
 proposed back in October 2006 and the impression we got was that it will be
 better to support bidirectional block requests natively (yet to be honest,
 James, you wanted a linked request all along).
 It still has to be implemented natively at the block layer, just
 differently like described above. So instead of messing all over the
 block layer adding rq_uni() stuff, just add that struct request pointer
 to the request structure for the 2nd data phase. You can relatively easy
 then modify the block layer helpers to support mapping and setup of such
 requests.

 Before we go on that route again, how do you see the support for bidi
 at the scsi mid-layer done?  Again, we prefer to support that officially
 using two struct scsi_cmnd_buff instances in struct scsi_cmnd and not as
 a one-off feature, using special-purpose state and logic (e.g. a linked
 struct scsi_cmd for the bidi_read sg list).
 The SCSI part is up to James, that can be done as either inside a single
 scsi command, or as linked scsi commands as well. I don't care too much
 about that bit, just the block layer parts :-). And the proposed block
 layer design can be used both ways by the scsi layer.
 Linked SCSI commands have been obsolete since SPC-4 rev 6
 (18 July 2006) after proposal 06-259r1 was accepted. That
 proposal starts: The reasons for linked commands have been
 overtaken

Re: [RFC 3/6] bidi support: bidirectional request

2007-01-23 Thread Benny Halevy
James Bottomley wrote:
 On Mon, 2007-01-22 at 01:25 +0200, Boaz Harrosh wrote:
 - Instantiate another request_io_part in request for bidi_read.
 - Define  Implement new API for accessing bidi parts.
 - API to Build bidi requests and map to sglists.
 - Define new end_that_request_block() function to end a complete request.
 
 Actually, this approach looks to be a bit too narrow.  You seem to be
 predicating on the idea that the bidirectional will transfer in and out
 of the same area.  For some of the frame in/frame out stuff, we probably
 need the read and write areas for the bidirectional request to be
 separated.

Hmm, our proposal introduces two separate and independent i/o areas.
One used for uni-directional transfers and for bidi data output, the
other is used for bidi data input so we're in agreement.

Benny

 

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/6] bidi support: request dma_data_direction

2007-01-23 Thread Benny Halevy
Muli Ben-Yehuda wrote:
 On Mon, Jan 22, 2007 at 01:21:28AM +0200, Boaz Harrosh wrote:
 
 - Introduce a new enum dma_data_direction data_dir member in struct request.
   and remove the RW bit from request-cmd_flag
 
 Some architecture use 'enum dma_data_direction' and some 'int
 dma_data_direction'. The consensus was to move to int over
 time. Please use 'int dma_data_direction'.

That should be fine (although I'm not sure what made you go this way :)
Please see my reply to Douglas, proposing an enum req_io_direction
at the block layer and up which will provide a better enumeration
for our use.


 +static inline int dma_write_dir(enum dma_data_direction dir)
 +{
 +return (dir == DMA_TO_DEVICE) || (dir == DMA_BIDIRECTIONAL);
 +}
 
 write can mean write to device or write to memory, depending on
 context. Not exactly something which should be a generic
 helper. Rename to 'dma_to_device(int dir)'?

much better. thanks!

 
 +static inline int dma_uni_dir(enum dma_data_direction dir)
 +{
 +return (dir == DMA_TO_DEVICE) || (dir == DMA_FROM_DEVICE) ||
 +   (dir == DMA_NONE);
 +}
 
 While this doesn't look very useful. Why is DMA_NONE a uni-dir? I
 suggest replacing this with an open coded (dir != DMA_BIDIRECTIONAL).

The idea was to be resilient to invalid values. (dir != DMA_BIDIRECTIONAL)
is fine of course, but I'd add a BUG_ON such as (dir  0 || dir  
DMA_BIDIRECTIONAL)

Benny
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/6] bidi support: request dma_data_direction

2007-01-23 Thread Benny Halevy
Douglas Gilbert wrote:
 Benny Halevy wrote:
 Douglas Gilbert wrote:
 
 Perhaps the right use of DMA_BIRECTIONAL needs to be
 defined.
 
 Could it be used with a XDWRITE(10) SCSI command
 defined in sbc3r07.pdf at http://www.t10.org ? I suspect
 using two scatter gather lists would be a better approach.

Exactly. This is a classic example of a bidirectional command
and indeed two scatter-gather lists (that are mapped into two
bio lists) are used.

 
 - Introduce new blk_rq_init_unqueued_req() and use it in places ad-hoc
   requests were used and bzero'ed.
 With a bi-directional transfer is it always unambiguous
 which transfer occurs first (or could they occur at
 the same time)?
 The bidi transfers can occur in any order and in parallel.
 
 Then it is not sufficient for modern SCSI transports in which
 certain bidirectional commands (probably most) have a well
 defined order.
 
 So DMA_BIDIRECTIONAL looks PCI specific and it may have
 been a mistake to replace other subsystem's direction flags
 with it. RDMA might be an interesting case.
 

I would say that it might make sense to define an equivalent
for dma_data_direction at the block layer, for example:

enum req_io_direction {
REQ_IO_NONE = 0,
REQ_IN_FROM_DEVICE = 1,
REQ_OUT_TO_DEVICE = 2,
REQ_BIDIRECTIONAL = 3,
};

can be used in struct request and upper layers.

Besides the fact that having separate I/O buffers for bidirectional
transfers makes block I/O different from pci bidi I/O,
this enum makes more sense arithmetically and has
a much better meaning for the zero value.
Today DMA_BIDIRECTIONAL is used in several places as the default and invalid
value since no-one ever used it before. I'd rather have the value 0 mean
REQ_IO_NONE (or REQ_IO_INVALID if we want such thing).

Benny
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/6] bidi support: request dma_data_direction

2007-01-23 Thread Benny Halevy
Muli Ben-Yehuda wrote:
 On Tue, Jan 23, 2007 at 03:45:00PM +0200, Benny Halevy wrote:
 
 +static inline int dma_uni_dir(enum dma_data_direction dir)
 +{
 +  return (dir == DMA_TO_DEVICE) || (dir == DMA_FROM_DEVICE) ||
 + (dir == DMA_NONE);
 +}
 While this doesn't look very useful. Why is DMA_NONE a uni-dir? I
 suggest replacing this with an open coded (dir != DMA_BIDIRECTIONAL).
 The idea was to be resilient to invalid values. (dir != DMA_BIDIRECTIONAL)
 is fine of course, but I'd add a BUG_ON such as (dir  0 || dir 
 DMA_BIDIRECTIONAL)
 
 If DMA_NONE isn't actually allowed here, you can use valid_dma_direction().

DMA_NONE should be allowed as it is used by commands that do no I/O and these
are handled on uni-directional path.

BTW, the BUG_ON I suggested has a bug of course since (countering my intuition)
DMA_BIDIRECTIONAL==0, so it should be BUG_ON(dir  0 || dir  DMA_NONE)
instead.
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/6] bidi support: request dma_data_direction

2007-01-21 Thread Benny Halevy
Douglas Gilbert wrote:
 Boaz Harrosh wrote:
 - Introduce a new enum dma_data_direction data_dir member in struct request.
   and remove the RW bit from request-cmd_flag
 - Add new API to query request direction.
 - Adjust existing API and implementation.
 - Cleanup wrong use of DMA_BIDIRECTIONAL
 - Introduce new blk_rq_init_unqueued_req() and use it in places ad-hoc
   requests were used and bzero'ed.
 
 With a bi-directional transfer is it always unambiguous
 which transfer occurs first (or could they occur at
 the same time)?

The bidi transfers can occur in any order and in parallel.

 
 Doug Gilbert
 -
 To unsubscribe from this list: send the line unsubscribe linux-scsi in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html