Re: [PATCH, RFC] MAINTAINERS: update OSD entries
On Wed, May 3, 2017 at 1:08 PM, Boaz Harrosh <o...@electrozaur.com> wrote: > > On 05/02/2017 02:33 PM, Jeff Layton wrote: > > On Tue, 2017-05-02 at 09:57 +0200, Christoph Hellwig wrote: > >> The open-osd domain doesn't exist anymore, and mails to the list lead > >> to really annoying bounced that repeat every day. > >> > >> Also the primarydata address for Benny bounces, and while I have a new > >> one for him he doesn't seem to be maintaining the OSD code any more. > >> > >> Which beggs the question: should we really leave the Supported status > >> in MAINTAINERS given that the code is barely maintained? > >> > >> Signed-off-by: Christoph Hellwig <h...@lst.de> > >> --- > >> MAINTAINERS | 4 > >> 1 file changed, 4 deletions(-) > >> > >> diff --git a/MAINTAINERS b/MAINTAINERS > >> index 1bb06c5f7716..28dd83a1d9e2 100644 > >> --- a/MAINTAINERS > >> +++ b/MAINTAINERS > >> @@ -9418,10 +9418,6 @@ F:drivers/net/wireless/intersil/orinoco/ > >> > >> OSD LIBRARY and FILESYSTEM > >> M: Boaz Harrosh <o...@electrozaur.com> > >> -M: Benny Halevy <bhal...@primarydata.com> > >> -L: osd-...@open-osd.org > >> -W: http://open-osd.org > >> -T: git git://git.open-osd.org/open-osd.git > >> S: Maintained > >> F: drivers/scsi/osd/ > >> F: include/scsi/osd_* > > > > Hah, you beat me to it! I was going to spin up a patch for this today. > > > > Acked-by: Jeff Layton <jlay...@redhat.com> > > Acked-by: Boaz Harrosh <o...@electrozaur.com> Acked-by: Benny Halevy <bhal...@gmail.com> > > > >
Re: [osd-dev] [patch] [SCSI] libosd: remover duplicate __bitwise annotation
On 2013-04-17 10:19, Dan Carpenter wrote: __be32 is already a __bitwise type so we don't need the second __bitwise here. It causes a Sparse error: include/scsi/osd_protocol.h:110:26: error: invalid modifier Signed-off-by: Dan Carpenter dan.carpen...@oracle.com ACK diff --git a/include/scsi/osd_protocol.h b/include/scsi/osd_protocol.h index a6026da..25ac628 100644 --- a/include/scsi/osd_protocol.h +++ b/include/scsi/osd_protocol.h @@ -107,7 +107,7 @@ enum osd_attributes_mode { * int exponent: 04; * } */ -typedef __be32 __bitwise osd_cdb_offset; +typedef __be32 osd_cdb_offset; enum { OSD_OFFSET_UNUSED = 0x, ___ osd-dev mailing list osd-...@open-osd.org http://mailman.open-osd.org/mailman/listinfo/osd-dev -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] [SCSI] libosd: check for kzalloc() failure
On Wed, Jan 30, 2013 at 10:57 AM, walter harms wha...@bfs.de wrote: Am 30.01.2013 09:27, schrieb Dan Carpenter: On Wed, Jan 30, 2013 at 09:15:43AM +0100, walter harms wrote: Am 30.01.2013 08:06, schrieb Dan Carpenter: There wasn't any error handling for this kzalloc(). Signed-off-by: Dan Carpenter dan.carpen...@oracle.com diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c index c06b8e5..d8293f2 100644 --- a/drivers/scsi/osd/osd_initiator.c +++ b/drivers/scsi/osd/osd_initiator.c @@ -144,6 +144,10 @@ static int _osd_get_print_system_info(struct osd_dev *od, odi-osdname_len = get_attrs[a].len; /* Avoid NULL for memcmp optimization 0-length is good enough */ odi-osdname = kzalloc(odi-osdname_len + 1, GFP_KERNEL); + if (!odi-osdname) { + ret = -ENOMEM; + goto out; + } if (odi-osdname_len) memcpy(odi-osdname, get_attrs[a].val_ptr, odi-osdname_len); OSD_INFO(OSD_NAME [%s]\n, odi-osdname); -- this looks like strdup() ? Maybe? It's a funny thing going on with the NUL terminator and I don't understand what the comment is about. It appears that normally get_attrs[a].val_ptr is a NUL terminated string but get_attrs[a].len does not count the terminator. Odd. i have no clue what the programmer was thinking. if i read this correct osdname is u8 *osdname; so a simple strdup() or strndup() would be ok the comment seems to indicate that get_attrs[a].val_ptr could be NULL but where is the check ? Perhaps they are not using ascii here ? then a memdup(get_attrs[a].len) would be better. There are subtle differences between kstrdup or kmemdup and this implementation. kmemdup is problematic as get_attrs[a].val_ptr does not contain the NUL terminator In the following case: if get_attrs[a].len == 0 then get_attrs[a].val_ptr == NULL The end result should be odi-osdname_len = 0 odi-osdname = kzalloc(1, GFP_KERNEL); while with kstrdup, odi-osdname will end up being NULL as it's input arg s is NULL. Benny re, wh -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] [SCSI] libosd: check for kzalloc() failure
On Wed, Jan 30, 2013 at 9:06 AM, Dan Carpenter dan.carpen...@oracle.com wrote: There wasn't any error handling for this kzalloc(). Signed-off-by: Dan Carpenter dan.carpen...@oracle.com diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c index c06b8e5..d8293f2 100644 --- a/drivers/scsi/osd/osd_initiator.c +++ b/drivers/scsi/osd/osd_initiator.c @@ -144,6 +144,10 @@ static int _osd_get_print_system_info(struct osd_dev *od, odi-osdname_len = get_attrs[a].len; /* Avoid NULL for memcmp optimization 0-length is good enough */ odi-osdname = kzalloc(odi-osdname_len + 1, GFP_KERNEL); + if (!odi-osdname) { + ret = -ENOMEM; + goto out; + } good catch! Benny if (odi-osdname_len) memcpy(odi-osdname, get_attrs[a].val_ptr, odi-osdname_len); OSD_INFO(OSD_NAME [%s]\n, odi-osdname); -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] [SCSI] libosd: check for kzalloc() failure
On Wed, Jan 30, 2013 at 3:00 PM, walter harms wha...@bfs.de wrote: Am 30.01.2013 10:51, schrieb Benny Halevy: On Wed, Jan 30, 2013 at 10:57 AM, walter harms wha...@bfs.de wrote: Am 30.01.2013 09:27, schrieb Dan Carpenter: On Wed, Jan 30, 2013 at 09:15:43AM +0100, walter harms wrote: Am 30.01.2013 08:06, schrieb Dan Carpenter: There wasn't any error handling for this kzalloc(). Signed-off-by: Dan Carpenter dan.carpen...@oracle.com diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c index c06b8e5..d8293f2 100644 --- a/drivers/scsi/osd/osd_initiator.c +++ b/drivers/scsi/osd/osd_initiator.c @@ -144,6 +144,10 @@ static int _osd_get_print_system_info(struct osd_dev *od, odi-osdname_len = get_attrs[a].len; /* Avoid NULL for memcmp optimization 0-length is good enough */ odi-osdname = kzalloc(odi-osdname_len + 1, GFP_KERNEL); + if (!odi-osdname) { + ret = -ENOMEM; + goto out; + } if (odi-osdname_len) memcpy(odi-osdname, get_attrs[a].val_ptr, odi-osdname_len); OSD_INFO(OSD_NAME [%s]\n, odi-osdname); -- this looks like strdup() ? Maybe? It's a funny thing going on with the NUL terminator and I don't understand what the comment is about. It appears that normally get_attrs[a].val_ptr is a NUL terminated string but get_attrs[a].len does not count the terminator. Odd. i have no clue what the programmer was thinking. if i read this correct osdname is u8 *osdname; so a simple strdup() or strndup() would be ok the comment seems to indicate that get_attrs[a].val_ptr could be NULL but where is the check ? Perhaps they are not using ascii here ? then a memdup(get_attrs[a].len) would be better. There are subtle differences between kstrdup or kmemdup and this implementation. kmemdup is problematic as get_attrs[a].val_ptr does not contain the NUL terminator ok, i understand - but can we assume that we are talking ascii here ? No, it can be anything. UTF-8 is more likely but not guaranteed either. In the following case: if get_attrs[a].len == 0 then get_attrs[a].val_ptr == NULL The end result should be odi-osdname_len = 0 odi-osdname = kzalloc(1, GFP_KERNEL); while with kstrdup, odi-osdname will end up being NULL as it's input arg s is NULL. and you want the argument to be ? May i ask why ? kfree() can handle NULL and kprintf() (-family) also. It was Boaz' decision at the time to simplify internal tests like _the_same_or_null in drivers/scsi/osd/osd_uld.c that doesn't check for NULL Nothing spectacular :) Benny re, wh Benny re, wh -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Revert IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs
Pete, the subject says PATCH 1/2 but I didn't see any follow-up message for PATCH 2/2. Just wondering :) Benny On Feb. 26, 2008, 10:27 -0800, Pete Wyckoff [EMAIL PROTECTED] wrote: This reverts commit a3cd7d9070be417a21905c997ee32d756d999b38. The original commit breaks iSER reliably, making it complain: iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11 The FMR cleanup thread runs ib_fmr_batch_release() as dirty entries build up. This commit causes clean but used FMR entries also to be purged. During that process, another thread can see that there are no free FMRs and fail, even though there should always have been enough available. Signed-off-by: Pete Wyckoff [EMAIL PROTECTED] --- drivers/infiniband/core/fmr_pool.c | 21 ++--- 1 files changed, 6 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/core/fmr_pool.c b/drivers/infiniband/core/fmr_pool.c index 7f00347..4044fdf 100644 --- a/drivers/infiniband/core/fmr_pool.c +++ b/drivers/infiniband/core/fmr_pool.c @@ -139,7 +139,7 @@ static inline struct ib_pool_fmr *ib_fmr_cache_lookup(struct ib_fmr_pool *pool, static void ib_fmr_batch_release(struct ib_fmr_pool *pool) { int ret; - struct ib_pool_fmr *fmr, *next; + struct ib_pool_fmr *fmr; LIST_HEAD(unmap_list); LIST_HEAD(fmr_list); @@ -158,20 +158,6 @@ static void ib_fmr_batch_release(struct ib_fmr_pool *pool) #endif } - /* - * The free_list may hold FMRs that have been put there - * because they haven't reached the max_remap count. - * Invalidate their mapping as well. - */ - list_for_each_entry_safe(fmr, next, pool-free_list, list) { - if (fmr-remap_count == 0) - continue; - hlist_del_init(fmr-cache_node); - fmr-remap_count = 0; - list_add_tail(fmr-fmr-list, fmr_list); - list_move(fmr-list, unmap_list); - } - list_splice(pool-dirty_list, unmap_list); INIT_LIST_HEAD(pool-dirty_list); pool-dirty_len = 0; @@ -384,6 +370,11 @@ void ib_destroy_fmr_pool(struct ib_fmr_pool *pool) i = 0; list_for_each_entry_safe(fmr, tmp, pool-free_list, list) { + if (fmr-remap_count) { + INIT_LIST_HEAD(fmr_list); + list_add_tail(fmr-fmr-list, fmr_list); + ib_unmap_fmr(fmr_list); + } ib_dealloc_fmr(fmr-fmr); list_del(fmr-list); kfree(fmr); - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Revert IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs
Diabolical ;-) Thanks for the pointer! Benny On Feb. 26, 2008, 11:39 -0800, Matthew Wilcox [EMAIL PROTECTED] wrote: On Tue, Feb 26, 2008 at 11:23:01AM -0800, Benny Halevy wrote: Pete, the subject says PATCH 1/2 but I didn't see any follow-up message for PATCH 2/2. Just wondering :) I think the problem's on your end ... I got it and so did marc: http://marc.info/?l=linux-scsim=120405067313933w=2 - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_error: Fix language abuse.
Seriously, can't you just add a disclaimer to the README file? In http://lkml.org/lkml/2008/2/9/29, Luben Tuikov made an interesting point that in many cases illegal refers to a valid value that violates the specification, so the term invalid may be technically incorrect. Benny On Feb. 11, 2008, 18:07 +0200, linux-os (Dick Johnson) [EMAIL PROTECTED] wrote: On Fri, 8 Feb 2008, Mark Hounschell wrote: linux-os (Dick Johnson) wrote: The correct word should be invalid, in spite of the fact that the SCSI committee used invalid syntax. Alan is right. There is nothing illegal in the kernel and if there is, it must be removed as soon as it is discovered! il·le·gal (-lgl) adj. 1. Prohibited by law. *2. Prohibited by official rules: an illegal pass in football. *3. Unacceptable to or not performable by a computer: an illegal operation. Mark Many early computer programmers including myself, while writing error messages in early software, did not understand that computer programmers do not make law so we called bad operations illegal. Once you are called to testify in a court of law, about some message your code wrote to the screen just before a factory burned down, you start to be much more careful about the syntax. I advise that, regardless of the rewrite of dictionaries and, in fact, the rewrite of history, whenever possible we use the correct message syntax. Dictionaries pick up common usage in spite of the fact that it is wrong. See irregardless and other abortions which now exist in the dictionary. Cheers, Dick Johnson Penguin : Linux version 2.6.22.1 on an i686 machine (5588.28 BogoMips). My book : http://www.AbominableFirebug.com/ _ The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [EMAIL PROTECTED] - and destroy all copies of this information, including any attachments, without reading or disclosing them. Thank you. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Integration of SCST in the mainstream Linux kernel
On Feb. 06, 2008, 14:16 +0200, Bart Van Assche [EMAIL PROTECTED] wrote: On Feb 5, 2008 6:01 PM, Erez Zilber [EMAIL PROTECTED] wrote: Using such large values for FirstBurstLength will give you poor performance numbers for WRITE commands (with iSER). FirstBurstLength means how much data should you send as unsolicited data (i.e. without RDMA). It means that your WRITE commands were sent without RDMA. Sorry, but I'm afraid you got this wrong. When the iSER transport is used instead of TCP, all data is sent via RDMA, including unsolicited data. If you have look at the iSER implementation in the Linux kernel (source files under drivers/infiniband/ulp/iser), you will see that all data is transferred via RDMA and not via TCP/IP. Regardless of what the current implementation is, the behavior you (Bart) describe seems to disagree with http://www.ietf.org/rfc/rfc5046.txt. Benny Bart Van Assche. - - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remove use_sg_chaining
On Jan. 21, 2008, 11:31 +0200, Jens Axboe [EMAIL PROTECTED] wrote: On Mon, Jan 21 2008, Boaz Harrosh wrote: On Sun, Jan 20 2008 at 22:59 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Sun, 2008-01-20 at 21:01 +0100, Jens Axboe wrote: On Sun, Jan 20 2008, Jens Axboe wrote: On Sun, Jan 20 2008, Boaz Harrosh wrote: On Sun, Jan 20 2008 at 21:29 +0200, Jens Axboe [EMAIL PROTECTED] wrote: On Sun, Jan 20 2008, James Bottomley wrote: On Sun, 2008-01-20 at 21:18 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 19:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: this patch depends on the sg branch of the block tree James --- From: James Bottomley [EMAIL PROTECTED] Date: Tue, 15 Jan 2008 11:11:46 -0600 Subject: remove use_sg_chaining With the sg table code, every SCSI driver is now either chain capable or broken, so there's no need to have a check in the host template. Also tidy up the code by moving the scatterlist size defines into the SCSI includes and permit the last entry of the scatterlist pools not to be a power of two. --- I have a theoretical problem that BUGed me from the beginning. Could it happen that a memory critical IO, (that is needed to free memory), be collected into an sg-chained large IO, and the allocation of the multiple sg-pool-allocations fail, thous dead locking on out-of-memory? Is there a mechanism in place that will split large IO's into smaller chunks in the event of out-of-memory condition in prep_fn? Is it possible to call blk_rq_map_sg() with less then what is present at request to only map the starting portion? Obviously, that's why I was worrying about mempool size and default blocks a while ago. However, the deadlock only occurs if the device is swap or backing a filesystem with memory mapped files. The use cases for this are really tapes and other entities that need huge buffers. That's why we're keeping the system sector size at 1024 unless you alter it through sysfs (here gun, there foot ...) Alternatively (and much safer, imho), we allow blk_rq_map_sg() return smaller than nr_phys_segments and just ensure that the request is continued nicely through the normal 'request if residual' logic. Thats a grate Idea. I will Q it on my todo list. Thanks ok good, thanks :-) btw, the above is full of typos, my apologies. it should read requeue if residual, but I guess you already guessed as much. Something like ... It looks to me like it would make sense to have something like a BLKPREP_SGALLOCFAIL return so the block layer can do this for us ... Alternatively, we'll have to find a way of adjusting the sector count as it goes into the ULD prep functions. James By luck this is no problem because it happens exactly before the ULD actually prepares the command. sd and sr are already doing these adjustments based on bufflen. For BLOCK_PC we will need to fail with perhaps a new BLKPREP_SGALLOCFAIL, like you said, and let the initiator take care of it. Right, the scsi_init_io() takes care of it and adjusts the buflen as needed, no need to pass this error back. As far as I'm concerned, blocking for BLOCK_PC requests should be fine (is anyone using these for swap?). It could help the OSD I/O module which will produce BLOCK_PC bidi CDBs to get feedback in the form of ENOMEM so to throttle down its I/O coalescing sizes and generate smaller I/Os in face of memory pressure and then gradually throttle up when on success. If the requests are held back and blocked at the queue I'm concerned that could hurt performance under memory pressure by not filling up the pipeline as much as we can. Benny - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] use dynamically allocated sense buffer
On Jan. 15, 2008, 17:20 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2008-01-15 at 18:23 +0900, FUJITA Tomonori wrote: This is the second version of http://marc.info/?l=linux-scsim=119933628210006w=2 I gave up once, but I found that the performance loss is negligible (within 1%) by using kmem_cache_alloc instead of mempool. I use scsi_debug with fake_rw=1 and disktest (DIO reads with 8 threads) again: scsi-misc (slub) | 486.9 MB/s IOPS 124652.9/s dynamic sense buf (slub) | 483.2 MB/s IOPS 123704.1/s scsi-misc (slab) | 467.0 MB/s IOPS 119544.3/s dynamic sense buf (slab) | 468.7 MB/s IOPS 119986.0/s The results are the averages of three runs with a server using two dual-core 1.60 GHz Xeon processors with DDR2 memory. I doubt think that someone will complain about the performance regression due to this patch. In addition, unlike scsi_debug, the real LLDs allocate the own data structure per scsi_cmnd so the performance differences would be smaller (and with the real hard disk overheads). Here's the full results: http://www.kernel.org/pub/linux/kernel/people/tomo/sense/results.txt Heh, that's one of those good news, bad news things. Certainly good news for you. The bad news for the rest of us is that you just implicated mempool in a performance problem and since they're the core of the SCSI scatterlist allocations and sit at the heart of the critical path in SCSI, we have a potential performance issue in the whole of SCSI. Looking at mempool's code this is peculiar as what seems to be its critical path for alloc and free looks pretty harmless and lightweight. Maybe an extra memory barrier, spin_{,un}lock_* and two extra function call (one of them can be eliminated BTW if the order of arguments to the mempool_{alloc,free}_t functions were the same as for kmem_cache_{alloc,free}). Benny James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] block layer varlen-cdb
On Nov. 01, 2007, 20:40 +0200, Matthew Wilcox [EMAIL PROTECTED] wrote: On Thu, Nov 01, 2007 at 08:05:06PM +0200, Boaz Harrosh wrote: @@ -287,8 +287,13 @@ struct request { /* * when request is used as a packet command carrier */ -unsigned int cmd_len; +unsigned short cmd_len; +unsigned short varlen_cdb_len; /* length of varlen_cdb buffer */ unsigned char cmd[BLK_MAX_CDB]; +unsigned char *varlen_cdb; /* an optional variable-length cdb. + * points to a user buffer that must be + * valid until end of request + */ Try this instead: unsigned int cmd_len; - unsigned char cmd[BLK_MAX_CDB]; + unsigned char _cmd[BLK_MAX_CDB]; + unsigned char *cmd; Then initialise cmd to the address of _cmd. If you need to override it later (ie patch 3), you can. I agree this is probably the cleanest implementation but when Boaz and I initially discussed this approach he convinced me that LL block devices assume that req-cmd_len = BLK_MAX_CDB and it is unsafe at the moment to expose them potentially larger commands. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] aic7xxx: Enable 16-bit CDBs
On Oct. 22, 2007, 21:35 +0200, Matthew Wilcox [EMAIL PROTECTED] wrote: On Mon, Oct 22, 2007 at 09:10:49PM +0200, Boaz Harrosh wrote: I'm about to finish an RFC patchset for the extended commands. I have implemented a more aggressive approach than the one I've been sending for the last year. (Matthew I have an extra 8-bytes save to scsi_cmnd on 64bit and 12 bytes for 32bit. Guess how? ;)) Well ... the command has to be stored somewhere. If it's an additional kmalloc, that's a loss. If it's in the request, that's a loss too ... let's see where you're keeping it ;-) I love spoiling Boaz's surprise :) The gist of it is scsi_cmnd-cmnd pointing at req-cmd or at req-varlen_cmd as appropriate. Most users can't tell the difference except if they tried using sizeof(scsi_cmnd-cmnd) in which case they can use MAX_COMMAND_SIZE instead. Benny - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 27/32] scsi_data_buffer
On Oct. 18, 2007, 2:47 +0200, Matthew Wilcox [EMAIL PROTECTED] wrote: On Wed, Oct 17, 2007 at 08:21:15PM +0200, Boaz Harrosh wrote: - Group all IO members of scsi_cmnd into a scsi_data_buffer structure. +struct scsi_data_buffer { +unsigned length; +int resid; +unsigned short sg_count; +unsigned short alloc_sg_count; +struct scatterlist* sglist; +}; This has exactly the problems I thought it would have. Due to alignment rules, it grows the scsi_cmnd from 368 to 376 bytes on x86-64. It remains at 272 bytes on i386 though, which is some consolation. By porting the patch I had for shrinking the scsi_cmnd, I can get it down to 352 bytes, but not as far as the 344 bytes I had it at before. The problem is the padding at the *end* of scsi_data_buffer (er, after I rearrange it in the patch below). There's nothing we can realistically put in it, given that we don't want to expand scsi_data_buffer's size on 32-bit machines. I have a fix ... but I don't think you'll like it. I certainly don't. But it does get us back down to 344 bytes. Updated patch below. I'm fully expecting the 'result' shenanigan to get it NACKed, but I'd like to see if it inspires anyone else to a more creative way of saving this space. yeah. The sglist pointer shuffle makes sense and so are the field type changes and coalescing, but the union holding the deprecated fields of scsi_data_buff is going away. #pragma pack(4) before struct scsi_cmnd (followed by #pragma pack()) should do the trick by saving member padding bytes. As a general rule I don't like pragma's but this one works, is easy to understand, and is pretty standard. --- Thanks to acme's pahole utility, I found some places where we can save a lot of bytes in scsi_cmnd, just by rearranging struct elements and reducing the size of some elements. We go from 272 to 260 bytes on x86 and from 368 to 344 bytes on x86-64. - eh_eflags had a 4-byte hole after it on 64-bit. In fact, this has value 0 or 1, so reduce it to an unsigned char, and put it with the other chars in scsi_cmnd. Saves 8 bytes on 64-bit. - sc_data_direction has a value from 0-3, so make it an unsigned char rather than an enum. Saves at least 4 bytes. - Putting 'tag' with the other char elements saves another 4 bytes - Moving 'result' into the union with the deprecated members saves 8 bytes on 64-bit. diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h index 047ffe6..2b10779 100644 --- a/include/scsi/scsi_cmnd.h +++ b/include/scsi/scsi_cmnd.h @@ -13,22 +13,21 @@ struct Scsi_Host; struct scsi_device; struct scsi_data_buffer { + struct scatterlist* sglist; unsigned length; int resid; unsigned short sg_count; unsigned short alloc_sg_count; - struct scatterlist* sglist; }; /* embedded in scsi_cmnd */ struct scsi_pointer { char *ptr; /* data pointer */ - int this_residual; /* left in this buffer */ struct scatterlist *buffer; /* which buffer */ + dma_addr_t dma_handle; + int this_residual; /* left in this buffer */ int buffers_residual; /* how many buffers left */ -dma_addr_t dma_handle; - volatile int Status; volatile int Message; volatile int have_data_in; @@ -40,7 +39,6 @@ struct scsi_cmnd { struct scsi_device *device; struct list_head list; /* scsi_cmnd participates in queue lists */ struct list_head eh_entry; /* entry for the host eh_cmd_q */ - int eh_eflags; /* Used by error handlr */ /* * A SCSI Command is assigned a nonzero serial_number before passed @@ -64,7 +62,9 @@ struct scsi_cmnd { int timeout_per_command; unsigned char cmd_len; - enum dma_data_direction sc_data_direction; + unsigned char eh_eflags;/* Used by error handler */ + unsigned char sc_data_direction;/* enum dma_data_direction */ + unsigned char tag; /* SCSI-II queued command tag */ /* These elements define the operation we are about to perform */ #define MAX_COMMAND_SIZE 16 @@ -109,10 +109,6 @@ struct scsi_cmnd { * obtained by scsi_malloc is guaranteed * to be at an address 16Mb). */ - int result; /* Status code from lower level driver */ - - unsigned char tag; /* SCSI-II queued command tag */ - union { struct scsi_data_buffer sdb; /* @@ -121,11 +117,12 @@ struct scsi_cmnd { *of struct scsi_data_buffer members. */ struct { + void __deprecated *request_buffer; unsigned __deprecated request_bufflen; int __deprecated resid; unsigned short __deprecated use_sg;
Re: [PATCH 27/32] scsi_data_buffer
On Oct. 18, 2007, 10:06 +0200, Matthew Wilcox [EMAIL PROTECTED] wrote: On Thu, Oct 18, 2007 at 08:59:58AM +0200, Benny Halevy wrote: yeah. The sglist pointer shuffle makes sense and so are the field type changes and coalescing, but the union holding the deprecated fields of scsi_data_buff is going away. Indeed. We could always do ... union { struct scsi_data_pointer; struct { char _padding_[sizeof(struct scsi_data_pointer)]; int result; } } But that might set a new record of depths of obfuscation. agreed :) #pragma pack(4) before struct scsi_cmnd (followed by #pragma pack()) should do the trick by saving member padding bytes. As a general rule I don't like pragma's but this one works, is easy to understand, and is pretty standard. But it might well have the effect of pessimising accesses to the struct, and accesses to scsi_cmnd are pretty hot. I've seen gcc revert to byte loads and stores to packed structs in the past. Maybe it's got better now. that's why I suggested pack(4) and not pack(1), assuming 4-byte alignment is still as efficient as 8-byte alignment on most common 64 bit architectures. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] x86-64: pci-gart iommu sg chaining zeroes wrong sg.
On Sep 27, 2007, 18:46 +0200, FUJITA Tomonori [EMAIL PROTECTED] wrote: On Fri, 28 Sep 2007 01:38:27 +0900 FUJITA Tomonori [EMAIL PROTECTED] wrote: This patch is for Jens' block tree (sg chaining branch). I don't have the hardware but this looks like a bug. --- From: FUJITA Tomonori [EMAIL PROTECTED] Subject: [PATCH] x86-64: pci-gart iommu sg chaining zeroes a wrong sg's dma_length Needs to zero the end of the list. Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED] --- arch/x86_64/kernel/pci-gart.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c index 27b7db4..a4151a7 100644 --- a/arch/x86_64/kernel/pci-gart.c +++ b/arch/x86_64/kernel/pci-gart.c @@ -425,9 +425,10 @@ int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents, int dir) if (dma_map_cont(start_sg, i - start, sgmap, pages, need) 0) goto error; out++; +sgmap = sg_next(sgmap); flush_gart(); if (out nents) -ps-dma_length = 0; +sgmap-dma_length = 0; return out; Sorry, it should be: diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c index 27b7db4..cfcc84e 100644 --- a/arch/x86_64/kernel/pci-gart.c +++ b/arch/x86_64/kernel/pci-gart.c @@ -426,8 +426,10 @@ int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents, int dir) goto error; out++; flush_gart(); - if (out nents) - ps-dma_length = 0; + if (out nents) { + sgmap = sg_next(sgmap); + sgmap-dma_length = 0; + } looks correct to me. ps points at the previous scanned sg entry while you want to zero out dma_length at the entry immediately following the last entry mapped (if (out nents)) the original code before 62296749bd421904dace1e6b0fc3c4538aac7111 was: - if (out nents) - sg[out].dma_length = 0; Benny return out; error: - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH ver2 5/5] arm: fas216 Use scsi_eh API for REQUEST_SENSE invocation
Russell King wrote: On Mon, Sep 10, 2007 at 10:39:11PM +0300, Boaz Harrosh wrote: - Use new scsi_eh_prep/restor_cmnd() for synchronous REQUEST_SENSE invocation. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/arm/fas216.c | 16 +++- drivers/scsi/arm/fas216.h |3 +++ 2 files changed, 6 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/arm/fas216.c b/drivers/scsi/arm/fas216.c index fb5f202..a715632 100644 --- a/drivers/scsi/arm/fas216.c +++ b/drivers/scsi/arm/fas216.c @@ -2018,6 +2018,7 @@ static void fas216_rq_sns_done(FAS216_Info *info, struct scsi_cmnd *SCpnt, * the upper layers to process. This would have been set * correctly by fas216_std_done. */ + scsi_eh_restore_cmnd(SCpnt, info-ses); SCpnt-scsi_done(SCpnt); } @@ -2103,23 +2104,12 @@ request_sense: if (SCpnt-cmnd[0] == REQUEST_SENSE) goto done; + scsi_eh_prep_cmnd(SCpnt, info-ses, NULL, 0, ~0); fas216_log_target(info, LOG_CONNECT, SCpnt-device-id, requesting sense); - memset(SCpnt-cmnd, 0, sizeof (SCpnt-cmnd)); - SCpnt-cmnd[0] = REQUEST_SENSE; - SCpnt-cmnd[1] = SCpnt-device-lun 5; - SCpnt-cmnd[4] = sizeof(SCpnt-sense_buffer); - SCpnt-cmd_len = COMMAND_SIZE(SCpnt-cmnd[0]); - SCpnt-SCp.buffer = NULL; - SCpnt-SCp.buffers_residual = 0; - SCpnt-SCp.ptr = (char *)SCpnt-sense_buffer; - SCpnt-SCp.this_residual = sizeof(SCpnt-sense_buffer); - SCpnt-SCp.phase = sizeof(SCpnt-sense_buffer); + init_SCp(SCpnt); SCpnt-SCp.Message = 0; SCpnt-SCp.Status = 0; - SCpnt-request_bufflen = sizeof(SCpnt-sense_buffer); - SCpnt-sc_data_direction = DMA_FROM_DEVICE; - SCpnt-use_sg = 0; SCpnt-tag = 0; SCpnt-host_scribble = (void *)fas216_rq_sns_done; So where do we end up setting up the request sense command? In scsi_eh_prep_cmnd(), when (sense_bytes != 0), called here: + scsi_eh_prep_cmnd(SCpnt, info-ses, NULL, 0, ~0); - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] sg: increase sglist_len of the sg_scatter_hold structure
Mike Christie wrote: Mike Christie wrote: For drivers like sg and st, do mean the the sg list that is passed to functions like scsi_execute_async? If we kill that argument, and instead have sg.c and other scsi_execute_async callers just call blk helpers like blk_rq_map_user then we would not have to worry about drivers like sg needing to know about chaining right? I mean sg.c would not every interact with a scatterlist. It would just interact with a request and the blk helpers map data for it. There should be a return there. The scatterlist that sg and st interact with is bogus. It gets thrown away in scsi_execute_async and is only used for book keeping. I mean currently the scatterlist that sg and st use is bogus and gets thrown away. If we convert sg and st to use blk_rq_map_user then those drivers will not have to interact with a scatterlist at all. I'm not familiar with the dire details but this sounds like a good idea. Benny - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 7/8] sd.c and sr.c move to scsi_sgtable implementation
FUJITA Tomonori wrote: From: Boaz Harrosh [EMAIL PROTECTED] Subject: [RFC 7/8] sd.c and sr.c move to scsi_sgtable implementation Date: Thu, 05 Jul 2007 16:44:04 +0300 - sd and sr would adjust IO size to align on device's block size. So code needs to change once we move to scsi_sgtable implementation. (Not compatible with un-converted drivers) - Convert code to use scsi_for_each_sg - Use Data accessors wherever is appropriate. Signed-off-by: Boaz Harrosh [EMAIL PROTECTED] --- drivers/scsi/sd.c | 10 -- drivers/scsi/sr.c | 27 --- 2 files changed, 16 insertions(+), 21 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 448d316..459fd23 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -338,7 +338,7 @@ static int sd_init_command(struct scsi_cmnd * SCpnt) struct request *rq = SCpnt-request; struct gendisk *disk = rq-rq_disk; sector_t block = rq-sector; -unsigned int this_count = SCpnt-request_bufflen 9; +unsigned int this_count = scsi_bufflen(SCpnt) 9; unsigned int timeout = sdp-timeout; SCSI_LOG_HLQUEUE(1, scmd_printk(KERN_INFO, SCpnt, @@ -418,9 +418,6 @@ static int sd_init_command(struct scsi_cmnd * SCpnt) } else if (rq_data_dir(rq) == READ) { SCpnt-cmnd[0] = READ_6; SCpnt-sc_data_direction = DMA_FROM_DEVICE; -} else { -scmd_printk(KERN_ERR, SCpnt, Unknown command %x\n, rq-cmd_flags); -return 0; } Why? This seems to be dead code as rq_data_dir can only be either READ or WRITE. The else case could be a BUG() but I don't see why it should be handled gracefully. SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, @@ -480,7 +477,8 @@ static int sd_init_command(struct scsi_cmnd * SCpnt) SCpnt-cmnd[4] = (unsigned char) this_count; SCpnt-cmnd[5] = 0; } -SCpnt-request_bufflen = this_count * sdp-sector_size; +BUG_ON(!SCpnt-sgtable); +SCpnt-sgtable-length = this_count * sdp-sector_size; /* * We shouldn't disconnect in the middle of a sector, so with a dumb @@ -892,7 +890,7 @@ static struct block_device_operations sd_fops = { static void sd_rw_intr(struct scsi_cmnd * SCpnt) { int result = SCpnt-result; -unsigned int xfer_size = SCpnt-request_bufflen; +unsigned int xfer_size = scsi_bufflen(SCpnt); unsigned int good_bytes = result ? 0 : xfer_size; u64 start_lba = SCpnt-request-sector; u64 bad_lba; diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c index f9a52af..ed61ca9 100644 --- a/drivers/scsi/sr.c +++ b/drivers/scsi/sr.c @@ -218,7 +218,7 @@ int sr_media_change(struct cdrom_device_info *cdi, int slot) static void rw_intr(struct scsi_cmnd * SCpnt) { int result = SCpnt-result; -int this_count = SCpnt-request_bufflen; +int this_count = scsi_bufflen(SCpnt); int good_bytes = (result == 0 ? this_count : 0); int block_sectors = 0; long error_sector; @@ -345,23 +345,20 @@ static int sr_init_command(struct scsi_cmnd * SCpnt) } else if (rq_data_dir(SCpnt-request) == READ) { SCpnt-cmnd[0] = READ_10; SCpnt-sc_data_direction = DMA_FROM_DEVICE; -} else { -blk_dump_rq_flags(SCpnt-request, Unknown sr command); -return 0; } ditto. ditto :) - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
James Bottomley wrote: On Tue, 2007-07-24 at 17:01 +0300, Benny Halevy wrote: FUJITA Tomonori wrote: I should have said that, was the approach to use separate buffer for sglists instead of putting the sglists and the parameters in one buffer completely rejected? I think that James should be asked this question. My understanding was that he preferred allocating the sgtable header along with the scatterlist array. All I really cared about was insulating the drivers from future changes in this area. It strikes me that for chained sglist implementations, this can all become a block layer responsibility, since more than SCSI will want to make use of it. agreed. Just remember, though that whatever is picked has to work in both memory constrained embedded systems as well as high end clusters. It seems to me (but this isn't a mandate) that a single tunable sg element chunk size will accomplish this the best (as in get rid of the entire SCSI sglist sizing machinery) . maybe :) I'm not as familiar as you are with all the different uses of linux. IMO, having a tunable is worse for the administrator and I'd prefer an automatic mechanism that will dynamically adapt the allocation size(s) to the actual use, very much like the one you have today. However, I'm perfectly happy to go with whatever the empirical evidence says is best .. and hopefully, now we don't have to pick this once and for all time ... we can alter it if whatever is chosen proves to be suboptimal. I agree. This isn't a catholic marriage :) We'll run some performance experiments comparing the sgtable chaining implementation vs. a scsi_data_buff implementation pointing at a possibly chained sglist and let's see if we can measure any difference. We'll send results as soon as we have them. There are pro's and con's either way. In my opinion separating the headers is better for mapping buffers that have a power of 2 #pages (which seems to be the typical case) since when you're losing one entry in the sgtable for the header you'd waste a lot more when you just cross the bucket boundary. E.g. for 64 pages you need to allocate from the 64 to 127 bucket rather than the 33 to 64 bucket). Separated, one sgtable header structure can just be embedded in struct scsi_cmnd for uni-directional transfers (wasting some space when transferring no data, but saving space and cycles in the common case vs. allocating it from a separate memory pool) and the one for bidi read buffers can be allocated separately just for bidi commands. This is all opinion ... could someone actually run some performance tests? James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Enable 16-bit CDBs for aic7xxx/aic79xxx
Just wondering, have you tried testing it with our patches to support long cdb? If not, it would be great if you could try doing that. A snapshot of them is in http://www.bhalevy.com/open-osd/download/sgtable_bidi_varlen/ Benny Hannes Reinecke wrote: Hannes Reinecke wrote: Hi James, this patch enables 16-bit CDBs for aic7xxx and aic79xx. aic7xxx actuallys supports up to 32-bit CDBs, so it might be that aic79xx does that, too. But this would include some more hacking, so this is way easier. Yeah, grand. That should read '16-byte CDBs', of course. But at least I've been consistent :-) New patch attached. Cheers, Hannes Enable 16-byte CDBs for aic7xxx/aix79xx The patch enables support for 16-byte CDBs in aic7xxx and aic79xx. aic7xxx can actually support up to 32-byte CDBs, should they ever see the light of day. Signed-off-by: Hannes Reinecke [EMAIL PROTECTED] diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c b/drivers/scsi/aic7xxx/aic79xx_osm.c index 286ab83..8502085 100644 --- a/drivers/scsi/aic7xxx/aic79xx_osm.c +++ b/drivers/scsi/aic7xxx/aic79xx_osm.c @@ -1089,6 +1089,7 @@ ahd_linux_register_host(struct ahd_softc *ahd, struct scsi_host_template *templa host-max_id = (ahd-features AHD_WIDE) ? 16 : 8; host-max_lun = AHD_NUM_LUNS; host-max_channel = 0; + host-max_cmd_len = MAX_CDB_LEN; host-sg_tablesize = AHD_NSEG; ahd_lock(ahd, s); ahd_set_unit(ahd, ahd_linux_unit++); diff --git a/drivers/scsi/aic7xxx/aic7xxx_osm.c b/drivers/scsi/aic7xxx/aic7xxx_osm.c index 1803ab6..a6b3071 100644 --- a/drivers/scsi/aic7xxx/aic7xxx_osm.c +++ b/drivers/scsi/aic7xxx/aic7xxx_osm.c @@ -1047,6 +1047,7 @@ ahc_linux_register_host(struct ahc_softc *ahc, struct scsi_host_template *templa host-max_id = (ahc-features AHC_WIDE) ? 16 : 8; host-max_lun = AHC_NUM_LUNS; host-max_channel = (ahc-features AHC_TWIN) ? 1 : 0; + host-max_cmd_len = 32; host-sg_tablesize = AHC_NSEG; ahc_lock(ahc, s); ahc_set_unit(ahc, ahc_linux_unit++); - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHSET 0/5] Peaceful co-existence of scsi_sgtable and Large IO sg-chaining
FUJITA Tomonori wrote: I should have said that, was the approach to use separate buffer for sglists instead of putting the sglists and the parameters in one buffer completely rejected? I think that James should be asked this question. My understanding was that he preferred allocating the sgtable header along with the scatterlist array. There are pro's and con's either way. In my opinion separating the headers is better for mapping buffers that have a power of 2 #pages (which seems to be the typical case) since when you're losing one entry in the sgtable for the header you'd waste a lot more when you just cross the bucket boundary. E.g. for 64 pages you need to allocate from the 64 to 127 bucket rather than the 33 to 64 bucket). Separated, one sgtable header structure can just be embedded in struct scsi_cmnd for uni-directional transfers (wasting some space when transferring no data, but saving space and cycles in the common case vs. allocating it from a separate memory pool) and the one for bidi read buffers can be allocated separately just for bidi commands. Benny - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 4/8] scsi-ml: scsi_sgtable implementation
Jens Axboe wrote: On Wed, Jul 18 2007, Boaz Harrosh wrote: Jens Axboe wrote: On Wed, Jul 18 2007, Boaz Harrosh wrote: FUJITA Tomonori wrote: From: Mike Christie [EMAIL PROTECTED] Subject: Re: [RFC 4/8] scsi-ml: scsi_sgtable implementation Date: Thu, 12 Jul 2007 14:09:44 -0500 Boaz Harrosh wrote: +/* + * Should fit within a single page. + */ +enum { SCSI_MAX_SG_SEGMENTS = + ((PAGE_SIZE - sizeof(struct scsi_sgtable)) / + sizeof(struct scatterlist)) }; + +enum { SG_MEMPOOL_NR = + (SCSI_MAX_SG_SEGMENTS = 7) + + (SCSI_MAX_SG_SEGMENTS = 15) + + (SCSI_MAX_SG_SEGMENTS = 31) + + (SCSI_MAX_SG_SEGMENTS = 63) + + (SCSI_MAX_SG_SEGMENTS = 127) + + (SCSI_MAX_SG_SEGMENTS = 255) + + (SCSI_MAX_SG_SEGMENTS = 511) +}; What does SCSI_MAX_SG_SEGMENTS end up being on x86 now? On x86_64 or some other arch, we were going over a page when doing SCSI_MAX_PHYS_SEGMENTS of 256 right? Seems that 170 with x86 and 127 with x86_64. with scsi_sgtable we get one less than now Arch | SCSI_MAX_SG_SEGMENTS = | sizeof(struct scatterlist) --|-|--- x86_64| 127 |32 i386 CONFIG_HIGHMEM64G=y | 204 |20 i386 other| 255 |16 What's nice about this code is that now finally it is automatically calculated in compile time. Arch people don't have the headache did I break SCSI-ml?. For example observe the current bug with i386 CONFIG_HIGHMEM64G=y. The same should be done with BIO's. Than ARCHs with big pages can gain even more. What happened to Jens's scatter list chaining and how does this relate to it then? With Jens' sglist, we can set SCSI_MAX_SG_SEGMENTS to whatever we want. We can remove the above code. We need to push this and Jens' sglist together in one merge window, I think. No Tomo the above does not go away. What goes away is maybe: It does go away, since we can just set it to some safe value and use chaining to get us where we want. In my patches SCSI_MAX_PHYS_SEGMENTS has went away it does not exist anymore. Sure, I could just kill it as well. The point is that it's a parallel development, there's nothing in your patch that helps the sg chaining whatsoever. The only complex thing in the SCSI layer for sg chaining, is chaining when allocating and walking that chain on freeing. That's it! It seems like having the pool index in the sgtable structure simplifies the implementation a bit for allocation and freeing of linked sgtables. Boaz will send an example tomorrow (hopefully) showing how the merged code looks like. The new SCSI_MAX_SG_SEGMENTS (your name by the way) is right there and is calculated to maximize allocation in one page. Yes, it's still a good idea to make sure you get good packing in the page, it's of course cheaper to have less chaining. But it's not critical in the same way as before, when it could impact IO layout and performance. (I guess the right name is SCSI_MAX_PHYS_SEGMENTS_IN_A_PAGE) which will be needed in both your patches and mine. That would be better name. blk_queue_max_hw_segments(q, shost-sg_tablesize); - blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS); blk_queue_max_sectors(q, shost-max_sectors); The reporting above is not needed and can be what ever block layer considers safe/optimized. You still need to set it, you can't just ignore it. Whether you redefine SCSI_MAX_SG_SEGMENTS or use (unsigned short) -1 doesn't really matter a whole lot. I'm working on a convergence patches that will do scsi_sg_pools cleanup which is common to both our patches, than scsi_sgtable, and than sg-chaining on top of that. I hope it gets accepted. The sg-chaining is much much simpler over scsi_sgtables. Sorry, I don't follow this paragraph at all. What is the scsi_sgtables change you are referring to? And how does it make sg chaining so much simpler? I guess my problem is that I don't know what problem this scsi_sgtables you refer to is fixing? scsi_sgtable is a solution proposed by James Bottomley where all I/O members of struct scsi_cmnd and the resid member, which need to be duplicated for bidirectional transfers, can be allocated together with the sg-list they are pointing to. This way when bidi comes, the all structure can be duplicated with minimal change to code, and with no extra baggage when bidi is not used. This was the all motivation for the data accessors and cleanup, swiping through the entire scsi tree. So when implementation changes drivers do not change with them. Now meanwhile moving over drivers code we (well Tomo mostly) removed the old !use_sg code path, and also abstracted the 2 major hot spots of above usages with scsi_dma_{un,}map, and the scsi_for_each_sg. Actually that one was changed from the original definition to match
Re: [PATCH v2] add bidi support for block pc requests
FUJITA Tomonori wrote: From: Boaz Harrosh [EMAIL PROTECTED] Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Thu, 17 May 2007 11:49:37 +0300 FUJITA Tomonori wrote: From: Jens Axboe [EMAIL PROTECTED] Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Thu, 17 May 2007 07:48:13 +0200 On Thu, May 17 2007, FUJITA Tomonori wrote: From: Jens Axboe [EMAIL PROTECTED] Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Wed, 16 May 2007 19:53:22 +0200 On Wed, May 16 2007, Boaz Harrosh wrote: now there are 2 issues with this: 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get requests for use_sg=128 which will crash the kernel. That sounds like a serious issue, it should definitely not happen. Stuff like that would bite other drivers as well, are you absolutely sure that is happening? Power-of-2 bug in your code, or in the SCSI code? Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg? These are regular fs (ext3) requests during bootup. The machine will not boot. (Usually from the read ahead code) Don't believe me look at the second patch Over Tomo's cleanup. If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I did in code: blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS); I suppose someone is looking at a different definition. Or there is another call I need to do for this to work. I modified your patch a bit (sgtable allocation) and set SCSI_MAX_SG_SEGMENTS to 127. My ppc64 box seems to work (with iscsi_tcp and ipr drivers). iscsi_tcp sets sg_tablesize to 255, but blk_queue_max_phys_segments seems to work. One thing that I found is: +#define scsi_resid(cmd) ((cmd)-sg_table-resid) This doesn't work for some drivers (at least ipr) since they set cmd-resid even with commands without data transfer. Hmm, since we need a residual count also for the bidi_in transfer this problem is another reason for having the scsi_cmd_buff in struct scsi_cmnd, allocate another one from a pool for the bidi case, and point to the sglist in both cases rather than having a sg_table header allocated along with the sg list. Alternatively, having a pool for the no-data case might be another possible solution, though it seems a bit odd to me. snip -void scsi_free_sgtable(struct scatterlist *sgl, int index) +static struct scsi_sg_table *scsi_alloc_sgtable(int nseg, gfp_t gfp_mask) { struct scsi_host_sg_pool *sgp; + struct scsi_sg_table *sgt; + unsigned int idx; - BUG_ON(index = SG_MEMPOOL_NR); + for (idx = 0; idx SG_MEMPOOL_NR; idx++) + if (scsi_sg_pools[idx].size = nseg) + goto found; Tomo, I prefer the loop to be based on calculation as follows rather than scanning the scsi_sg_pools table in order to minimize memory access (and thrashing of the cpu data cache - each scsi_host_sg_pool takes a cache row on x86_64) + for (i = 0, size = 8; i SG_MEMPOOL_NR-1; i++, size = 1) + if (size nents) + return i; Benny - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/19] add data buffer accessors
FUJITA Tomonori wrote: +#define scsi_for_each_sg(cmd, sg, nseg, __i) \ + for (__i = 0, sg = scsi_sglist(cmd); __i (nseg); __i++, (sg)++) + This feels like a layering violation, why not use for_each_sg()? +#define scsi_for_each_sg(cmd, sg, nseg, __i) \ for_each_sg(scsi_sglist(cmd), (sg), (nseg), (__i)) \ That said, I'm not sure that scsi_for_each_sg() is worth abstracting since the caller can just as well do for_each_sg() directly as sketched above... Benny - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/19] add data buffer accessors
FUJITA Tomonori wrote: From: Benny Halevy [EMAIL PROTECTED] Subject: Re: [PATCH 1/19] add data buffer accessors Date: Mon, 14 May 2007 10:57:08 +0300 FUJITA Tomonori wrote: +#define scsi_for_each_sg(cmd, sg, nseg, __i) \ + for (__i = 0, sg = scsi_sglist(cmd); __i (nseg); __i++, (sg)++) + This feels like a layering violation, why not use for_each_sg()? +#define scsi_for_each_sg(cmd, sg, nseg, __i)\ for_each_sg(scsi_sglist(cmd), (sg), (nseg), (__i)) \ As I said before, when for_each_sg is ready, we'll convert scsi_for_each_sg to use for_each_sg. thanks. works for me. That said, I'm not sure that scsi_for_each_sg() is worth abstracting since the caller can just as well do for_each_sg() directly as sketched above... I'm not sure why you think it's a layering violation. I'd like to think of struct scatterlist as an abstract data type with its own traversal method that hides its internals. Not a layer per-se but more of an abstraction... With scsi_for_each_sg(), many drivers don't need scsi_sglist(). Sure, just my two cents... - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] bidi support
Tomo, Thanks for quickly crafting this patchset. Please see my comments in-line below. Putting aside the controversial design issues, we need to carefully compare our patches against yours to make sure no functional issues have been overlooked. Benny FUJITA Tomonori wrote: From: FUJITA Tomonori [EMAIL PROTECTED] Subject: [PATCH 0/2] bidi support Date: Sat, 05 May 2007 18:02:29 +0900 This patchset add bidi support for block pc requests. Oh, this patchset is against Linus' tree. The first patch (the block layer changes) can be applied against Jens' tree. The second patch (the scsi mid-layer changes) has one minor reject against the scsi-misc tree. The following patch is against the scsi-misc. --- From 6a8c5375f1f6dbd574107920dd0a613527bd556b Mon Sep 17 00:00:00 2001 From: FUJITA Tomonori [EMAIL PROTECTED] Date: Sat, 5 May 2007 18:11:42 +0900 Subject: [PATCH] add bidi support for block pc requests This adds bidi support for block pc requests. A bidi request uses req-next_rq pointer for an in request. This patch introduces a new structure, scsi_data_buffer to hold the data buffer information. To avoid make scsi_cmnd structure fatter, the scsi mid-layer uses cmnd-request-next_rq-special pointer for a scsi_data_buffer structure. LLDs don't touch the second request (req-next_rq) so it's safe to use req-special. scsi_blk_pc_done() always completes the whole command so scsi_end_request() simply completes the bidi chunk too. A helper function, scsi_bidi_data_buffer() is for LLDs to access to the scsi_data_buffer structure easily. Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED] --- drivers/scsi/scsi_lib.c | 120 +++-- include/scsi/scsi_cmnd.h | 14 + 2 files changed, 118 insertions(+), 16 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index be8e655..8f7873a 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -66,6 +66,12 @@ #undef SP static void scsi_run_queue(struct request_queue *q); +struct scsi_data_buffer *scsi_bidi_data_buffer(struct scsi_cmnd *cmd) +{ + return blk_bidi_rq(cmd-request) ? cmd-request-next_rq-special : NULL; +} +EXPORT_SYMBOL(scsi_bidi_data_buffer); + /* * Function: scsi_unprep_request() * @@ -85,6 +91,7 @@ static void scsi_unprep_request(struct r req-cmd_flags = ~REQ_DONTPREP; req-special = NULL; + kfree(scsi_bidi_data_buffer(cmd)); scsi_put_command(cmd); } @@ -657,6 +664,7 @@ static struct scsi_cmnd *scsi_end_reques request_queue_t *q = cmd-device-request_queue; struct request *req = cmd-request; unsigned long flags; + struct scsi_data_buffer *sdb = scsi_bidi_data_buffer(cmd); /* * If there are blocks left over at the end, set up the command @@ -685,6 +693,14 @@ static struct scsi_cmnd *scsi_end_reques } } + /* + * a REQ_BLOCK_PC command is always completed fully so just do + * end the bidi chunk. + */ + if (sdb) + end_that_request_chunk(req-next_rq, uptodate, +sdb-request_bufflen); I think I agree you shouldn't call end_that_request_last(req-next_rq) so sdb and req-next_rq should be freed here, no? + add_disk_randomness(req-rq_disk); spin_lock_irqsave(q-queue_lock, flags); @@ -701,34 +717,35 @@ static struct scsi_cmnd *scsi_end_reques return NULL; } -struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask) +static struct scatterlist *do_scsi_alloc_sgtable(unsigned short use_sg, + unsigned short *sglist_len, + gfp_t gfp_mask) { struct scsi_host_sg_pool *sgp; - struct scatterlist *sgl; - BUG_ON(!cmd-use_sg); + BUG_ON(!use_sg); - switch (cmd-use_sg) { + switch (use_sg) { case 1 ... 8: - cmd-sglist_len = 0; + *sglist_len = 0; break; case 9 ... 16: - cmd-sglist_len = 1; + *sglist_len = 1; break; case 17 ... 32: - cmd-sglist_len = 2; + *sglist_len = 2; break; #if (SCSI_MAX_PHYS_SEGMENTS 32) case 33 ... 64: - cmd-sglist_len = 3; + *sglist_len = 3; break; #if (SCSI_MAX_PHYS_SEGMENTS 64) case 65 ... 128: - cmd-sglist_len = 4; + *sglist_len = 4; break; #if (SCSI_MAX_PHYS_SEGMENTS 128) case 129 ... 256: - cmd-sglist_len = 5; + *sglist_len = 5; break; #endif #endif @@ -737,11 +754,14 @@ #endif return NULL; } - sgp = scsi_sg_pools + cmd-sglist_len; - sgl = mempool_alloc(sgp-pool, gfp_mask); - return sgl; + sgp =
Re: [PATCH 4/4] bidi support: bidirectional request
Jens Axboe wrote: On Sun, Apr 29 2007, James Bottomley wrote: On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote: FUJITA Tomonori wrote: From: Boaz Harrosh [EMAIL PROTECTED] Subject: [PATCH 4/4] bidi support: bidirectional request Date: Sun, 15 Apr 2007 20:33:28 +0300 diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 645d24b..16a02ee 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -322,6 +322,7 @@ struct request { void *end_io_data; struct request_io_part uni; +struct request_io_part bidi_read; }; Would be more straightforward to have: struct request_io_part in; struct request_io_part out; Yes I wish I could do that. For bidi supporting drivers this is the most logical. But for the 99.9% of uni-directional drivers, calling rq_uni(), and being some what on the hotish paths, this means we will need a pointer to a uni request_io_part. This is bad because: 1st- There is no defined stage in a request life where to definitely set that pointer, specially in the preparation stages. 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now this is a very bad spot already, and I have a short term fix for it in the SCSI-bidi patches (not sent yet) but a more long term solution is needed. Once such hacks are cleaned up we can do what you say. This is exactly why I use the access functions rq_uni/rq_io/rq_in/rq_out and not open code access. I'm still not really convinced about this approach. The primary job of the block layer is to manage and merge READ and WRITE requests. It serves a beautiful secondary function of queueing for arbitrary requests it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or indeed any non REQ_TYPE_FS). bidirectional requests fall into the latter category (there's nothing really we can do to merge them ... they're just transported by the block layer). The only unusual feature is that they carry two bios. I think the drivers that actually support bidirectional will be a rarity, so it might even be advisable to add it to the queue capability (refuse bidirectional requests at the top rather than perturbing all the drivers to process them). So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI? That will remove it from the standard path and put it on the special command type path where we can process it specially. Additionally, if you take this approach, you can probably simply chain the second bio through req-special as an additional request in the stream. The only thing that would then need modification would be the dequeue of the block driver (it would have to dequeue both requests and prepare them) and that needs to be done only for drivers handling bidirectional requests. I agree, I'm really not crazy about shuffling the entire request setup around just for something as exotic as bidirection commands. How about just keeping it simple - have a second request linked off the first one for the second data phase? So keep it completely seperate, not just overload -special for 2nd bio list. So basically just add a struct request pointer, so you can do rq = rq-next_rq or something for the next data phase. I bet this would be a LOT less invasive as well, and we can get by with a few helpers to support it. And it should definitely be a request type. I'm a bit confused since what you both suggest is very similar to what we've proposed back in October 2006 and the impression we got was that it will be better to support bidirectional block requests natively (yet to be honest, James, you wanted a linked request all along). Before we go on that route again, how do you see the support for bidi at the scsi mid-layer done? Again, we prefer to support that officially using two struct scsi_cmnd_buff instances in struct scsi_cmnd and not as a one-off feature, using special-purpose state and logic (e.g. a linked struct scsi_cmd for the bidi_read sg list). I'm attaching the patch we sent back then for your reference. (for some reason I couldn't find the original post in the any linux-scsi archives) Regards, Benny Support for scsi variable length CDBs and bidirectional commands Signed-off-by: Boaz Harrosh bharrosh@panasas.com Signed-off-by: Benny Halevy bhalevy@panasas.com //depot/pub/linux/block/scsi_ioctl.c#1 - /home/bharrosh/p4.local/pub/linux/block/scsi_ioctl.c diff -Nup /tmp/tmp.12171.0 /home/bharrosh/p4.local/pub/linux/block/scsi_ioctl.c -L a/block/scsi_ioctl.c -L b/block/scsi_ioctl.c --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -32,10 +32,13 @@ #include scsi/scsi_ioctl.h #include scsi/scsi_cmnd.h -/* Command group 3 is reserved and should never be used. */ +/* + * Command group 3 is used by variable length CDBs with + * opcode VARLEN_CDB. + */ const unsigned char scsi_command_size[8] = { - 6, 10, 10, 12, + 6, 10, 10, 16, 16, 12, 10, 10 }; //depot/pub/linux
Re: [PATCH 4/4] bidi support: bidirectional request
Jens Axboe wrote: On Mon, Apr 30 2007, Douglas Gilbert wrote: Jens Axboe wrote: On Mon, Apr 30 2007, Benny Halevy wrote: Jens Axboe wrote: On Sun, Apr 29 2007, James Bottomley wrote: On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote: FUJITA Tomonori wrote: From: Boaz Harrosh [EMAIL PROTECTED] Subject: [PATCH 4/4] bidi support: bidirectional request Date: Sun, 15 Apr 2007 20:33:28 +0300 diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 645d24b..16a02ee 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -322,6 +322,7 @@ struct request { void *end_io_data; struct request_io_part uni; +struct request_io_part bidi_read; }; Would be more straightforward to have: struct request_io_part in; struct request_io_part out; Yes I wish I could do that. For bidi supporting drivers this is the most logical. But for the 99.9% of uni-directional drivers, calling rq_uni(), and being some what on the hotish paths, this means we will need a pointer to a uni request_io_part. This is bad because: 1st- There is no defined stage in a request life where to definitely set that pointer, specially in the preparation stages. 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now this is a very bad spot already, and I have a short term fix for it in the SCSI-bidi patches (not sent yet) but a more long term solution is needed. Once such hacks are cleaned up we can do what you say. This is exactly why I use the access functions rq_uni/rq_io/rq_in/rq_out and not open code access. I'm still not really convinced about this approach. The primary job of the block layer is to manage and merge READ and WRITE requests. It serves a beautiful secondary function of queueing for arbitrary requests it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or indeed any non REQ_TYPE_FS). bidirectional requests fall into the latter category (there's nothing really we can do to merge them ... they're just transported by the block layer). The only unusual feature is that they carry two bios. I think the drivers that actually support bidirectional will be a rarity, so it might even be advisable to add it to the queue capability (refuse bidirectional requests at the top rather than perturbing all the drivers to process them). So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI? That will remove it from the standard path and put it on the special command type path where we can process it specially. Additionally, if you take this approach, you can probably simply chain the second bio through req-special as an additional request in the stream. The only thing that would then need modification would be the dequeue of the block driver (it would have to dequeue both requests and prepare them) and that needs to be done only for drivers handling bidirectional requests. I agree, I'm really not crazy about shuffling the entire request setup around just for something as exotic as bidirection commands. How about just keeping it simple - have a second request linked off the first one for the second data phase? So keep it completely seperate, not just overload -special for 2nd bio list. So basically just add a struct request pointer, so you can do rq = rq-next_rq or something for the next data phase. I bet this would be a LOT less invasive as well, and we can get by with a few helpers to support it. And it should definitely be a request type. I'm a bit confused since what you both suggest is very similar to what we've proposed back in October 2006 and the impression we got was that it will be better to support bidirectional block requests natively (yet to be honest, James, you wanted a linked request all along). It still has to be implemented natively at the block layer, just differently like described above. So instead of messing all over the block layer adding rq_uni() stuff, just add that struct request pointer to the request structure for the 2nd data phase. You can relatively easy then modify the block layer helpers to support mapping and setup of such requests. Before we go on that route again, how do you see the support for bidi at the scsi mid-layer done? Again, we prefer to support that officially using two struct scsi_cmnd_buff instances in struct scsi_cmnd and not as a one-off feature, using special-purpose state and logic (e.g. a linked struct scsi_cmd for the bidi_read sg list). The SCSI part is up to James, that can be done as either inside a single scsi command, or as linked scsi commands as well. I don't care too much about that bit, just the block layer parts :-). And the proposed block layer design can be used both ways by the scsi layer. Linked SCSI commands have been obsolete since SPC-4 rev 6 (18 July 2006) after proposal 06-259r1 was accepted. That proposal starts: The reasons for linked commands have been overtaken
Re: [RFC 3/6] bidi support: bidirectional request
James Bottomley wrote: On Mon, 2007-01-22 at 01:25 +0200, Boaz Harrosh wrote: - Instantiate another request_io_part in request for bidi_read. - Define Implement new API for accessing bidi parts. - API to Build bidi requests and map to sglists. - Define new end_that_request_block() function to end a complete request. Actually, this approach looks to be a bit too narrow. You seem to be predicating on the idea that the bidirectional will transfer in and out of the same area. For some of the frame in/frame out stuff, we probably need the read and write areas for the bidirectional request to be separated. Hmm, our proposal introduces two separate and independent i/o areas. One used for uni-directional transfers and for bidi data output, the other is used for bidi data input so we're in agreement. Benny - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 1/6] bidi support: request dma_data_direction
Muli Ben-Yehuda wrote: On Mon, Jan 22, 2007 at 01:21:28AM +0200, Boaz Harrosh wrote: - Introduce a new enum dma_data_direction data_dir member in struct request. and remove the RW bit from request-cmd_flag Some architecture use 'enum dma_data_direction' and some 'int dma_data_direction'. The consensus was to move to int over time. Please use 'int dma_data_direction'. That should be fine (although I'm not sure what made you go this way :) Please see my reply to Douglas, proposing an enum req_io_direction at the block layer and up which will provide a better enumeration for our use. +static inline int dma_write_dir(enum dma_data_direction dir) +{ +return (dir == DMA_TO_DEVICE) || (dir == DMA_BIDIRECTIONAL); +} write can mean write to device or write to memory, depending on context. Not exactly something which should be a generic helper. Rename to 'dma_to_device(int dir)'? much better. thanks! +static inline int dma_uni_dir(enum dma_data_direction dir) +{ +return (dir == DMA_TO_DEVICE) || (dir == DMA_FROM_DEVICE) || + (dir == DMA_NONE); +} While this doesn't look very useful. Why is DMA_NONE a uni-dir? I suggest replacing this with an open coded (dir != DMA_BIDIRECTIONAL). The idea was to be resilient to invalid values. (dir != DMA_BIDIRECTIONAL) is fine of course, but I'd add a BUG_ON such as (dir 0 || dir DMA_BIDIRECTIONAL) Benny - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 1/6] bidi support: request dma_data_direction
Douglas Gilbert wrote: Benny Halevy wrote: Douglas Gilbert wrote: Perhaps the right use of DMA_BIRECTIONAL needs to be defined. Could it be used with a XDWRITE(10) SCSI command defined in sbc3r07.pdf at http://www.t10.org ? I suspect using two scatter gather lists would be a better approach. Exactly. This is a classic example of a bidirectional command and indeed two scatter-gather lists (that are mapped into two bio lists) are used. - Introduce new blk_rq_init_unqueued_req() and use it in places ad-hoc requests were used and bzero'ed. With a bi-directional transfer is it always unambiguous which transfer occurs first (or could they occur at the same time)? The bidi transfers can occur in any order and in parallel. Then it is not sufficient for modern SCSI transports in which certain bidirectional commands (probably most) have a well defined order. So DMA_BIDIRECTIONAL looks PCI specific and it may have been a mistake to replace other subsystem's direction flags with it. RDMA might be an interesting case. I would say that it might make sense to define an equivalent for dma_data_direction at the block layer, for example: enum req_io_direction { REQ_IO_NONE = 0, REQ_IN_FROM_DEVICE = 1, REQ_OUT_TO_DEVICE = 2, REQ_BIDIRECTIONAL = 3, }; can be used in struct request and upper layers. Besides the fact that having separate I/O buffers for bidirectional transfers makes block I/O different from pci bidi I/O, this enum makes more sense arithmetically and has a much better meaning for the zero value. Today DMA_BIDIRECTIONAL is used in several places as the default and invalid value since no-one ever used it before. I'd rather have the value 0 mean REQ_IO_NONE (or REQ_IO_INVALID if we want such thing). Benny - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 1/6] bidi support: request dma_data_direction
Muli Ben-Yehuda wrote: On Tue, Jan 23, 2007 at 03:45:00PM +0200, Benny Halevy wrote: +static inline int dma_uni_dir(enum dma_data_direction dir) +{ + return (dir == DMA_TO_DEVICE) || (dir == DMA_FROM_DEVICE) || + (dir == DMA_NONE); +} While this doesn't look very useful. Why is DMA_NONE a uni-dir? I suggest replacing this with an open coded (dir != DMA_BIDIRECTIONAL). The idea was to be resilient to invalid values. (dir != DMA_BIDIRECTIONAL) is fine of course, but I'd add a BUG_ON such as (dir 0 || dir DMA_BIDIRECTIONAL) If DMA_NONE isn't actually allowed here, you can use valid_dma_direction(). DMA_NONE should be allowed as it is used by commands that do no I/O and these are handled on uni-directional path. BTW, the BUG_ON I suggested has a bug of course since (countering my intuition) DMA_BIDIRECTIONAL==0, so it should be BUG_ON(dir 0 || dir DMA_NONE) instead. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 1/6] bidi support: request dma_data_direction
Douglas Gilbert wrote: Boaz Harrosh wrote: - Introduce a new enum dma_data_direction data_dir member in struct request. and remove the RW bit from request-cmd_flag - Add new API to query request direction. - Adjust existing API and implementation. - Cleanup wrong use of DMA_BIDIRECTIONAL - Introduce new blk_rq_init_unqueued_req() and use it in places ad-hoc requests were used and bzero'ed. With a bi-directional transfer is it always unambiguous which transfer occurs first (or could they occur at the same time)? The bidi transfers can occur in any order and in parallel. Doug Gilbert - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html