Sometimes, under heavy stress with large IOs (>512K), the max IO retry
count is exhausted, which leads to IO errors such as the following:

[ 2522.907984] sd 3:0:2:50: [sdu] Unhandled error code
[ 2522.907990] sd 3:0:2:50: [sdu] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 2522.907995] sd 3:0:2:50: [sdu] CDB: Write(10): 2a 00 00 00 08 00 00 04 00 00
[ 2522.908012] end_request: I/O error, dev sdu, sector 2048

The FCoE stack has no direct flow-control feedback for congestion
caused by temporary conditions such as longer link pauses or
end-to-end delay between I-T-L. As a result, under large-IO stress
tests the max SCSI IO retry count is sometimes exhausted, leading to
the above IO errors and the thrash of several retry attempts.

Currently the stack is configured with .sg_tablesize set to SG_ALL
(128), which allows large IOs of up to 512K. This patch reduces it
to 64, limiting large IOs to at most 256K and thereby reducing the
cost of retrying any failed large IO.

The .sg_tablesize change above helped reduce these errors, but errors
still occurred occasionally in some setup configurations. I'm using
these two configurations:

        1. 256 Clarion LUNs on a single 4G rport
        2. 256 RAMSAN LUNs, four 4G rports with 16 LUNs each

To completely avoid the errors I tried reducing max can_queue to 128,
which worked well without any IO errors on my system, but this isn't
a good fix since it would likely hurt performance for smaller IOs.
So instead I'm adding code to adjust can_queue more aggressively only
when a large-IO failure occurs, while leaving max can_queue at 1024.

This patch adds code to reduce can_queue aggressively to
FC_CAN_QUEUE_LIO (64) on any single large-IO retry attempt. The
large-IO threshold is set to 128K (FC_FCP_LIO_MASK) and could be
configured to any value.

This patch also changes the can_queue ramp-down and ramp-up to adjust
can_queue in smaller steps of FC_CAN_QUEUE_STEPS over a longer
120 * HZ period, so that the stack dynamically converges on an
optimal can_queue value according to the IO retry rate. This
dynamically throttles traffic under stress and avoids IO errors due
to exhausted retries.

This worked fine in setup #2 but didn't work well for setup #1.

I guess some more adjustment is needed for setup #1, since it has
just one rport with 256 LUNs, but I'm not sure how these adjustments
can be made generic enough for most systems to throttle the IO rate
smoothly under any end-to-end link condition.

Any suggestions for making this generic? Or is dynamic adjustment not
a good idea, and should max queue_depth instead be adjusted from
sysfs for the worst case on all sdevs, with the number of sdevs per
rport considered when deducing the optimal value?

Or is there any other idea/solution to fix these IO errors under
stress besides adjusting the various queues and .sg_tablesize?

I also tried increasing the FC_SCSI_REC_TOV and FC_SCSI_TM_TOV
values to 6 * HZ and 20 * HZ, but that alone only somewhat reduced
the IO error frequency.

I wonder how a native FC HBA handles end-to-end flow control in the
most commonly used connectionless CLASS-3. I guess the FC HBA perhaps
blocks/busies the shost based on its link-level available buffer
credits. We could do the same with pause feedback, and in a way what
I'm doing on retry attempts infers longer pauses indirectly, but
there isn't any clean interface to get pause stats or state from the
netdev.

I'd appreciate any thoughts on this.

        Thanks
        Vasu

Signed-off-by: Vasu Dev <[email protected]>
---

 drivers/scsi/fcoe/fcoe.c    |    2 -
 drivers/scsi/libfc/fc_fcp.c |   92 +++++++++++++++++++++++++++++--------------
 2 files changed, 62 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index c1571bc..2b19206 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -236,7 +236,7 @@ static struct scsi_host_template fcoe_shost_template = {
        .cmd_per_lun = 3,
        .can_queue = FCOE_MAX_OUTSTANDING_COMMANDS,
        .use_clustering = ENABLE_CLUSTERING,
-       .sg_tablesize = SG_ALL,
+       .sg_tablesize = 64,
        .max_sectors = 0xffff,
 };
 
diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
index ff80624..7e6230e 100644
--- a/drivers/scsi/libfc/fc_fcp.c
+++ b/drivers/scsi/libfc/fc_fcp.c
@@ -77,6 +77,8 @@ struct kmem_cache *scsi_pkt_cachep;
 struct fc_fcp_internal {
        mempool_t        *scsi_pkt_pool;
        struct list_head scsi_pkt_queue;
+       atomic_t no_large_retry;
+       u32 large_retry;
        unsigned long last_can_queue_ramp_down_time;
        unsigned long last_can_queue_ramp_up_time;
        int max_can_queue;
@@ -127,7 +129,13 @@ static void fc_fcp_srr_error(struct fc_fcp_pkt *, struct fc_frame *);
 #define FC_SCSI_TM_TOV         (10 * HZ)
 #define FC_SCSI_REC_TOV                (2 * HZ)
 #define FC_HOST_RESET_TIMEOUT  (30 * HZ)
-#define FC_CAN_QUEUE_PERIOD    (60 * HZ)
+#define FC_CAN_QUEUE_PERIOD    (120 * HZ)
+
+#define FC_CAN_QUEUE_STEPS     4
+#define FC_CAN_QUEUE_LIO       64
+#define FC_FCP_LIO_MASK                0x1FFFF
+#define FC_FCP_LIO_RETRY1      1
+#define FC_FCP_LIO_RETRY2      512
 
 #define FC_MAX_ERROR_CNT       5
 #define FC_MAX_RECOV_RETRY     3
@@ -278,28 +286,6 @@ static int fc_fcp_send_abort(struct fc_fcp_pkt *fsp)
 }
 
 /**
- * fc_fcp_retry_cmd() - Retry a fcp_pkt
- * @fsp: The FCP packet to be retried
- *
- * Sets the status code to be FC_ERROR and then calls
- * fc_fcp_complete_locked() which in turn calls fc_io_compl().
- * fc_io_compl() will notify the SCSI-ml that the I/O is done.
- * The SCSI-ml will retry the command.
- */
-static void fc_fcp_retry_cmd(struct fc_fcp_pkt *fsp)
-{
-       if (fsp->seq_ptr) {
-               fsp->lp->tt.exch_done(fsp->seq_ptr);
-               fsp->seq_ptr = NULL;
-       }
-
-       fsp->state &= ~FC_SRB_ABORT_PENDING;
-       fsp->io_status = 0;
-       fsp->status_code = FC_ERROR;
-       fc_fcp_complete_locked(fsp);
-}
-
-/**
  * fc_fcp_ddp_setup() - Calls a LLD's ddp_setup routine to set up DDP context
  * @fsp: The FCP packet that will manage the DDP frames
  * @xid: The XID that will be used for the DDP exchange
@@ -364,12 +350,14 @@ static void fc_fcp_can_queue_ramp_up(struct fc_lport *lport)
 
        si->last_can_queue_ramp_up_time = jiffies;
 
-       can_queue = lport->host->can_queue << 1;
+       can_queue = lport->host->can_queue + FC_CAN_QUEUE_STEPS;
        if (can_queue >= si->max_can_queue) {
                can_queue = si->max_can_queue;
                si->last_can_queue_ramp_down_time = 0;
        }
        lport->host->can_queue = can_queue;
+       if (can_queue > FC_CAN_QUEUE_LIO)
+               si->large_retry = FC_FCP_LIO_RETRY1;
        shost_printk(KERN_ERR, lport->host, "libfc: increased "
                     "can_queue to %d.\n", can_queue);
 }
@@ -391,20 +379,28 @@ static void fc_fcp_can_queue_ramp_down(struct fc_lport *lport)
        struct fc_fcp_internal *si = fc_get_scsi_internal(lport);
        int can_queue;
 
+       can_queue = lport->host->can_queue;
+
+       if (atomic_read(&si->no_large_retry) >= si->large_retry) {
+               if (lport->host->can_queue > FC_CAN_QUEUE_LIO)
+                       can_queue = FC_CAN_QUEUE_LIO;
+               atomic_set(&si->no_large_retry, 0);
+               si->large_retry = FC_FCP_LIO_RETRY2;
+       }
+
        if (si->last_can_queue_ramp_down_time &&
            (time_before(jiffies, si->last_can_queue_ramp_down_time +
-                        FC_CAN_QUEUE_PERIOD)))
+                        FC_CAN_QUEUE_PERIOD - 10)))
                return;
 
-       si->last_can_queue_ramp_down_time = jiffies;
-
-       can_queue = lport->host->can_queue;
-       can_queue >>= 1;
+       can_queue -= FC_CAN_QUEUE_STEPS;
        if (!can_queue)
                can_queue = 1;
+
+       si->last_can_queue_ramp_down_time = jiffies;
        lport->host->can_queue = can_queue;
-       shost_printk(KERN_ERR, lport->host, "libfc: Could not allocate frame.\n"
-                    "Reducing can_queue to %d.\n", can_queue);
+       shost_printk(KERN_ERR, lport->host, "libfc: "
+                    "reduced can_queue to %d.\n", can_queue);
 }
 
 /*
@@ -431,6 +427,39 @@ static inline struct fc_frame *fc_fcp_frame_alloc(struct fc_lport *lport,
 }
 
 /**
+ * fc_fcp_retry_cmd() - Retry a fcp_pkt
+ * @fsp: The FCP packet to be retried
+ *
+ * Sets the status code to be FC_ERROR and then calls
+ * fc_fcp_complete_locked() which in turn calls fc_io_compl().
+ * fc_io_compl() will notify the SCSI-ml that the I/O is done.
+ * The SCSI-ml will retry the command.
+ */
+static void fc_fcp_retry_cmd(struct fc_fcp_pkt *fsp)
+{
+       struct fc_fcp_internal *si = fc_get_scsi_internal(fsp->lp);
+       unsigned long flags;
+
+       if (fsp->seq_ptr) {
+               fsp->lp->tt.exch_done(fsp->seq_ptr);
+               fsp->seq_ptr = NULL;
+       }
+
+       if ((scsi_bufflen(fsp->cmd) & ~FC_FCP_LIO_MASK) &&
+            atomic_inc_return(&si->no_large_retry) >= si->large_retry) {
+               spin_lock_irqsave(fsp->lp->host->host_lock, flags);
+               fc_fcp_can_queue_ramp_down(fsp->lp);
+               spin_unlock_irqrestore(fsp->lp->host->host_lock, flags);
+       }
+
+       fsp->state &= ~FC_SRB_ABORT_PENDING;
+       fsp->io_status = 0;
+       fsp->status_code = FC_ERROR;
+       fc_fcp_complete_locked(fsp);
+}
+
+
+/**
  * fc_fcp_recv_data() - Handler for receiving SCSI-FCP data from a target
  * @fsp: The FCP packet the data is on
  * @fp:         The data frame
@@ -2346,6 +2375,7 @@ int fc_fcp_init(struct fc_lport *lport)
                rc = -ENOMEM;
                goto free_internal;
        }
+       si->large_retry = FC_FCP_LIO_RETRY1;
        return 0;
 
 free_internal:

_______________________________________________
devel mailing list
[email protected]
http://www.open-fcoe.org/mailman/listinfo/devel
