Robert Love wrote:
> On Mon, 2009-05-18 at 16:11 -0700, Joe Eykholt wrote:
>> Robert Love wrote:
>>> This patch makes the EMs SMP aware. They will allocate
>>> XIDs by pools determined by the number of possible CPUs
>>> in the system.
>>>
>>> The goal is to have I/O echanges processed on one CPU to
>> Typo: exchanges
>>
>>> avoid cache misses. This means that whatever CPU is processing
>>> the READ request should also process the READ response. To
>>> do so, this patch carves up an XID space in an EM by the number
>>> of CPUs in the system. It then checks the processor ID and
>>> gives out XIDs to the active CPU from its pool. On Rx, there
>>> is a queue/thread per CPU already, so this patch just uses
>>> (XID % num_possible_cpus) to determine which CPU/queue/thread
>>> the response should be processed by.
>> The hash should be determined by the largest CPU number plus 1.
>> That is the same as num_possible_cpus() for x86, and probably all
>> other architectures.
>>
> Are you suggesting that I should do something other than
>
> 	cpu = oxid % num_possible_cpus();
>
> because it isn't portable? I don't think you are, because you would have
> said more in the next comment. Just checking...

I just wasn't sure. I think it's OK for all Linux platforms.

On another concern: is num_possible_cpus() a run-time-determined number?
I hope so. On systems which can't dynamically add CPUs (I'm not talking
about turning on CPUs that were previously there), the number should be
reasonable, like 32 max. On bigger systems we should get more exchange
space, so maybe MAX_XID should be adjusted. If the exchange space per
CPU is too small, that would be bad.

Eventually, I'd like to try having an exchange manager per CPU, so that
each EM can manage its own space under its own lock. This would mean
using the higher-order bits of the XID as the CPU number
(XID = cpu_id * XID_PER_CPU + n; cpu = oxid / XID_PER_CPU).
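To make that concrete, a rough sketch of the split (XID_PER_CPU and the
helper names are made up here; a real version would have to size the pool
so nr_cpus * XID_PER_CPU fits in the 16-bit XID space, and account for
min_xid):

	#define XID_PER_CPU	2048	/* assumed per-CPU pool size */

	/* encode: high-order bits of the XID carry the owning CPU */
	static inline u16 fc_xid_encode(unsigned int cpu, u16 n)
	{
		return cpu * XID_PER_CPU + n;
	}

	/*
	 * decode on Rx: recover the owning CPU with a divide
	 * (a shift, if XID_PER_CPU stays a power of two)
	 */
	static inline unsigned int fc_xid_to_cpu(u16 oxid)
	{
		return oxid / XID_PER_CPU;
	}

Each per-CPU EM could then manage its own pool under its own lock, since
CPU i only ever touches pool i.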
>>> If a CPU is offlined there is no reallocation of resources
>>> (i.e. XIDs), they just go unused. Also, if a CPU is offlined
>>> our Rx handler will move the response frame to another CPU.
>>> We will take a cache miss, but that isn't a concern as online/
>>> offline CPU should be infrequent and the number of frames
>>> that might go to the wrong CPU would likely show no impact.
>>>
>>> Signed-off-by: Robert Love <[email protected]>
>>> ---
>>>
>>>  drivers/scsi/fcoe/fcoe.c     |    2 +
>>>  drivers/scsi/libfc/fc_exch.c |   77 +++++++++++++++++++++++++++++++++++++-----
>>>  2 files changed, 69 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
>>> index f589a47..0001de8 100644
>>> --- a/drivers/scsi/fcoe/fcoe.c
>>> +++ b/drivers/scsi/fcoe/fcoe.c
>>> @@ -991,7 +991,7 @@ int fcoe_rcv(struct sk_buff *skb, struct net_device *dev,
>>>  	 * cpu bits to get cpu and then this cpu is used for selecting
>>>  	 * a per cpu kernel thread from fcoe_percpu.
>>>  	 */
>>> -	cpu = oxid & (num_online_cpus() - 1);
>>> +	cpu = oxid % num_online_cpus();
>> This should be num_possible_cpus(), which is what the allocation used.
>>
> Yeah, you're right. I would have hit this when testing online/offline of
> CPUs. Well, maybe not. :) If I had 4 CPUs and offlined #2, then I would
> not use #3 at all and I'd still be targeting #2, which of course would
> move the work to a different CPU.
>
>>>  #endif
>>>
>>>  	fps = &per_cpu(fcoe_percpu, cpu);
>>> diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c
>>> index 0664366..1670bad 100644
>>> --- a/drivers/scsi/libfc/fc_exch.c
>>> +++ b/drivers/scsi/libfc/fc_exch.c
>>> @@ -69,7 +69,7 @@ struct fc_exch_mgr {
>>>  	spinlock_t	em_lock;	/* exchange manager lock,
>>>  					   must be taken before ex_lock */
>>>  	u16		em_user;	/* exchange manager user count */
>>> -	u16		last_xid;	/* last allocated exchange ID */
>>> +	u16		*last_xid;	/* last allocated exchange ID */
>>>  	u16		min_xid;	/* min exchange ID */
>>>  	u16		max_xid;	/* max exchange ID */
>>>  	u32		total_exches;	/* total allocated exchanges */
>>> @@ -471,6 +471,49 @@ static struct fc_seq *fc_seq_alloc(struct fc_exch *ep, u8 seq_id)
>>>  	return sp;
>>>  }
>>>
>>> +static u16 fc_exch_xid_find(struct fc_exch_mgr *mp, unsigned int cpu_idx)
>>> +{
>>> +	int found = 0;
>>> +	unsigned int nr_cpus = num_possible_cpus();
>>> +	u16 xid, prev_xid;
>>> +	prev_xid = xid = *per_cpu_ptr(mp->last_xid, cpu_idx);
>>> +
>>> +	while (!found) {
>> Found is unnecessary since we just break when we find the xid.
>> We could just return.
>>
>> 	for (;;) {
>>
>>> +		/* Wrap around or increment */
>>> +		if (xid == FC_XID_UNKNOWN)
>>> +			xid = mp->min_xid + cpu_idx;
>> This assumes that min_xid is the right starting ID for cpu 0.
>> Is that valid? Need to check that when we allocate.
>> Being off would hurt performance.
>>
> Fixed this up, it's much nicer. Thanks.
>
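Good. On the cpu 0 starting ID, I imagine something as simple as this at
EM allocation time would keep the Rx mapping exact (untested sketch;
roundup() is the helper from kernel.h):

	/*
	 * Untested: force min_xid to a multiple of the CPU count so
	 * that cpu c's pool starts at an XID with xid % nr_cpus == c,
	 * which is what the Rx side's "oxid % num_possible_cpus()"
	 * relies on.
	 */
	mp->min_xid = roundup(min_xid, num_possible_cpus());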
>>> +		else
>>> +			xid = xid + nr_cpus;
>>> +
>>> +		/*
>>> +		 * We've come back to the original last_xid,
>>> +		 * so we've exhausted the list.
>>> +		 */
>>> +		if (xid == prev_xid) {
>>> +			xid = FC_XID_UNKNOWN;
>>> +			found = 1;
>> No need to set found. Just do break or return FC_XID_UNKNOWN;
>>
>>> +			break;
>>> +		}
>>> +
>>> +		/* Have we exceeded the max XIDs? */
>>> +		if (xid > mp->max_xid) {
>>> +			xid = FC_XID_UNKNOWN;
>>> +			continue;
>>> +		}
>>> +
>>> +		/*
>>> +		 * Always deal with real XIDs until now, where we
>>> +		 * need to look at the array index by subtracting one.
>>> +		 */
>>> +		if (!mp->exches[xid - mp->min_xid]) {
>>> +			*per_cpu_ptr(mp->last_xid, cpu_idx) = xid;
>> return xid;
>>
>>> +			found = 1;
>>> +			break;
>>> +		}
>>> +	}
>>> +	return xid;
>>> +}
>>> +
>>>  /*
>>>   * fc_exch_alloc - allocate an exchange and its exchange id.
>>>   * @mp : ptr to the exchange manager
>>> @@ -481,6 +524,7 @@ struct fc_exch *fc_exch_alloc(struct fc_exch_mgr *mp)
>>>  {
>>>  	struct fc_exch *ep;
>>>  	u16 min, max, xid;
>>> +	unsigned int cpu;
>>>
>>>  	/* allocate memory for exchange */
>>>  	ep = mempool_alloc(mp->ep_pool, GFP_ATOMIC);
>>> @@ -494,14 +538,25 @@ struct fc_exch *fc_exch_alloc(struct fc_exch_mgr *mp)
>>>  	max = mp->max_xid;
>>>
>>>  	spin_lock_bh(&mp->em_lock);
>>> -	xid = mp->last_xid;
>>> -	/* alloc a new xid */
>>> -	while (mp->exches[xid - min]) {
>>> -		xid = (xid == max) ? min : xid + 1;
>>> -		if (xid == mp->last_xid)
>>> -			goto err;
>>> +
>>> +	xid = fc_exch_xid_find(mp, smp_processor_id());
>>> +	if (xid == FC_XID_UNKNOWN) {
>>> +		/*
>>> +		 * It would be nice to have a more intelligent
>>> +		 * algorithm here instead of starting with CPU 0
>>> +		 * everytime.
>> typo: every time.
>>
>>> +		 */
>> Use cpumask_next(cpu, cpu_online_mask).
>> Stop when getting back to the original cpu. Note that the current CPU
>> could change, so doing smp_processor_id() in the loop isn't quite correct,
>> although it'll work most of the time.
>>
>> We could do a get_cpu()/put_cpu() to inhibit rescheduling.
>>
> I need to think about this more.
>
>>> +		for_each_online_cpu(cpu) {
>>> +			if (cpu == smp_processor_id())
>>> +				continue;
>>> +			xid = fc_exch_xid_find(mp, cpu);
>>> +			if (xid != FC_XID_UNKNOWN)
>>> +				break;
>>> +		}
>>>  	}
>> Here's an alternative version, replacing the above from the first call
>> to fc_exch_xid_find():
>>
>> 	cpu = smp_processor_id();
>> 	first_cpu = cpu;
>> 	do {
>> 		xid = fc_exch_xid_find(mp, cpu);
>> 		if (xid != FC_XID_UNKNOWN)
>> 			goto found;
>> 	} while ((cpu = cpumask_next(cpu, cpu_online_mask)) != first_cpu);
>> 	return NULL;
>> found:
>>
> Done, thanks.
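One correction to my own snippet above: cpumask_next() doesn't wrap. Once
we're past the last online CPU it returns nr_cpu_ids, so the do/while
would never get back to first_cpu. Something like this (still untested)
handles the wrap, and a get_cpu()/put_cpu() around the whole sequence
would keep first_cpu from being offlined under us:

	cpu = smp_processor_id();
	first_cpu = cpu;
	do {
		xid = fc_exch_xid_find(mp, cpu);
		if (xid != FC_XID_UNKNOWN)
			goto found;
		cpu = cpumask_next(cpu, cpu_online_mask);
		if (cpu >= nr_cpu_ids)	/* ran off the end of the mask */
			cpu = cpumask_first(cpu_online_mask);
	} while (cpu != first_cpu);
	return NULL;
found: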
