On Wed, Oct 17, 2007 at 12:55:56PM +0800, Cathy Zhou wrote:
> >> Leave that aside, if I understand correctly, "normal" packets would be
> >> send/received from the data-path-stream, and "special" packets
> >> (received because of the promiscuity-mode or the multicast packets)
> >> would be received from the shared-lower-stream.
> >>
> > initially the data stream is the only stream that can receive packets
> > (unicast + broadcast) since it's bound to a particular sap (e.g. IP).
> > the shared stream is not bound so it cannot receive anything.
> >
> Assume that there is a per-stream DLS stream, and another MAC client that
> requires the shared lower stream to be bound to a sap (for example, an
> aggregation is created over this MAC), does that mean the performance of the
> per-stream DLS stream would also suffer from this? Further, in this case,
> whether or not to filter out the duplicated messages depends not only on the
> state of the stream itself, but also the existence of other MAC clients.
> This does not seem right.
>
first, aggr would probably mac_active_set() the softmac so the other DLS
stream you mentioned can only be a passive (most likely snoop) stream,
right? why are you so concerned with snoop performance?

ok. let's suppose it's not aggr but another upper stream that sends down
a DL_ENABMULTI_REQ to the shared stream. yes, this will affect
performance of other upper streams. but the performance hit most likely
won't be due to filtering; it will be due to the cost of demultiplexing
to multiple streams within the dlpi driver underneath softmac. from
softmac upwards, the filtering cost will be one comparison (calling
mac_ether_unicst_verify()).

let me reiterate what I suggested:

case 1
  one data stream, one shared stream, both bound to sap, with multicast
  addresses added to the shared stream, with physical promisc mode off
  on the shared stream (but DL_PROMISC_SAP could be on):

  softmac_rput() filters packets as follows:
  -data path stream drops all non-unicast packets.
  -shared stream drops all unicast packets.

case 2
  same config as case 1 except physical promisc mode is ON on the shared
  stream:

  there are two ways softmac_rput() could filter packets:

  simple option (hurts data path performance when there are snoops)
  -data path stream drops all packets
  -shared stream passes up all packets.

  dls filter option
  -data path stream drops all non-unicast packets
  -shared stream passes up all packets. dls_accept() drops unicast
   packets if the dls_impl_t has a data path stream open and the
   dls_impl_t does not belong to a snoop upper stream.

> > if a upper stream issues a DL_ENABMULTI_REQ, this is when we have to
> > bind the shared stream as well since we want multicast packets to go up
> > the shared stream. but what sap do we bind to? it probably doesn't
> > matter because we would have to turn on DL_PROMISC_SAP on the shared
> > stream in order to get all multicast packets (not just ones destined to
> > a particular sap).
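
a minimal sketch of the case 1 rules and the case 2 "dls filter" option described above, assuming Ethernet framing (where the low bit of the first destination octet marks multicast and broadcast). every name here (stream_kind_t, rput_accept, dls_accept_sketch) is hypothetical and chosen for illustration; none of this is the softmac or dls source, and the real code paths would operate on mblk_t chains rather than bare address pointers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stream tags; not taken from softmac. */
typedef enum { STREAM_DATA, STREAM_SHARED } stream_kind_t;

/* Ethernet group bit: set in the first destination octet for
 * multicast and broadcast addresses, clear for unicast. */
#define ETHER_GROUP_BIT 0x01

static bool
dst_is_unicast(const uint8_t *dst)
{
    return ((dst[0] & ETHER_GROUP_BIT) == 0);
}

/*
 * Sketch of the filtering decision at the lower read side.
 * Returns true if the packet should be passed up, false if dropped.
 * shared_phys_promisc: physical promiscuous mode is on for the
 * shared stream (case 2, "dls filter" option).
 */
static bool
rput_accept(stream_kind_t kind, const uint8_t *dst, bool shared_phys_promisc)
{
    if (kind == STREAM_DATA)
        return (dst_is_unicast(dst));   /* drop all non-unicast */

    if (shared_phys_promisc)
        return (true);                  /* case 2: pass everything up */

    return (!dst_is_unicast(dst));      /* case 1: drop unicast */
}

/*
 * Sketch of the extra check for the "dls filter" option: a unicast
 * packet that came up the shared stream is dropped for a client that
 * already has a data path stream open, unless that client is a snoop.
 */
static bool
dls_accept_sketch(bool is_unicast, bool has_data_stream, bool is_snoop)
{
    if (is_unicast && has_data_stream && !is_snoop)
        return (false);
    return (true);
}
```

with these rules a unicast packet reaches a non-snoop client exactly once (via the data path stream), while a snoop client still sees everything that comes up the shared stream.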
> >
> Note that any upper stream which has DL_ENABMULTI_REQed, would cause the
> shared stream to be bound to a SAP, and this would influence other upper
> streams (to do the duplicate message drop).
>

if you filter according to the above rules, I don't think the cost is
significant. yes, the demux cost in the dlpi driver needs to be
accounted for too. but I think the drivers we care about do dupmsg() so
that shouldn't be too bad.

> > now we have two streams ready to receive data:
> > data stream would receive unicast + broadcast packets.
> > shared stream would receive unicast + multicast + broadcast packets.
> >
> > both streams would enter softmac_rput() before nemo so some filtering
> > could be done at that point. we could make unicast packets go only to
> > the data path stream (via rxinfo->mpr_rx()) and broadcast + multicast go
> > only to the shared stream (via mac_rx()). the check should require
> > comparing only one byte.
> >
> This one-byte comparison assumes Ethernet media type, right?
>

yes. with the IB driver porting already under way, I don't think this is
a problem.

> >> That means every packets would go through both streams, and we would
> >> need to change dls_accept() function and filter out the duplicated
> >> messages (the policy to decide whether or not pass up to the DLS
> >> clients would become complicated, see more below).
> >
> > supposing we do not turn on DL_PROMISC_PHYS on the lower stream (more
> > later), why does dls_accept() need to change if we filter early in
> > softmac as I mentioned above? dls_accept() will compare the
> > per-dls_impl_t multicast list anyway so filtering is already being done.
> >
> >> At this point, without prototype, I am not sure whether it can be
> >> done. The first concern I have is the performance impact it may bring.
> >>
> >
> > from mac and above, the performance impact is most likely small. what
> > I am worried is what happens in the dlpi driver if we turn on
> > DLS_PROMISC_SAP.
> > If I remember correctly ce does dupmsg() when passing a
> > packet up multiple streams right?
>
> That is correct. But my concern is mostly that assume there is aggregations
> created over this MAC, or there is one upper stream is set to promiscuous
> mode, which would affect other upper stream's performance significantly.
>

I don't think you need to worry about the aggr case unless you really
want to make passive streams perform well. for the promisc case, I think
we could choose the "dls filter" option, which should not affect data
path performance much.

>
> > also, I noticed you added a new function dld_wput_perstream_callback()
> > and similar putnext() code in dld_wsrv(). the reason you do this here
> > instead of within softmac_m_tx() is because you can't pass down the
> > lower stream write queue, right?
>
> dld_wput_perstream_callback() is for the per-stream scheme upperstream,
> (which is the fast-path), and we don't do softmac_m_tx() is that we'd like
> to skip MAC layer completely in that case, so neither mac_tx(), or
> mac_txloop() is called in that case.
>

but if we could add a third arg to the mi_tx() entry point so you could
pass down a write queue, why do you need to bypass mac at all? the cost
of calling dls_tx()->mt_fn() should be very similar to calling
dld_wput_perstream_callback(). If the performance gain is not huge, I
would really like to see this removed from dld.

> At this point. I think I get your point, that you'd like to switch between
> the share-lower-scheme and per-stream-scheme depends the state of the upper
> stream (whether the stream is in promiscuous mode or DL_ENABMULTI_REQ is
> sent). What I don't agree is that the scheme selection would be affected by
> the state of other upper streams as well.
>

the way I see it is: legacy drivers are going away. legacy nics will all
be EOL'ed someday if they haven't already. nemo, on the other hand, is
going to stay and will continually be extended for a long time.
if we make any optimizations in the framework now, there had better be
good reasons that these are going to benefit most drivers, not just a
small subset of them. this per-stream optimization is, unfortunately,
benefiting only legacy drivers and in my opinion, its implementation is
far too invasive to the framework.

I think we need to make a judgement call here, do we want to:

-support all legacy drivers and guarantee 0% degradation under *all*
 circumstances. if this is a requirement, then it's unfortunate that we
 would likely have to keep the existing perstream implementation. the
 downside of this is that eventually this code will become a
 maintenance burden.

or,

-support all legacy drivers and guarantee 0% degradation under *most*
 circumstances. if the framework can be made a lot simpler this way, I
 really think this is a worthwhile trade off. upcoming projects will
 also be much less burdened by trying to be compatible with this
 optimization.

what do you think?

eric
