Dear Jakub, PSB.

On 5/22/2018 1:32 PM, Jakub Kicinski wrote:
On Tue, 22 May 2018 10:36:17 -0500, Huy Nguyen wrote:
On 5/22/2018 12:20 AM, Jakub Kicinski wrote:
On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
From: Huy Nguyen <h...@mellanox.com>

In this patch, we add a dcbnl buffer attribute to allow the user to
change the NIC's buffer configuration, such as the priority-to-buffer
mapping and the size of each individual buffer.

This attribute, combined with the pfc attribute, allows an advanced user to
fine-tune the QoS settings for a specific priority queue. For example, the
user can give a dedicated buffer to one or more priorities, or give a larger
buffer to certain priorities.

We present a use-case scenario where the dcbnl buffer attribute, configured
by an advanced user, helps reduce the latency of messages of different sizes.

Scenario description:
On ConnectX-5, we run latency-sensitive traffic with small/medium message
sizes ranging from 64B to 256KB, and bandwidth-sensitive traffic with large
message sizes of 512KB and 1MB. We group the small, medium, and large message
sizes onto their own pfc-enabled priorities as follows.
    Priorities 1 & 2 (64B, 256B and 1KB)
    Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
    Priorities 5 & 6 (512KB and 1MB)

By default, ConnectX-5 maps all pfc-enabled priorities to a single lossless
buffer with a fixed size of 50% of the total available buffer space. The
other 50% is assigned to the lossy buffer. Using the dcbnl buffer attribute,
we create three equal-size lossless buffers, each with 25% of the total
available buffer space; the lossy buffer size is thus reduced to 25%. The
priority-to-lossless-buffer mappings are set as follows.
    Priorities 1 & 2 on lossless buffer #1
    Priorities 3 & 4 on lossless buffer #2
    Priorities 5 & 6 on lossless buffer #3

We observe improvements in latency for the small and medium message sizes,
as follows. Note that bandwidth for the large message sizes is reduced, but
the total bandwidth remains the same.
    256B message size (42% latency reduction)
    4K message size (21% latency reduction)
    64K message size (16% latency reduction)

Signed-off-by: Huy Nguyen <h...@mellanox.com>
Signed-off-by: Saeed Mahameed <sae...@mellanox.com>
On a cursory look this bears a lot of resemblance to the devlink shared
buffer configuration ABI.  Did you look into using that?

Just to be clear, the devlink shared buffer ABIs don't require representors
and "switchdev mode".
[HQN] Dear Jakub, there are several reasons why the devlink shared buffer
ABI cannot be used:
1. The devlink shared buffer ABI is modeled on the switch CLI; you can find
out more at this link: https://community.mellanox.com/docs/DOC-2558.
The devlink API accommodates the requirements of both simpler (SwitchX2?)
and more advanced schemes (present in Spectrum).  The simpler/basic static
threshold configuration is exactly what you are doing here, AFAIU.
[HQN] The devlink API is tailored specifically for switches. We don't
configure the threshold explicitly; it is done via PFC. Once PFC is enabled
on a priority, the threshold is set up based on our proprietary formula,
which has been tested rigorously for performance.
2. The dcbnl interfaces have been used for QoS settings.
QoS settings != shared buffer configuration.
[HQN] I think we have different definitions of "shared buffer". Please refer
to the switch CLI link below; it explains in detail what "shared buffer"
means in a switch. Our NIC does not support a "shared buffer".
https://community.mellanox.com/docs/DOC-2591


In the NIC, the buffer configuration is tied to priority (ETS/PFC).
Some customers use DCB; a lot (most?) of them don't.  I don't think the
"this is a logical extension of a commonly used API" argument really stands
here.
[HQN] DCBNL is being actively used. The whole point of this patch is to tie
the buffer configuration to the IEEE priorities and the IEEE PFC
configuration (see the sketch below).

The ambitious future is to have the switch configure the NIC's buffer sizes
and buffer mapping via TLV packets and this DCBNL interface, but we won't go
that far here.
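
To make that tie-in concrete, here is a minimal userspace sketch
(illustrative only; mark_lossless, pfc_en and prio2buffer are hypothetical
names, not taken from the patch). Given the PFC-enabled priority bitmap from
the IEEE PFC attribute and the proposed priority-to-buffer map, the set of
buffers that must be lossless follows directly, matching the scenario in the
commit message:

#include <stdint.h>
#include <stdio.h>

#define MAX_PRIORITIES 8   /* IEEE 802.1Q priorities 0..7 */
#define MAX_BUFFERS    8

/* Mark which receive buffers must be lossless, given the PFC-enabled
 * priority bitmap and a priority-to-buffer map. */
static void mark_lossless(uint8_t pfc_en,
                          const uint8_t prio2buffer[MAX_PRIORITIES],
                          uint8_t lossless[MAX_BUFFERS])
{
        for (int prio = 0; prio < MAX_PRIORITIES; prio++)
                if (pfc_en & (1 << prio))
                        lossless[prio2buffer[prio]] = 1;
}

int main(void)
{
        uint8_t pfc_en = 0x7e; /* PFC on priorities 1-6, as in the scenario */
        /* Priorities 1 & 2 -> buffer 1, 3 & 4 -> buffer 2, 5 & 6 -> buffer 3;
         * priorities 0 and 7 stay on the lossy buffer 0. */
        uint8_t prio2buffer[MAX_PRIORITIES] = { 0, 1, 1, 2, 2, 3, 3, 0 };
        uint8_t lossless[MAX_BUFFERS] = { 0 };

        mark_lossless(pfc_en, prio2buffer, lossless);
        for (int b = 0; b < MAX_BUFFERS; b++)
                printf("buffer %d: %s\n", b,
                       lossless[b] ? "lossless" : "lossy");
        return 0;
}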

The buffer configuration is not tied to a port as in a switch.
It's tied to a port and TCs; you just have one port but still have 8 TCs,
exactly like a switch...
[HQN] No. Our buffers are tied to priorities, not to TCs.
3. Shared buffer, alpha, and threshold are switch-specific terms.
IDK how talking about alpha is relevant; it's just one threshold type
the API supports.  As for shared buffer and threshold, I don't know
if these are switch terms (or how a "switch" differs from a "NIC" at that
level) - I personally find carving the shared buffer into pools very
intuitive.
[HQN] Yes, I understand your point too. The NIC's buffers share some
characteristics with the switch's buffer settings. But this DCB buffer
setting is meant to improve performance and to work together with the PFC
setting. We would like to keep all the QoS settings under DCB netlink, as
they are designed to be used this way.


Could you give examples of commands/configs one can use with your new
ABI?
[HQN] The plan is to add support in lldptool once the kernel code is
accepted. To test the kernel code, I am using small Python scripts that
work on top of the netlink library.
The format will be similar to the other options in lldptool:
    priority2buffer: 0,2,5,7,1,2,3,6
        maps priorities 0,1,2,3,4,5,6,7 to buffers 0,2,5,7,1,2,3,6
    buffer_size: 87296,87296,0,87296,0,0,0,0
        sets the receive buffer size for buffers 0,1,2,3,4,5,6,7 respectively
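
For reference, here is a minimal C sketch of how that example could be
represented (the struct and field names are hypothetical and only illustrate
the priority-to-buffer map plus per-buffer sizes; they are not necessarily
the attribute layout used by the patch):

#include <stdint.h>
#include <stdio.h>

#define MAX_PRIORITIES 8
#define MAX_BUFFERS    8

struct buffer_config {
        uint8_t  prio2buffer[MAX_PRIORITIES]; /* priority -> buffer index */
        uint32_t buffer_size[MAX_BUFFERS];    /* receive buffer size, bytes */
};

int main(void)
{
        /* priority2buffer: 0,2,5,7,1,2,3,6
         * buffer_size:     87296,87296,0,87296,0,0,0,0 */
        struct buffer_config cfg = {
                .prio2buffer = { 0, 2, 5, 7, 1, 2, 3, 6 },
                .buffer_size = { 87296, 87296, 0, 87296, 0, 0, 0, 0 },
        };

        for (int prio = 0; prio < MAX_PRIORITIES; prio++)
                printf("priority %d -> buffer %d (%u bytes)\n",
                       prio, cfg.prio2buffer[prio],
                       cfg.buffer_size[cfg.prio2buffer[prio]]);
        return 0;
}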
   How does one query the total size of the buffer to be carved?
[HQN] This is not necessary. If the total size is too big, an error will be
returned via the DCB netlink interface.

