Re: New NTB API Issue
On 23/06/17 04:06 PM, Allen Hubbe wrote:
> From: Logan Gunthorpe
>> But any translation can be programmed by any peer.
>
> That doesn't seem safe. Even though it can be done as you say, would it not
> be better to have each specific translation under the control of exactly one
> driver?
>
> If drivers can reach across and set the translation of any peer bar, they
> would still need to negotiate among N peers which one sets which other's
> translation.

Yup. In a dual-host setup it's not a problem, since only the one peer will
ever set any of the memory windows. In the multi-host setup, there has to be
implicit or explicit negotiation as to which memory window connects to which
peer.

The hardware also implements locking, so if two peers try to program the same
translation, the worst that can happen is that the last peer's settings take
effect or one peer sees an error.

Logan
RE: New NTB API Issue
From: Logan Gunthorpe
> But any translation can be programmed by any peer.

That doesn't seem safe. Even though it can be done as you say, would it not be
better to have each specific translation under the control of exactly one
driver?

If drivers can reach across and set the translation of any peer bar, they
would still need to negotiate among N peers which one sets which other's
translation.
RE: New NTB API Issue
From: Logan Gunthorpe
> Hey,
>
> Thanks Serge for the detailed explanation. This is pretty much exactly
> as I had thought it should be interpreted. My only problem remains that
> my hardware can't provide ntb_mw_get_align until the port it is asking
> about has configured itself. The easiest way to solve this is to only
> allow access when the link to that port is up. It's not a complicated
> change and would actually simplify ntb_transport and ntb_perf a little.
>
> Is this ok with you Allen? If it is, I can include a patch in my next
> switchtec submission.

Ok.

> Thanks,
>
> Logan
Re: New NTB API Issue
Hey,

Thanks Serge for the detailed explanation. This is pretty much exactly as I
had thought it should be interpreted. My only problem remains that my hardware
can't provide ntb_mw_get_align until the port it is asking about has
configured itself. The easiest way to solve this is to only allow access when
the link to that port is up. It's not a complicated change and would actually
simplify ntb_transport and ntb_perf a little.

Is this ok with you Allen? If it is, I can include a patch in my next
switchtec submission.

Thanks,

Logan
Re: New NTB API Issue
On Fri, Jun 23, 2017 at 03:07:19PM -0400, Allen Hubbe wrote:

Hello Allen,

> From: Logan Gunthorpe
> > On 23/06/17 07:18 AM, Allen Hubbe wrote:
> > > By "remote" do you mean the source or destination of a write?
> >
> > Look at how these calls are used in ntb_transport and ntb_perf:
> >
> > They both call ntb_peer_mw_get_addr to get the size of the BAR. The size
> > is sent via spads to the other side. The other side then uses
> > ntb_mw_get_align and applies align_size to the received size.
> >
> > > Yes, clients should transfer the address and size information to the peer.
> >
> > But then they also need to technically transfer the alignment
> > information as well. Which neither of the clients do.
>
> The clients haven't been fully ported to the multi-port api yet. They were
> only minimally changed to call the new api, but so far other than that they
> have only been made to work as they had before.
>
> > > Maybe this is the confusion. None of these api calls are to reach across
> > > to the peer port, as to get the size of the peer's bar. They are to get
> > > information from the local port, or to configure the local port.
> >
> > I like the rule that these api calls are not to reach across the port.
> > But then the API looks very wrong. Why are we calling one peer_mw_get_addr
> > and the other mw_get_align? And why does mw_get_align have a peer index?
>
> I regret that the term "peer" is used to distinguish the mw api. Better
> names perhaps should be ntb_outbound_mw_foo, ntb_inbound_mw_foo; or
> ntb_src_mw_foo, ntb_dest_mw_foo. I like outbound/inbound, although the names
> are longer; maybe out/in would be ok.

As you remember, we discussed the matter when the new NTB API was being
developed. So we decided on the following designation:

ntb_mw_*      - NTB MW API for inbound memory window configuration,
ntb_peer_mw_* - NTB MW API for outbound memory window configuration.
Yes, suffixes like "in/out" might give a better notion of the methods'
purpose, but the current notation is consistent with the rest of the API, and
once a user gets into the NTB API, one won't have much difficulty
understanding it.

> > And why does mw_get_align have a peer index?
>
> Good question. Serge?
>
> For that matter, why do we not also have a peer mw idx in the set of
> parameters? Working through the example below, it looks like we are lacking
> a way to say Outbound MW1 on A corresponds with Inbound MW0 on B. It looks
> like we can only indicate that Port A (not which Outbound MW of Port A)
> corresponds with Inbound MW0 on B.

Before I give any explanation, could you please study the new NTB API
documentation:
https://github.com/jonmason/ntb/blob/ntb-next/Documentation/ntb.txt
Particularly, you are interested in lines 31 - 111. It will give you a better
description of the way all the MW-related methods work, for a start. The
detailed multi-port example is given at the end of this email.

> > > Some snippets of code would help me understand your interpretation of
> > > the api semantics more exactly.
> >
> > I'm not sure of the best way to show this in code, but let me try
> > describing an example situation:
> >
> > Let's say we have the following mws on each side (this is something that
> > is possible with Switchtec hardware):
> >
> > Host A BARs:
> > mwA0: 2MB size, aligned to 4k, size aligned to 4k
> > mwA1: 4MB size, aligned to 4k, size aligned to 4k
> > mwA2: 64k size, aligned to 64k, size aligned to 64k
> >
> > Host B BARs:
> > mwB0: 2MB size, aligned to 4k, size aligned to 4k
> > mwB1: 64k size, aligned to 64k, size aligned to 64k
>
> If those are BARs, that corresponds to "outbound", writing something to the
> BAR at mwA0.
> A more complete picture might be:
>
> Host A BARs (aka "outbound" or "peer" memory windows):
> peer_mwA0: resource at 0xA - 0xA0020 (2MB)
> peer_mwA1: resource at 0xA1000 - 0xA1040 (4MB)
> peer_mwA2: resource at 0xA2000 - 0xa2001 (64k)
>
> Host A MWs (aka "inbound" memory windows):
> mwA0: 64k max size, aligned to 64k, size aligned to 64k
> mwA1: 2MB max size, aligned to 4k, size aligned to 4k
>
> Host A sees Host B on port index 1
>
> Host B BARs (aka "outbound" or "peer" memory windows):
> peer_mwB0: resource at 0xB - 0xB0020 (2MB)
> peer_mwB1: resource at 0xB1000 - 0xB1001 (64k)
>
> Host B MWs (aka "inbound" memory windows):
> mwB0: 1MB size, aligned to 4k, size aligned to 4k
> mwB1: 2MB size, aligned to 4k, size aligned to 4k
>
> Host B sees Host A on port index 4
>
> Outbound memory windows (aka "peer mw") come with a pci resource. We can get
> the size of the resource, its physical address, and set up outbound
> translation if the hardware has that (IDT).
>
> Inbound memory windows (aka "mw") are only used to set up inbound
> translation, if the hardware has that (Intel, AMD).
>
> To set up an end-to-end memory window so that A can write to B, let's use
> peer_mwA1 and mwB0.
Re: New NTB API Issue
On 23/06/17 01:07 PM, Allen Hubbe wrote:
> The clients haven't been fully ported to the multi-port api yet. They were
> only minimally changed to call the new api, but so far other than that they
> have only been made to work as they had before.

So is it intended to eventually send the align parameters via spads? This
seems like it would take a lot of spads or multiplexing the spads with a few
doorbells. This gets a bit nasty.

> If those are BARs, that corresponds to "outbound", writing something to the
> BAR at mwA0.
>
> A more complete picture might be:
>
> Host A BARs (aka "outbound" or "peer" memory windows):
> peer_mwA0: resource at 0xA - 0xA0020 (2MB)
> peer_mwA1: resource at 0xA1000 - 0xA1040 (4MB)
> peer_mwA2: resource at 0xA2000 - 0xa2001 (64k)
>
> Host A MWs (aka "inbound" memory windows):
> mwA0: 64k max size, aligned to 64k, size aligned to 64k
> mwA1: 2MB max size, aligned to 4k, size aligned to 4k

I don't really like the separation of inbound and outbound as you describe
it. It doesn't really match my hardware. In Switchtec, each partition has
some number of BARs, and each BAR has a single translation which sets the
peer and destination address. The translation really exists inside the switch
hardware, not on either side. But any translation can be programmed by any
peer. Saying that there's an opposite inbound window to every outbound window
is not an accurate abstraction for us.

I _suspect_ the IDT hardware is similar but, based on Serge's driver, I think
the translation can only be programmed by the peer that the BAR is resident
in (as opposed to from any side, like in the Switchtec hardware). (This poses
some problems for getting the IDT code to actually work with existing
clients.)

> Outbound memory windows (aka "peer mw") come with a pci resource. We can get
> the size of the resource, its physical address, and set up outbound
> translation if the hardware has that (IDT).
> Inbound memory windows (aka "mw") are only used to set up inbound
> translation, if the hardware has that (Intel, AMD).
>
> To set up an end-to-end memory window so that A can write to B, let's use
> peer_mwA1 and mwB0.
>
> A: ntb_peer_mw_get_addr(peer_mwA1) -> base 0xA1000, size 4MB
> B: ntb_mw_get_align(port4**, mwB0) -> aligned 4k, aligned 4k, max size 1MB
> ** Serge: do we need port info here, why?
>
> Side A has a resource size of 4MB, but B only supports inbound translation
> up to 1MB. Side A can only use the first quarter of the 4MB resource.
>
> Side B needs to allocate memory aligned to 4k (the dma address must be
> aligned to 4k after dma mapping), and a multiple of 4k in size. B may need
> to set inbound translation so that incoming writes go into this memory. A
> may also need to set outbound translation.
>
> A: ntb_peer_mw_set_trans(port1**, peer_mwA1, dma_mem_addr, dma_mem_size)
> B: ntb_mw_set_trans(port4**, mwB0, dma_mem_addr, dma_mem_size)
> ** Serge: do we also need the opposing side MW index here?
>
> ** Logan: would those changes to the api suit your needs?

Not really, no. Except for the confusion with the mw_get_align issue, the new
API, as it is, suits my hardware well. What you're proposing doesn't fix my
issue and doesn't match my hardware.

Though, I interpreted ntb_peer_mw_set_trans somewhat differently from what
you describe. I did not expect the client would need to call both functions,
but some clients could optionally use ntb_peer_mw_set_trans to set the
translation from the opposite side (thus needing to send the DMA address over
spads or msgs). Though, without an actual in-kernel user, it's hard to know
what is actually intended.

It's worth noting that the IDT driver only provides peer_mw_set_trans and not
mw_set_trans. I assumed that's because the hardware's memory windows can only
be configured from the opposite side.
Pragmatically, the only change I need for everything to work as I expect is
for mw_get_align to be called only after link up. However, given all the
confusion, I'm wondering if these changes are even ready for upstream.
Without actual in-kernel client code, it's hard to know if the API is correct
or that everyone is even interpreting it in the same way.

Logan
RE: New NTB API Issue
From: Logan Gunthorpe
> On 23/06/17 07:18 AM, Allen Hubbe wrote:
> > By "remote" do you mean the source or destination of a write?
>
> Look at how these calls are used in ntb_transport and ntb_perf:
>
> They both call ntb_peer_mw_get_addr to get the size of the BAR. The size
> is sent via spads to the other side. The other side then uses
> ntb_mw_get_align and applies align_size to the received size.
>
> > Yes, clients should transfer the address and size information to the peer.
>
> But then they also need to technically transfer the alignment
> information as well. Which neither of the clients do.

The clients haven't been fully ported to the multi-port api yet. They were
only minimally changed to call the new api, but so far other than that they
have only been made to work as they had before.

> > Maybe this is the confusion. None of these api calls are to reach across
> > to the peer port, as to get the size of the peer's bar. They are to get
> > information from the local port, or to configure the local port.
>
> I like the rule that these api calls are not to reach across the port.
> But then the API looks very wrong. Why are we calling one peer_mw_get_addr
> and the other mw_get_align? And why does mw_get_align have a peer index?

I regret that the term "peer" is used to distinguish the mw api. Better names
perhaps should be ntb_outbound_mw_foo, ntb_inbound_mw_foo; or ntb_src_mw_foo,
ntb_dest_mw_foo. I like outbound/inbound, although the names are longer;
maybe out/in would be ok.

> And why does mw_get_align have a peer index?

Good question. Serge?

For that matter, why do we not also have a peer mw idx in the set of
parameters? Working through the example below, it looks like we are lacking a
way to say Outbound MW1 on A corresponds with Inbound MW0 on B. It looks like
we can only indicate that Port A (not which Outbound MW of Port A)
corresponds with Inbound MW0 on B.
> > Some snippets of code would help me understand your interpretation of the
> > api semantics more exactly.
>
> I'm not sure of the best way to show this in code, but let me try
> describing an example situation:
>
> Let's say we have the following mws on each side (this is something that
> is possible with Switchtec hardware):
>
> Host A BARs:
> mwA0: 2MB size, aligned to 4k, size aligned to 4k
> mwA1: 4MB size, aligned to 4k, size aligned to 4k
> mwA2: 64k size, aligned to 64k, size aligned to 64k
>
> Host B BARs:
> mwB0: 2MB size, aligned to 4k, size aligned to 4k
> mwB1: 64k size, aligned to 64k, size aligned to 64k

If those are BARs, that corresponds to "outbound", writing something to the
BAR at mwA0.

A more complete picture might be:

Host A BARs (aka "outbound" or "peer" memory windows):
peer_mwA0: resource at 0xA - 0xA0020 (2MB)
peer_mwA1: resource at 0xA1000 - 0xA1040 (4MB)
peer_mwA2: resource at 0xA2000 - 0xa2001 (64k)

Host A MWs (aka "inbound" memory windows):
mwA0: 64k max size, aligned to 64k, size aligned to 64k
mwA1: 2MB max size, aligned to 4k, size aligned to 4k

Host A sees Host B on port index 1

Host B BARs (aka "outbound" or "peer" memory windows):
peer_mwB0: resource at 0xB - 0xB0020 (2MB)
peer_mwB1: resource at 0xB1000 - 0xB1001 (64k)

Host B MWs (aka "inbound" memory windows):
mwB0: 1MB size, aligned to 4k, size aligned to 4k
mwB1: 2MB size, aligned to 4k, size aligned to 4k

Host B sees Host A on port index 4

Outbound memory windows (aka "peer mw") come with a pci resource. We can get
the size of the resource, its physical address, and set up outbound
translation if the hardware has that (IDT).

Inbound memory windows (aka "mw") are only used to set up inbound
translation, if the hardware has that (Intel, AMD).

To set up an end-to-end memory window so that A can write to B, let's use
peer_mwA1 and mwB0.
A: ntb_peer_mw_get_addr(peer_mwA1) -> base 0xA1000, size 4MB
B: ntb_mw_get_align(port4**, mwB0) -> aligned 4k, aligned 4k, max size 1MB
** Serge: do we need port info here, why?

Side A has a resource size of 4MB, but B only supports inbound translation up
to 1MB. Side A can only use the first quarter of the 4MB resource.

Side B needs to allocate memory aligned to 4k (the dma address must be
aligned to 4k after dma mapping), and a multiple of 4k in size. B may need to
set inbound translation so that incoming writes go into this memory. A may
also need to set outbound translation.

A: ntb_peer_mw_set_trans(port1**, peer_mwA1, dma_mem_addr, dma_mem_size)
B: ntb_mw_set_trans(port4**, mwB0, dma_mem_addr, dma_mem_size)
** Serge: do we also need the opposing side MW index here?

** Logan: would those changes to the api suit your needs?

> So on Host A: peer_mw_get_addr(idx=1) should return size=4M (from mwA1),
> but it's not clear what mw_get_align(widx=1) should do. I see two
> possibilities:
>
> 1) Given that it has the opposite sense of peer_mw_get_addr (ie. there
> is no peer in its name) and that this new api also has a peer index, it
> should return align_size=64k (from mwB1).
Re: New NTB API Issue
On 23/06/17 07:18 AM, Allen Hubbe wrote:
> By "remote" do you mean the source or destination of a write?

Look at how these calls are used in ntb_transport and ntb_perf: They both call ntb_peer_mw_get_addr to get the size of the BAR. The size is sent via spads to the other side. The other side then uses ntb_mw_get_align and applies align_size to the received size.

> Yes, clients should transfer the address and size information to the peer.

But then they also technically need to transfer the alignment information as well, which neither of the clients does.

> Maybe this is the confusion. None of these api calls are to reach across to
> the peer port, as to get the size of the peer's bar. They are to get
> information from the local port, or to configure the local port.

I like the rule that these api calls are not to reach across the port. But then the API looks very wrong. Why are we calling one peer_mw_get_addr and the other mw_get_align? And why does mw_get_align have a peer index?

> Some snippets of code would help me understand your interpretation of the api
> semantics more exactly.

I'm not sure how best to show this in code, but let me try describing an example situation:

Let's say we have the following mws on each side (this is something that is possible with Switchtec hardware):

Host A BARs:
mwA0: 2MB size, aligned to 4k, size aligned to 4k
mwA1: 4MB size, aligned to 4k, size aligned to 4k
mwA2: 64k size, aligned to 64k, size aligned to 64k

Host B BARs:
mwB0: 2MB size, aligned to 4k, size aligned to 4k
mwB1: 64k size, aligned to 64k, size aligned to 64k

So on Host A: peer_mw_get_addr(idx=1) should return size=4M (from mwA1), but it's not clear what mw_get_align(widx=1) should do. I see two possibilities:

1) Given that it has the opposite sense of peer_mw_get_addr (ie. there is no peer in its name) and that this new api also has a peer index, it should return align_size=64k (from mwB1).
However, in order to do this, Host B must be fully configured (technically the link doesn't have to be fully up, but having the link up is the only way for a client to tell whether Host B is configured or not).

2) Given your assertion that these APIs should never reach across the link, one could say it should return align_size=4k. However, in this situation the name is really confusing, and the fact that it has a pidx argument makes no sense. And the way ntb_transport and ntb_perf use it is wrong, because they will end up masking the 64K size of mwB1 with the 4k align_size from mwA1.

Does that make more sense?

Thanks,

Logan
RE: New NTB API Issue
From: Logan Gunthorpe
> On 6/22/2017 4:42 PM, Allen Hubbe wrote:
> > From: Logan Gunthorpe
> >> Any thoughts on changing the semantics of mw_get_align so it must be
> >> called with the link up?
> >
> > The intention of these is that these calls return information from the
> > local port. The calls themselves don't reach across the link to the
> > peer, but the information returned from the local port needs to be
> > communicated for setting up the translation end-to-end.
>
> Ok, well if it's from the local port, then splitting up mw_get_range
> into peer_mw_get_addr and mw_get_align is confusing because one has the
> peer designation and the other doesn't. And all the clients apply the
> alignments to the remote bar so they'd technically need to transfer them
> across the link somehow.

By "remote" do you mean the source or destination of a write?

Yes, clients should transfer the address and size information to the peer.

> > I would like to understand why this hardware needs link up. Are there
> > registers on the local port that are only valid after link up?
>
> We only need the link up if we are trying to find the alignment
> requirements (and max_size) of the peer's bar. In theory, switchtec
> could have different sizes of bars on both sides of the link and
> different alignment requirements. Though, in practice, this is probably
> unlikely.

Maybe this is the confusion. None of these api calls are to reach across to the peer port, as to get the size of the peer's bar. They are to get information from the local port, or to configure the local port.

Some mw configuration is done at the destination, as for Intel and AMD, and some configuration is done on the source side, for IDT. The local configuration of the port on one side could depend on information from the remote port on the other side. For example in IDT, the mw translation configured on the source side needs to know destination address information. Likewise, if there is any difference in the size of the range that can be translated by ports on opposing sides, that needs to be negotiated.

> > Can you post snippets of how ntb_mw_get_align and ntb_peer_mw_get_addr
> > might be implemented for the Switchtec?
>
> Hmm, yes, but let's sort out my confusion on the semantics per above first.

Some snippets of code would help me understand your interpretation of the api semantics more exactly.

> Logan
Re: New NTB API Issue
On 6/22/2017 4:42 PM, Allen Hubbe wrote:
> From: Logan Gunthorpe
>> Any thoughts on changing the semantics of mw_get_align so it must be
>> called with the link up?
>
> The intention of these is that these calls return information from the
> local port. The calls themselves don't reach across the link to the
> peer, but the information returned from the local port needs to be
> communicated for setting up the translation end-to-end.

Ok, well if it's from the local port, then splitting up mw_get_range into peer_mw_get_addr and mw_get_align is confusing because one has the peer designation and the other doesn't. And all the clients apply the alignments to the remote bar so they'd technically need to transfer them across the link somehow.

> I would like to understand why this hardware needs link up. Are there
> registers on the local port that are only valid after link up?

We only need the link up if we are trying to find the alignment requirements (and max_size) of the peer's bar. In theory, switchtec could have different sizes of bars on both sides of the link and different alignment requirements. Though, in practice, this is probably unlikely.

> Can you post snippets of how ntb_mw_get_align and ntb_peer_mw_get_addr
> might be implemented for the Switchtec?

Hmm, yes, but let's sort out my confusion on the semantics per above first.

Logan
RE: New NTB API Issue
From: Logan Gunthorpe
> Any thoughts on changing the semantics of mw_get_align so it must be
> called with the link up?

The intention of these is that these calls return information from the local port. The calls themselves don't reach across the link to the peer, but the information returned from the local port needs to be communicated for setting up the translation end-to-end.

I would like to understand why this hardware needs link up. Are there registers on the local port that are only valid after link up?

Can you post snippets of how ntb_mw_get_align and ntb_peer_mw_get_addr might be implemented for the Switchtec?
Re: New NTB API Issue
On 6/22/2017 4:12 PM, Allen Hubbe wrote:
> The resource size given by peer_mw_get_addr might be different than the
> max_size given by ntb_mw_get_align. I am most familiar with the
> ntb_hw_intel driver and that type of ntb hardware. The peer_mw_get_addr
> size is of the primary bar on the side to be the source of the
> translated writes (or reads). In b2b topology, at least, the first
> translation of that write lands it on the secondary bar of the peer
> ntb. The size of that bar could be different than the first. The second
> translation lands the write in memory (eg). So, the end-to-end
> translation is limited by the first AND second sizes.
>
> The first point is, the *max_size returned by intel_ntb_mw_get_align
> looks wrong. That should be the size of the secondary bar, not the
> resource size of the primary bar, of that device. The second point is,
> because the sizes returned by peer_mw_get_addr and ntb_mw_get_align may
> be different, the two sides should communicate and reconcile the
> address and size information when setting up the translations.

Ok, that makes some sense. So drivers that don't need this should return -1 or something like that? Though, I'd still suggest that until a driver actually needs this, and it's implemented correctly in the clients, we should leave it out.

Any thoughts on changing the semantics of mw_get_align so it must be called with the link up?

Logan
RE: New NTB API Issue
From: Logan Gunthorpe
> On 6/22/2017 12:32 PM, Allen Hubbe wrote:
> > From: Logan Gunthorpe
> >> 2) The changes to the Intel and AMD driver for mw_get_align sets
> >> *max_size to the local pci resource size. (Thus making the assumption
> >> that the local is the same as the peer, which is wrong). max_size isn't
> >> actually used for anything so it's not _really_ an issue, but I do think
> >> it's confusing and incorrect. I'd suggest we remove max_size until
> >> something actually needs it, or at least set it to zero in cases where
> >> the hardware doesn't support returning the size of the peer's memory
> >> window (ie. in the Intel and AMD drivers).
> >
> > You're right, and the b2b_split in the Intel driver even makes use of
> > different primary/secondary bar sizes. For Intel and AMD, it would make
> > more sense to use the secondary bar size here. The size of the
> > secondary bar is still not necessarily valid end-to-end, because in b2b
> > the peer's primary bar size could be even smaller.
> >
> > I'm not entirely convinced that this should represent the end-to-end
> > size of local and peer memory window configurations. I think it should
> > represent the largest size that would be valid to pass to
> > ntb_mw_set_trans(). Then, the peers should communicate their respective
> > max sizes (along with translation addresses, etc) before setting up the
> > translations, and that exchange will ensure that the size finally used
> > is valid end-to-end.
>
> But why would the client ever need to use the max_size instead of the
> actual size of the bar as retrieved and exchanged from peer_mw_get_addr?

The resource size given by peer_mw_get_addr might be different than the max_size given by ntb_mw_get_align. I am most familiar with the ntb_hw_intel driver and that type of ntb hardware. The peer_mw_get_addr size is of the primary bar on the side to be the source of the translated writes (or reads).

In b2b topology, at least, the first translation of that write lands it on the secondary bar of the peer ntb. The size of that bar could be different than the first. The second translation lands the write in memory (eg). So, the end-to-end translation is limited by the first AND second sizes.

The first point is, the *max_size returned by intel_ntb_mw_get_align looks wrong. That should be the size of the secondary bar, not the resource size of the primary bar, of that device. The second point is, because the sizes returned by peer_mw_get_addr and ntb_mw_get_align may be different, the two sides should communicate and reconcile the address and size information when setting up the translations.
RE: New NTB API Issue
From: Logan Gunthorpe
> Hey Guys,
>
> I've run into some subtle issues with the new API:
>
> It has to do with splitting mw_get_range into mw_get_align and
> peer_mw_get_addr.
>
> The original mw_get_range returned the /local/ memory window's size,
> address and alignment requirements. The ntb clients then take the local
> size and transmit it via spads to the peer which would use it in setting
> up the memory window. However, it made the assumption that the alignment
> restrictions were symmetric on both hosts seeing they were not sent
> across the link.
>
> The new API makes a sensible change for this in that mw_get_align
> appears to be intended to return the alignment restrictions (and now
> size) of the peer. This helps a bit for the Switchtec driver but appears
> to be a semantic change that wasn't really reflected in the changes to
> the other NTB code. So, I see a couple of issues:
>
> 1) With our hardware, we can't actually know anything about the peer's
> memory windows until the peer has finished its setup (ie. the link is
> up). However, all the clients call the function during probe, before the
> link is ready. There's really no good reason for this, so I think we
> should change the clients so that mw_get_align is called only when the
> link is up.
>
> 2) The changes to the Intel and AMD driver for mw_get_align sets
> *max_size to the local pci resource size. (Thus making the assumption
> that the local is the same as the peer, which is wrong). max_size isn't
> actually used for anything so it's not _really_ an issue, but I do think
> it's confusing and incorrect. I'd suggest we remove max_size until
> something actually needs it, or at least set it to zero in cases where
> the hardware doesn't support returning the size of the peer's memory
> window (ie. in the Intel and AMD drivers).

You're right, and the b2b_split in the Intel driver even makes use of different primary/secondary bar sizes. For Intel and AMD, it would make more sense to use the secondary bar size here. The size of the secondary bar is still not necessarily valid end-to-end, because in b2b the peer's primary bar size could be even smaller.

I'm not entirely convinced that this should represent the end-to-end size of local and peer memory window configurations. I think it should represent the largest size that would be valid to pass to ntb_mw_set_trans(). Then, the peers should communicate their respective max sizes (along with translation addresses, etc) before setting up the translations, and that exchange will ensure that the size finally used is valid end-to-end.

> Thoughts?
>
> Logan
Re: New NTB API Issue
On 6/22/2017 12:32 PM, Allen Hubbe wrote:
> From: Logan Gunthorpe
>> Hey Guys,
>>
>> I've run into some subtle issues with the new API:
>>
>> It has to do with splitting mw_get_range into mw_get_align and
>> peer_mw_get_addr.
>>
>> The original mw_get_range returned the /local/ memory window's size,
>> address and alignment requirements. The ntb clients then take the local
>> size and transmit it via spads to the peer which would use it in
>> setting up the memory window. However, it made the assumption that the
>> alignment restrictions were symmetric on both hosts seeing they were
>> not sent across the link.
>>
>> The new API makes a sensible change for this in that mw_get_align
>> appears to be intended to return the alignment restrictions (and now
>> size) of the peer. This helps a bit for the Switchtec driver but
>> appears to be a semantic change that wasn't really reflected in the
>> changes to the other NTB code. So, I see a couple of issues:
>>
>> 1) With our hardware, we can't actually know anything about the peer's
>> memory windows until the peer has finished its setup (ie. the link is
>> up). However, all the clients call the function during probe, before
>> the link is ready. There's really no good reason for this, so I think
>> we should change the clients so that mw_get_align is called only when
>> the link is up.
>>
>> 2) The changes to the Intel and AMD driver for mw_get_align sets
>> *max_size to the local pci resource size. (Thus making the assumption
>> that the local is the same as the peer, which is wrong). max_size isn't
>> actually used for anything so it's not _really_ an issue, but I do
>> think it's confusing and incorrect. I'd suggest we remove max_size
>> until something actually needs it, or at least set it to zero in cases
>> where the hardware doesn't support returning the size of the peer's
>> memory window (ie. in the Intel and AMD drivers).
>
> You're right, and the b2b_split in the Intel driver even makes use of
> different primary/secondary bar sizes. For Intel and AMD, it would make
> more sense to use the secondary bar size here. The size of the secondary
> bar is still not necessarily valid end-to-end, because in b2b the peer's
> primary bar size could be even smaller.
>
> I'm not entirely convinced that this should represent the end-to-end
> size of local and peer memory window configurations. I think it should
> represent the largest size that would be valid to pass to
> ntb_mw_set_trans(). Then, the peers should communicate their respective
> max sizes (along with translation addresses, etc) before setting up the
> translations, and that exchange will ensure that the size finally used
> is valid end-to-end.

But why would the client ever need to use the max_size instead of the actual size of the bar as retrieved and exchanged from peer_mw_get_addr?

Logan
New NTB API Issue
Hey Guys,

I've run into some subtle issues with the new API. It has to do with splitting mw_get_range into mw_get_align and peer_mw_get_addr.

The original mw_get_range returned the /local/ memory window's size, address and alignment requirements. The ntb clients then take the local size and transmit it via spads to the peer, which would use it in setting up the memory window. However, it made the assumption that the alignment restrictions were symmetric on both hosts, seeing they were not sent across the link.

The new API makes a sensible change for this in that mw_get_align appears to be intended to return the alignment restrictions (and now size) of the peer. This helps a bit for the Switchtec driver but appears to be a semantic change that wasn't really reflected in the changes to the other NTB code.

So, I see a couple of issues:

1) With our hardware, we can't actually know anything about the peer's memory windows until the peer has finished its setup (ie. the link is up). However, all the clients call the function during probe, before the link is ready. There's really no good reason for this, so I think we should change the clients so that mw_get_align is called only when the link is up.

2) The changes to the Intel and AMD drivers for mw_get_align set *max_size to the local pci resource size. (Thus making the assumption that the local is the same as the peer, which is wrong.) max_size isn't actually used for anything, so it's not _really_ an issue, but I do think it's confusing and incorrect. I'd suggest we remove max_size until something actually needs it, or at least set it to zero in cases where the hardware doesn't support returning the size of the peer's memory window (ie. in the Intel and AMD drivers).

Thoughts?

Logan