On Thu, Mar 19, 2020 at 4:33 PM Jon Maloy <[email protected]> wrote:
>
>
>
> On 3/19/20 7:03 PM, Tom Herbert wrote:
> > On Thu, Mar 19, 2020 at 3:43 PM Jon Maloy <[email protected]> wrote:
> >>
> >>
> >> On 3/18/20 12:04 AM, Joseph Touch wrote:
> >>
> >> Hi all,
> >>
> >> I’m quite confused by this request.
> >>
> >> It seems like they either have an implementation issue (in Linux).
> >>
> >> Linux "passthru" GSO is implemented so that any IP based protocol which 
> >> wants to benefit
> >> from it needs its own IP protocol number. Doing this generically through 
> >> the already existing
> >> UDP protocol number is not possible, because GSO on a host must be 
> >> implemented
> >> specifically (e.g., regarding segmentation) per carried protocol. That is 
> >> just a fact, and not
> >> an implementation issue.
> > Jon,
> >
> > I'm not sure I understand your point. Linux already supports GSO, and
> > GRO for that matter, for several protocols encapsulated over UDP. I
> > don't see any requirement for a protocol to need its own IP protocol
> > number in this regard.
> >
> > Tom
> Yes, but this is not about guest GSO. What we need is something more
> similar to TCP TSO, where we can send full-size buffers down to the
> host OS, and only do segmentation (or in our case, a TIPC-specific
> fragmentation where each fragment gets an individually numbered header)
> when we find that the destination is off-host.
> Basically, we want to transport full-size messages between VMs when
> those are located on the same host. So far, I haven't found any way to
> do this on the host by looking at the inner protocol carried over UDP.
> But I may of course be wrong on this point; I know you are the expert.
>

Jon,

You might want to look at Willem's work on UDP GSO
(http://vger.kernel.org/lpc_net2018_talks/willemdebruijn-lpc2018-udpgso-presentation-20181104.pdf).
That might be useful as a generic method, assuming the proper APIs are
supported (this is exactly how QUIC GSO was solved without needing
explicit kernel support for QUIC).
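
For illustration, a minimal sketch of using the UDP_SEGMENT socket
option from that work (Linux 4.18+); the gso_size value and the
destination are placeholders:

#include <linux/udp.h>      /* UDP_SEGMENT, Linux >= 4.18 */
#include <netinet/in.h>
#include <sys/socket.h>

/* Send one large buffer; the stack segments it into gso_size-byte
 * UDP payloads on the wire, so one syscall yields many datagrams. */
ssize_t send_with_udp_gso(int fd, const struct sockaddr_in *dst,
                          const void *buf, size_t len)
{
        int gso_size = 1400;    /* per-segment payload size */

        if (setsockopt(fd, IPPROTO_UDP, UDP_SEGMENT,
                       &gso_size, sizeof(gso_size)) < 0)
                return -1;

        return sendto(fd, buf, len, 0,
                      (const struct sockaddr *)dst, sizeof(*dst));
}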

Tom

> ///jon
>
> >
> >>
> >> I checked their documentation, which includes something that looks
> >> a little like an Internet Draft:
> >> http://tipc.io/protocol.html
> >> but it’s quite confusing. Taken at face value, they make their own
> >> argument that IP addresses won’t work - at which point running raw
> >> over IP serves no utility (sec 3.1.1),
> >>
> >> That is not a correct interpretation of the text. It is nowhere
> >> stated that IP addresses won't work for TIPC, neither in sec. 3.1.1
> >> nor anywhere else. Of course they work, *for transport purposes*,
> >> just as they have been doing for many years already when running
> >> TIPC over UDP. What we state elsewhere in the document is that IP
> >> addresses are no good in the *user API*, because they are location
> >> bound. That is also why DNS was invented, I believe.
> >>
> >> We also state that using IP addresses is less optimal than omitting
> >> the IP layer altogether and using MAC addresses, but that doesn't
> >> mean the former are useless - it just makes IP the only viable
> >> alternative in the cases where a network owner doesn't allow non-IP
> >> protocols through their backplanes, or where routing gets involved.
> >>
> >> even though most of those claims are debatable (DNS-SD is too static? And 
> >> expensive?? How so?). Then they reinvent the DNS in Section 6.
> >>
> >> There is no doubt that DNS is not the best choice for the type of
> >> environments (tight clusters) where we use TIPC. All DNS
> >> implementations I know of run in user land, and a service discovery
> >> typically means at least one, and often several, inter-process and
> >> potentially inter-node hops. Even if there is a process-local lookup
> >> cache in each sender, that cache has to be populated before it is of
> >> any use. Instead, TIPC uses a tailor-made kernel-resident
> >> translation service which normally contains a complete copy of the
> >> lookup database, so there are no unnecessary hops and no cache
> >> misses.
> >>
> >> This would have been of less importance if TIPC were only a
> >> connection-oriented TCP-like service, where service lookup is only
> >> needed at connection setup. But an equally important feature of TIPC
> >> is its reliable connectionless transport mode. Here, the lookup
> >> service is not primarily about service discovery (although that is
> >> also important), but about efficient on-the-fly translation between
> >> user-level service addresses (aka "port names") and location-bound
> >> socket addresses (aka "port identities"). This translation has to be
> >> performed per message, not per connection, since the destination may
> >> change between each message.
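> >>
> >> As a minimal sketch of what that per-message translation looks like
> >> from user space, on a socket(AF_TIPC, SOCK_RDM, 0) socket (the
> >> service type and instance values below are just placeholders):
> >>
> >> #include <linux/tipc.h>
> >> #include <string.h>
> >> #include <sys/socket.h>
> >>
> >> /* Each sendto() carries a service address ("port name"); the
> >>  * kernel-resident table translates it to a concrete socket
> >>  * address ("port identity") on the fly, per message. */
> >> ssize_t send_by_service(int sd, const void *msg, size_t len)
> >> {
> >>         struct sockaddr_tipc dst;
> >>
> >>         memset(&dst, 0, sizeof(dst));
> >>         dst.family = AF_TIPC;
> >>         dst.addrtype = TIPC_SERVICE_ADDR; /* look up by port name */
> >>         dst.addr.name.name.type = 18888;  /* placeholder type */
> >>         dst.addr.name.name.instance = 17; /* placeholder instance */
> >>         dst.addr.name.domain = 0;         /* search whole cluster */
> >>
> >>         return sendto(sd, msg, len, 0,
> >>                       (struct sockaddr *)&dst, sizeof(dst));
> >> }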
> >>
> >> If we were to make an analogy with the IP world, we could imagine
> >> using UDP to send high-volume traffic to many different
> >> destinations, each having its own domain name. Making a separate DNS
> >> lookup for each sent message would certainly work, but it would be
> >> far less performant than having a tailor-made "always cache
> >> resident" translation table, shared between all processes, as we do
> >> in TIPC.
> >>
> >> Furthermore, when the connectionless service is used, sockets might
> >> be created/deleted and bound/unbound at extremely high rates, much
> >> higher than DNS with its hierarchical updates is meant to deal with.
> >> This is what we mean by DNS being too "static". That is not to say
> >> that DNS is bad, just that it is not designed for the very high
> >> performance requirements and dynamism we have in TIPC.
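> >>
> >> For concreteness, a sketch of that publish/withdraw cycle (service
> >> type and range values below are placeholders): binding a socket to a
> >> service range publishes it in the translation table, and closing the
> >> socket withdraws it.
> >>
> >> #include <linux/tipc.h>
> >> #include <string.h>
> >> #include <sys/socket.h>
> >> #include <unistd.h>
> >>
> >> int publish_service(void)
> >> {
> >>         int sd = socket(AF_TIPC, SOCK_RDM, 0);
> >>         struct sockaddr_tipc sa;
> >>
> >>         memset(&sa, 0, sizeof(sa));
> >>         sa.family = AF_TIPC;
> >>         sa.addrtype = TIPC_SERVICE_RANGE; /* publish a name range */
> >>         sa.scope = TIPC_CLUSTER_SCOPE;    /* visible cluster-wide */
> >>         sa.addr.nameseq.type = 18888;     /* placeholder type */
> >>         sa.addr.nameseq.lower = 0;
> >>         sa.addr.nameseq.upper = 99;
> >>
> >>         /* bind() publishes instantly; close(sd) later withdraws. */
> >>         if (bind(sd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
> >>                 close(sd);
> >>                 return -1;
> >>         }
> >>         return sd;
> >> }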
> >>
> >> There is no doubt that a few things in TIPC could have been done
> >> differently, but the decision to design our own topology/lookup
> >> service is not among those. This request is an attempt to open up
> >> for moving beyond some current limitations, e.g., by enabling the
> >> introduction of a more versatile 128-bit service addressing concept.
> >> Along with this request, we are aiming at having an updated version
> >> of the protocol description adopted as an informational RFC, so that
> >> TIPC can be regarded as an IETF-supported protocol in its own right.
> >>
> >> Whatever the viewpoints, TIPC is currently what it is, and rather
> >> than focusing on the motivation for certain implementation choices
> >> and how they work, I think the IETF should consider the fact that
> >> this is a well-established service used by dozens of small and big
> >> companies, running high-volume traffic at hundreds of telco sites
> >> around the globe. They should also consider that TIPC has existed as
> >> a stable and well-maintained implementation in all major Linux
> >> distros for many years.
> >>
> >> The IETF now has a genuine chance to help us make TIPC even more
> >> useful for existing and new users.
> >>
> >> BR
> >> Jon Maloy
> >>
> >>
> >> Frankly, IMO this would probably have a difficult time arguing for a 
> >> transport protocol port number, much less an IP protocol number.
> >>
> >> Joe
> >>
> >>
> >> On Mar 17, 2020, at 3:34 PM, Suresh Krishnan <[email protected]> wrote:
> >>
> >> Hi all,
> >>    IANA received an IP protocol number allocation request from Jon
> >> Maloy <[email protected]> for the Transparent Inter Process
> >> Communication (TIPC) protocol. I picked up this request as Internet
> >> AD, as the registration procedure requires IESG Approval. I provided
> >> the information below to the IESG and discussed it there, with a
> >> favorable view of this request. I am recommending allocation of an
> >> IP protocol number for this. If you have any concerns that you think
> >> I might have overlooked, please let me know by end of day March 24,
> >> 2020.
> >>
> >> After several rounds of back-and-forth probing, I collected the
> >> following information regarding the protocol number request for
> >> TIPC. There were two main questions I had for him:
> >>
> >> * Q1: Why did they want an IP protocol number?
> >> * Q2: Is the protocol implemented and deployed widely?
> >>
> >> Q1: Why did they want an IP protocol number?
> >> ====================================
> >>
> >> There are two main reasons why they want to reserve an IP protocol number:
> >>
> >> 1) Performance
> >> They are currently working on adding GSO support to TIPC, including
> >> a TSO-like "full-size buffer pass-thru" through virtio and the host
> >> OS tap interface. They have experimentally implemented GSO across
> >> UDP tunnels, but performance is not good because of the way the
> >> tunnel GSO is implemented, and there is no 'pass-thru' support for
> >> this in Linux. They have even done the same at the pure L2 level,
> >> but L2 transport is sometimes not accepted by the cloud maintainers
> >> or the telco operators, and hence they need an alternative. The best
> >> alternative, from both a performance and an acceptability viewpoint,
> >> would be to establish TIPC as a full-fledged IP protocol, apart from
> >> the traditional L2 bearer many users are still using.
> >>
> >> 2) Currently TIPC has two user address types:
> >>
> >> struct tipc_service_addr {
> >>     uint32_t type;
> >>     uint32_t instance;
> >>     uint32_t node;
> >> };
> >> struct tipc_socket_addr {
> >>     uint32_t port;
> >>     uint32_t node;
> >> };
> >>
> >> They want to complement this with a new API where we have a unified
> >> address type:
> >>
> >> struct tipc_addr {
> >>     u8 type[16];
> >>     u8 instance[16];
> >>     u8 node[16];
> >> };
> >>
> >> This would give a 128-bit value range for 'type', 'instance' and
> >> 'node' alike, and opens up new opportunities:
> >> - Users will never need to coordinate 'type' values, since there
> >> will be no risk of collisions.
> >> - Users can put whatever they want into the fields, e.g., an IPv6
> >> address, a Kubernetes or Docker container id, a LUKS disk UUID, or
> >> just a plain string.
> >> For the 'node' id this has already been implemented and released,
> >> but it is not reflected in the API yet.
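> >>
> >> As a sketch only (struct tipc_addr is the proposal above, not a
> >> shipping API; the helper and values are hypothetical), that
> >> flexibility could be used like this:
> >>
> >> #include <arpa/inet.h>
> >> #include <stdint.h>
> >> #include <string.h>
> >>
> >> struct tipc_addr {          /* the proposal above, u8 == uint8_t */
> >>         uint8_t type[16];
> >>         uint8_t instance[16];
> >>         uint8_t node[16];
> >> };
> >>
> >> /* Fill the proposed unified address: an IPv6 address as node id,
> >>  * a plain string as service type, a 64-bit value as instance. */
> >> static void fill_addr(struct tipc_addr *a, const char *node_ip6,
> >>                       const char *service, uint64_t inst)
> >> {
> >>         memset(a, 0, sizeof(*a));
> >>         inet_pton(AF_INET6, node_ip6, a->node);
> >>         /* Not NUL-terminated; the field is a 16-byte value. */
> >>         memcpy(a->type, service, strnlen(service, sizeof(a->type)));
> >>         memcpy(a->instance, &inst, sizeof(inst));
> >> }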
> >>
> >> For the API extension they need a new IPPROTO_TIPC socket type,
> >> which can be registered and instantiated independently of the
> >> traditional AF_TIPC socket type.
> >>
> >> You can find more info about this at http://tipc.io
> >>
> >> Q2: Is the protocol implemented and deployed widely?
> >> ==========================================
> >>
> >> The requester provided the following information when I asked about who 
> >> was currently using TIPC (pretty much about adoption and deployment):
> >>
> >> I can give you a list of current or recently active code contributors and 
> >> companies/people who have been asking for support:
> >>
> >> Huawei:
> >> For obvious reasons I don't know any details about them; I can only
> >> name persons I have seen contributing to netdev or being active on
> >> our mailing lists. Huawei people sometimes use gmail addresses when
> >> posting questions and patches, so there are more persons than I have
> >> listed here.
> >> Dmitry Kolmakov <[email protected]>
> >> Ji Qin <[email protected]>
> >> Wei Yongjun <[email protected]>
> >> <[email protected]>
> >> Yue Haibing <[email protected]>
> >> Junwei Hu <[email protected]>
> >> Jie Liu <[email protected]>
> >> Qiang Ning <[email protected]>
> >> Zhiqiang Liu <[email protected]>
> >> Miaohe Lin <[email protected]>
> >> Wang Wang <[email protected]>
> >> Kang Zhou <[email protected]>
> >> Suanming Mou <[email protected]>
> >>
> >> Hu Junwei is the one I see most active at the moment.
> >>
> >> Nokia:
> >> Tommi Rantala <[email protected]>
> >>
> >> Verizon:
> >> Amar Nv <[email protected]>
> >> Jayaraj Wilson, <[email protected]>
> >>
> >> Hewlett Packard Enterprise:
> >> <[email protected]>
> >>
> >> WindRiver:
> >> Ying Xue <[email protected]>
> >> He is my co-maintainer at netdev and SourceForge.
> >> WindRiver has several products in the field based on TIPC, e.g., the
> >> control system for Sikorsky helicopters.
> >>
> >> Orange:
> >> Christophe JAILLET <[email protected]>
> >>
> >> Red Hat:
> >> The person contacting me to have TIPC integrated and maintained in
> >> RHEL 8.0 was
> >> Sirius Rayner-Karlsson <[email protected]>
> >> He motivated it with a request from "a telco vendor", but I don't
> >> know which one. Hence, TIPC is now integrated in, and officially
> >> supported from, RHEL 8.1.
> >>
> >> ABB:
> >> https://new.abb.com/pl
> >> Mikolaj K. Chojnacki <[email protected]>
> >> Krzysztof Rybak <[email protected]>
> >>
> >> Ericsson:
> >> All (dozens of) applications based on the TSP and Core
> >> Middleware/Component Based Architecture (CMW/CBA) platforms are by
> >> definition based on TIPC. They have not yet started to use TIPC on
> >> their Kubernetes-based ADP platform, but there is work ongoing on
> >> this.
> >>
> >> I also see numerous other people being active, from (I believe)
> >> small companies, universities and private contributors. E.g.,
> >> Innovsys Inc  http://www.innovsys.com/innovsys/
> >> Allied Telesis https://www.alliedtelesis.com/
> >> Telaverge Communications http://www.telaverge.com/
> >> Ivan Serdyuk <[email protected]> (seems to be responsible
> >> for the ZeroMQ port of TIPC)
> >> Johns Hopkins University / Fast LTA, Munich <[email protected]>
> >> Just to mention a few...
> >>
> >> TIPC is currently maintained jointly by Ericsson, WindRiver, Red
> >> Hat, and the Australian consulting company DEK Technologies
> >> https://www.dektech.com.au/
> >>
> >> Thanks
> >> Suresh
> >>
>

_______________________________________________
Int-area mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/int-area
