On 21 May 2015 at 17:45, Maxim Uvarov <[email protected]> wrote:
> From RFC 3549, Netlink looks like a good protocol to communicate between
> data plane and control plane. And messages are defined by that protocol
> also. At least we should do something similar.
>
Netlink seems limited to the specific functionality already present in the
Linux kernel. An ODP IPC/message passing mechanism must be extensible and
support user-defined messages. There's no reason for ODP MBUS to impose any
message format. Any (set of) applications can model their message formats
on Netlink.

I don't understand how Netlink can be used to communicate between (any) two
applications. Please enlighten me.

-- Ola

>
> Maxim.
>
> On 21 May 2015 at 17:46, Ola Liljedahl <[email protected]> wrote:
>
>> On 21 May 2015 at 15:56, Alexandru Badicioiu
>> <[email protected]> wrote:
>>
>>> I got the impression that the ODP MBUS API would define a transport
>>> protocol/API between an ODP
>>
>> No, the MBUS API is just an API for message passing (think of the OSE IPC
>> API) and doesn't specify use cases or content. Just like the ODP packet
>> API doesn't specify what the content in a packet means or the format of
>> the content.
>>
>>> application and a control plane application, like TCP is the transport
>>> protocol for HTTP applications (e.g. the Web). Netlink defines exactly
>>> that - a transport protocol for configuration messages.
>>> Maxim asked about the messages - should applications define the message
>>> format and/or the message content? Wouldn't it be an easier task for the
>>> application to define only the content and let ODP define a format?
>>
>> How can you define a format when you don't know what the messages are
>> used for and what data needs to be transferred? Why should the MBUS API
>> or implementations care about the message format? It's just payload and
>> none of their business.
>>
>> If you want to, you can specify formats for specific purposes, e.g. reuse
>> Netlink formats for the functions that Netlink supports. Some ODP
>> applications may use this, others not (because they use some other
>> protocol or they implement some other functionality).
>>
>>> Reliability could be an issue but the Netlink spec says how applications
>>> can create reliable protocols:
>>>
>>>    One could create a reliable protocol between an FEC and a CPC by
>>>    using the combination of sequence numbers, ACKs, and retransmit
>>>    timers. Both sequence numbers and ACKs are provided by Netlink;
>>>    timers are provided by Linux.
>>>
>> And you could do the same in ODP, but I prefer not to; this adds a level
>> of complexity to the application code I do not want. Perhaps the actual
>> MBUS implementation has to do this, but then hidden from the
>> applications. Just like TCP reliability and ordering etc. are hidden from
>> the applications that just do read and write.
>>
>>>    One could create a heartbeat protocol between the FEC and CPC by
>>>    using the ECHO flags and the NLMSG_NOOP message.
>>>
>>> On 21 May 2015 at 16:23, Ola Liljedahl <[email protected]> wrote:
>>>
>>>> On 21 May 2015 at 15:05, Alexandru Badicioiu
>>>> <[email protected]> wrote:
>>>>
>>>>> I was referring to the Netlink protocol in itself, as a model for ODP
>>>>> MBUS (or IPC).
>>>>
>>>> Isn't the Netlink protocol what the endpoints send between them? This
>>>> is not specified by the ODP IPC/MBUS API; applications can define or
>>>> re-use whatever protocol they like.
>>>> The protocol definition is heavily dependent on what you actually use
>>>> the IPC for, and we shouldn't force ODP users to use some specific
>>>> predefined protocol.
>>>>
>>>> Also, the "wire protocol" is left undefined; this is up to the
>>>> implementation to define and each platform can have its own definition.
>>>>
>>>> And Netlink isn't even reliable. I know that creates problems, e.g. it
>>>> is impossible to get a clean and complete snapshot of e.g. the routing
>>>> table.
>>>>
>>>>>    The interaction between the FEC and the CPC, in the Netlink
>>>>>    context, defines a protocol. Netlink provides mechanisms for the
>>>>>    CPC (residing in user space) and the FEC (residing in kernel space)
>>>>>    to have their own protocol definition -- *kernel space and user
>>>>>    space just mean different protection domains*. Therefore, a wire
>>>>>    protocol is needed to communicate. The wire protocol is normally
>>>>>    provided by some privileged service that is able to copy between
>>>>>    multiple protection domains. We will refer to this service as the
>>>>>    Netlink service. The Netlink service can also be encapsulated in a
>>>>>    different transport layer, if the CPC executes on a different node
>>>>>    than the FEC. The FEC and CPC, using Netlink mechanisms, may choose
>>>>>    to define a reliable protocol between each other. By default,
>>>>>    however, Netlink provides an unreliable communication.
>>>>>
>>>>>    Note that the FEC and CPC can both live in the same memory
>>>>>    protection domain and use the connect() system call to create a
>>>>>    path to the peer and talk to each other. We will not discuss this
>>>>>    mechanism further other than to say that it is available.
>>>>>    Throughout this document, we will refer interchangeably to the FEC
>>>>>    to mean kernel space and the CPC to mean user space. This
>>>>>    denomination is not meant, however, to restrict the two components
>>>>>    to these protection domains or to the same compute node.
>>>>>
>>>>> On 21 May 2015 at 15:55, Ola Liljedahl <[email protected]> wrote:
>>>>>
>>>>>> On 21 May 2015 at 13:22, Alexandru Badicioiu
>>>>>> <[email protected]> wrote:
>>>>>> > Hi,
>>>>>> > would the Netlink protocol (https://tools.ietf.org/html/rfc3549)
>>>>>> > fit the purpose of ODP IPC (within a single OS instance)?
>>>>>> I interpret this as a question whether Netlink would be suitable as
>>>>>> an implementation of the ODP IPC (now called message bus because
>>>>>> "IPC" is so contended and imbued with different meanings).
>>>>>>
>>>>>> It is perhaps possible. Netlink seems a bit focused on intra-kernel
>>>>>> and kernel-to-user while the ODP IPC/MBUS is focused on user-to-user
>>>>>> (application-to-application).
>>>>>>
>>>>>> I see a couple of primary requirements:
>>>>>>
>>>>>> - Support communication (message exchange) between user space
>>>>>>   processes.
>>>>>> - Support arbitrary user-defined messages.
>>>>>> - Ordered, reliable delivery of messages.
>>>>>>
>>>>>> From the little I can quickly read up on Netlink, the first two
>>>>>> requirements do not seem supported. But perhaps someone with more
>>>>>> intimate knowledge of Netlink can prove me wrong. Or maybe Netlink
>>>>>> can be extended to support u2u and user-defined messages, but the
>>>>>> current specialization (e.g. specialized addressing, specialized
>>>>>> message formats) seems contrary to the goals of providing generic
>>>>>> mechanisms in the kernel that can be used for different things.
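
(As an aside, to make "arbitrary user-defined messages" a bit more
concrete: below is a rough sketch of an application-defined header,
loosely modelled on Netlink's nlmsghdr, carried as plain MBUS payload.
Everything in it is hypothetical application code; it is not part of the
proposed API, and an MBUS implementation would never look inside it.)

#include <stdint.h>

/* Hypothetical application-defined message header, loosely modelled on
 * struct nlmsghdr from RFC 3549. To MBUS this is just the first bytes
 * of an opaque payload. */
struct app_msg_hdr {
        uint32_t len;   /* total message length: header + payload */
        uint16_t type;  /* application-defined type, e.g. APP_MSG_ACK */
        uint16_t flags; /* request/ack/echo bits, as the app sees fit */
        uint32_t seq;   /* sequence number, if the application wants ACKs */
        uint32_t src;   /* application-level source identifier */
};

/* An application wanting Netlink-style reliability can echo 'seq' back
 * in an acknowledgement message and retransmit on a timer - the scheme
 * quoted from RFC 3549 earlier in this thread. */
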
>>>>>> My IPC/MBUS reference implementation for linux-generic builds upon
>>>>>> POSIX message queues. One of my issues is that I want the message
>>>>>> queue associated with a process to go away when the process goes
>>>>>> away. The message queues are not independent entities.
>>>>>>
>>>>>> -- Ola
>>>>>>
>>>>>> > Thanks,
>>>>>> > Alex
>>>>>> >
>>>>>> > On 21 May 2015 at 14:12, Ola Liljedahl <[email protected]> wrote:
>>>>>> >>
>>>>>> >> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
>>>>>> >> <[email protected]> wrote:
>>>>>> >> >
>>>>>> >> >> -----Original Message-----
>>>>>> >> >> From: lng-odp [mailto:[email protected]] On Behalf
>>>>>> >> >> Of ext Ola Liljedahl
>>>>>> >> >> Sent: Tuesday, May 19, 2015 1:04 AM
>>>>>> >> >> To: [email protected]
>>>>>> >> >> Subject: [lng-odp] [RFC] Add ipc.h
>>>>>> >> >>
>>>>>> >> >> As promised, here is my first attempt at a standalone API for
>>>>>> >> >> IPC - inter-process communication in a shared-nothing
>>>>>> >> >> architecture (message passing between processes which do not
>>>>>> >> >> share memory).
>>>>>> >> >>
>>>>>> >> >> Currently all definitions are in the file ipc.h but it is
>>>>>> >> >> possible to break out some message/event related definitions
>>>>>> >> >> (everything from odp_ipc_sender) into a separate file
>>>>>> >> >> message.h. This would mimic the packet_io.h/packet.h
>>>>>> >> >> separation.
>>>>>> >> >>
>>>>>> >> >> The semantics of message passing are that sending a message to
>>>>>> >> >> an endpoint will always look like it succeeds. The appearance
>>>>>> >> >> of endpoints is explicitly notified through user-defined
>>>>>> >> >> messages specified in the odp_ipc_resolve() call. Similarly,
>>>>>> >> >> the disappearance (e.g. death or otherwise lost connection) is
>>>>>> >> >> also explicitly notified through user-defined messages
>>>>>> >> >> specified in the odp_ipc_monitor() call. The send call does
>>>>>> >> >> not fail because the addressed endpoint has disappeared.
>>>>>> >> >>
>>>>>> >> >> Messages (from endpoint A to endpoint B) are delivered in
>>>>>> >> >> order. If message N sent to an endpoint is delivered, then all
>>>>>> >> >> messages <N have also been delivered. Message delivery does
>>>>>> >> >> not guarantee actual processing by the
>>>>>> >> >
>>>>>> >> > Ordered delivery is an OK requirement, but "all messages <N
>>>>>> >> > have also been delivered" in practice means lossless delivery
>>>>>> >> > (== retries, retransmission windows, etc.). Lossy vs. lossless
>>>>>> >> > link should be a configuration option.
>>>>>> >> I am just targeting internal communication which I expect to be
>>>>>> >> reliable. There isn't any physical "link" involved. If an
>>>>>> >> implementation chooses to use some unreliable media, then it will
>>>>>> >> need to take some countermeasures. Any loss of a message could be
>>>>>> >> detected using sequence numbers (and timeouts) and handled by
>>>>>> >> (temporary) disconnection (so that no more messages will be
>>>>>> >> delivered should one go missing).
>>>>>> >>
>>>>>> >> I am OK with adding the lossless/lossy configuration to the API
>>>>>> >> as long as the lossless option is always implemented. Is this a
>>>>>> >> configuration when creating the local IPC endpoint or when
>>>>>> >> sending a message to another endpoint?
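
(To illustrate the countermeasures mentioned just above: a minimal sketch
of how an implementation running over unreliable media might detect
message loss per peer with sequence numbers and turn it into a
disconnection, rather than exposing lossy delivery to the application.
All names below are made up for illustration; nothing here is part of the
proposed API.)

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer receive state inside an MBUS implementation
 * that happens to run over an unreliable transport. */
struct peer_state {
        uint32_t next_seq;  /* next expected sequence number from this peer */
        bool     connected; /* false => stop delivering, report lost connection */
};

/* Called for every frame received from 'peer'; returns true if the
 * message may be delivered to the endpoint's input queue. */
static bool accept_msg(struct peer_state *peer, uint32_t seq)
{
        if (!peer->connected)
                return false;
        if (seq != peer->next_seq) {
                /* A gap means something was lost: deliver nothing more
                 * from this peer, so ordering still holds, and let users
                 * of odp_ipc_monitor() see a lost-connection
                 * notification instead. */
                peer->connected = false;
                return false;
        }
        peer->next_seq++;
        return true;
}
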
>>>>>> >> >
>>>>>> >> > Also, what does "delivered" mean?
>>>>>> >> >
>>>>>> >> > Message:
>>>>>> >> > - transmitted successfully over the link?
>>>>>> >> > - is now under control of the remote node (post office)?
>>>>>> >> > - delivered into application input queue?
>>>>>> >> Probably this one, but I am not sure the exact definition
>>>>>> >> matters: "has been delivered" or "will eventually be delivered
>>>>>> >> unless the connection to the destination is lost". Maybe there
>>>>>> >> is a better word than "delivered"?
>>>>>> >>
>>>>>> >> "Made available into the destination (recipient) address space"?
>>>>>> >>
>>>>>> >> > - has been dequeued from application queue?
>>>>>> >> >
>>>>>> >> >> recipient. End-to-end acknowledgements (using messages) should
>>>>>> >> >> be used if this guarantee is important to the user.
>>>>>> >> >>
>>>>>> >> >> IPC endpoints can be seen as interfaces (taps) to an internal
>>>>>> >> >> reliable multidrop network where each endpoint has a unique
>>>>>> >> >> address which is only valid for the lifetime of the endpoint.
>>>>>> >> >> I.e. if an endpoint is destroyed and then recreated (with the
>>>>>> >> >> same name), the new endpoint will have a new address
>>>>>> >> >> (eventually endpoint addresses will have to be recycled, but
>>>>>> >> >> not for a very long time). Endpoint names do not necessarily
>>>>>> >> >> have to be unique.
>>>>>> >> >
>>>>>> >> > How widely are these addresses unique: inside one VM, multiple
>>>>>> >> > VMs under the same host, multiple devices on a LAN (VLAN), ...?
>>>>>> >> Currently, the scope of the name and address space is defined by
>>>>>> >> the implementation. Perhaps we should define it? My current
>>>>>> >> interest is within an OS instance (bare metal or virtualised).
>>>>>> >> Between different OS instances, I expect something based on IP to
>>>>>> >> be used (because you don't know where those different OS/VM
>>>>>> >> instances will be deployed, so you need topology-independent
>>>>>> >> addressing).
>>>>>> >>
>>>>>> >> Based on other feedback, I have dropped the contended usage of
>>>>>> >> "IPC" and now call it "message bus" (MBUS).
>>>>>> >>
>>>>>> >> "MBUS endpoints can be seen as interfaces (taps) to an
>>>>>> >> OS-internal reliable multidrop network"...
>>>>>> >>
>>>>>> >> >> Signed-off-by: Ola Liljedahl <[email protected]>
>>>>>> >> >> ---
>>>>>> >> >> (This document/code contribution attached is provided under
>>>>>> >> >> the terms of agreement LES-LTM-21309)
>>>>>> >> >>
>>>>>> >> >> +/**
>>>>>> >> >> + * Create IPC endpoint
>>>>>> >> >> + *
>>>>>> >> >> + * @param name Name of local IPC endpoint
>>>>>> >> >> + * @param pool Pool for incoming messages
>>>>>> >> >> + *
>>>>>> >> >> + * @return IPC handle on success
>>>>>> >> >> + * @retval ODP_IPC_INVALID on failure and errno set
>>>>>> >> >> + */
>>>>>> >> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>>>>>> >> >
>>>>>> >> > This creates (implicitly) the local endpoint address.
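
(For reference, a minimal usage sketch of the call quoted above. The pool
is assumed to have been created elsewhere and the endpoint name is
arbitrary; as noted, the endpoint's address is implicit and peers learn
it via odp_ipc_resolve()/odp_ipc_sender().)

#include <stdio.h>

/* Sketch only: create a local MBUS/IPC endpoint named "fec-ctrl".
 * 'msg_pool' is assumed to be an existing odp_pool_t for incoming
 * messages. */
static odp_ipc_t create_endpoint(odp_pool_t msg_pool)
{
        odp_ipc_t ipc = odp_ipc_create("fec-ctrl", msg_pool);

        if (ipc == ODP_IPC_INVALID)
                perror("odp_ipc_create"); /* errno set on failure per the RFC */
        return ipc;
}
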
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Set the default input queue for an IPC endpoint
>>>>>> >> >> + *
>>>>>> >> >> + * @param ipc   IPC handle
>>>>>> >> >> + * @param queue Queue handle
>>>>>> >> >> + *
>>>>>> >> >> + * @retval 0 on success
>>>>>> >> >> + * @retval <0 on failure
>>>>>> >> >> + */
>>>>>> >> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>>>>> >> >
>>>>>> >> > Multiple input queues are likely needed for different priority
>>>>>> >> > messages.
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Resolve endpoint by name
>>>>>> >> >> + *
>>>>>> >> >> + * Look up an existing or future endpoint by name.
>>>>>> >> >> + * When the endpoint exists, return the specified message
>>>>>> >> >> + * with the endpoint as the sender.
>>>>>> >> >> + *
>>>>>> >> >> + * @param ipc  IPC handle
>>>>>> >> >> + * @param name Name to resolve
>>>>>> >> >> + * @param msg  Message to return
>>>>>> >> >> + */
>>>>>> >> >> +void odp_ipc_resolve(odp_ipc_t ipc,
>>>>>> >> >> +                     const char *name,
>>>>>> >> >> +                     odp_ipc_msg_t msg);
>>>>>> >> >
>>>>>> >> > How widely are these names visible? Inside one VM, multiple VMs
>>>>>> >> > under the same host, multiple devices on a LAN (VLAN), ...?
>>>>>> >> >
>>>>>> >> > I think name service (or address resolution) is better handled
>>>>>> >> > in a middleware layer. If ODP provides unique addresses and a
>>>>>> >> > message passing mechanism, additional services can be built on
>>>>>> >> > top.
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Monitor endpoint
>>>>>> >> >> + *
>>>>>> >> >> + * Monitor an existing (potentially already dead) endpoint.
>>>>>> >> >> + * When the endpoint is dead, return the specified message
>>>>>> >> >> + * with the endpoint as the sender.
>>>>>> >> >> + *
>>>>>> >> >> + * Unrecognized or invalid endpoint addresses are treated as
>>>>>> >> >> + * dead endpoints.
>>>>>> >> >> + *
>>>>>> >> >> + * @param ipc  IPC handle
>>>>>> >> >> + * @param addr Address of monitored endpoint
>>>>>> >> >> + * @param msg  Message to return
>>>>>> >> >> + */
>>>>>> >> >> +void odp_ipc_monitor(odp_ipc_t ipc,
>>>>>> >> >> +                     const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>>>>> >> >> +                     odp_ipc_msg_t msg);
>>>>>> >> >
>>>>>> >> > Again, I'd see node health monitoring and alarms as middleware
>>>>>> >> > services.
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Send message
>>>>>> >> >> + *
>>>>>> >> >> + * Send a message to an endpoint (which may already be dead).
>>>>>> >> >> + * Message delivery is ordered and reliable. All (accepted)
>>>>>> >> >> + * messages will be delivered up to the point of endpoint
>>>>>> >> >> + * death or lost connection.
>>>>>> >> >> + * Actual reception and processing is not guaranteed (use
>>>>>> >> >> + * end-to-end acknowledgements for that).
>>>>>> >> >> + * Monitor the remote endpoint to detect death or lost
>>>>>> >> >> + * connection.
>>>>>> >> >> + *
>>>>>> >> >> + * @param ipc  IPC handle
>>>>>> >> >> + * @param msg  Message to send
>>>>>> >> >> + * @param addr Address of remote endpoint
>>>>>> >> >> + *
>>>>>> >> >> + * @retval 0 on success
>>>>>> >> >> + * @retval <0 on error
>>>>>> >> >> + */
>>>>>> >> >> +int odp_ipc_send(odp_ipc_t ipc,
>>>>>> >> >> +                 odp_ipc_msg_t msg,
>>>>>> >> >> +                 const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>> >> >
>>>>>> >> > This would be used to send a message to an address, but normal
>>>>>> >> > odp_queue_enq() could be used to circulate this event inside an
>>>>>> >> > application (ODP instance).
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Get address of sender (source) of message
>>>>>> >> >> + *
>>>>>> >> >> + * @param msg  Message handle
>>>>>> >> >> + * @param addr Address of sender endpoint
>>>>>> >> >> + */
>>>>>> >> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>>>>> >> >> +                    uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Message data pointer
>>>>>> >> >> + *
>>>>>> >> >> + * Return a pointer to the message data
>>>>>> >> >> + *
>>>>>> >> >> + * @param msg Message handle
>>>>>> >> >> + *
>>>>>> >> >> + * @return Pointer to the message data
>>>>>> >> >> + */
>>>>>> >> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Message data length
>>>>>> >> >> + *
>>>>>> >> >> + * Return length of the message data.
>>>>>> >> >> + *
>>>>>> >> >> + * @param msg Message handle
>>>>>> >> >> + *
>>>>>> >> >> + * @return Message length
>>>>>> >> >> + */
>>>>>> >> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Set message length
>>>>>> >> >> + *
>>>>>> >> >> + * Set length of the message data.
>>>>>> >> >> + *
>>>>>> >> >> + * @param msg Message handle
>>>>>> >> >> + * @param len New length
>>>>>> >> >> + *
>>>>>> >> >> + * @retval 0 on success
>>>>>> >> >> + * @retval <0 on error
>>>>>> >> >> + */
>>>>>> >> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>>>>> >> >
>>>>>> >> > When the data pointer or data length is modified, push/pull
>>>>>> >> > head and push/pull tail would be the analogies from the packet
>>>>>> >> > API.
>>>>>> >> >
>>>>>> >> > -Petri
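
(Finally, to tie the calls quoted above together: a rough sketch of a
request handler that sends an end-to-end acknowledgement, as the
odp_ipc_send() documentation suggests. How events from the input queue
become odp_ipc_msg_t handles, the APP_MSG_ACK constant and struct
app_msg_hdr (from the sketch earlier in this mail) are all assumptions,
not part of the RFC; only the odp_ipc_* calls are taken from it.)

#include <stdint.h>

#define APP_MSG_ACK 2 /* hypothetical application-defined message type */

/* Hypothetical handler invoked by the application for each received
 * request message. */
static void handle_request(odp_ipc_t ipc, odp_ipc_msg_t msg)
{
        uint8_t peer[ODP_IPC_ADDR_SIZE];
        struct app_msg_hdr *hdr = odp_ipc_data(msg);

        if (odp_ipc_length(msg) < sizeof(*hdr))
                return; /* application-level sanity check */

        odp_ipc_sender(msg, peer); /* address of the requesting endpoint */

        /* ... act on the request here ... */

        /* End-to-end acknowledgement: reuse the received message as the
         * reply (assuming the application owns it at this point), shrink
         * it to just the header and echo the sequence number back. */
        hdr->type = APP_MSG_ACK;
        if (odp_ipc_reset(msg, sizeof(*hdr)) == 0)
                (void)odp_ipc_send(ipc, msg, peer);

        /* odp_ipc_send() "succeeding" only means the message was
         * accepted; whether the peer is still alive is learnt via
         * odp_ipc_monitor(). */
}
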
_______________________________________________
lng-odp mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/lng-odp
