On 21 May 2015 at 17:45, Maxim Uvarov <[email protected]> wrote:
> From RFC 3549, Netlink looks like a good protocol to communicate between
> data plane and control plane. And messages are defined by that protocol
> also. At least we should do something similar.
>
Netlink seems limited to the specific functionality already present in the
Linux kernel. An ODP IPC/message passing mechanism must be extensible and
support user-defined messages. There's no reason for ODP MBUS to impose any
message format. Any (set of) applications can model their message formats
on Netlink.

I don't understand how Netlink can be used to communicate between (any) two
applications. Please enlighten me.

-- Ola

>
> Maxim.
>
> On 21 May 2015 at 17:46, Ola Liljedahl <[email protected]> wrote:
>
>> On 21 May 2015 at 15:56, Alexandru Badicioiu
>> <[email protected]> wrote:
>>
>>> I got the impression that the ODP MBUS API would define a transport
>>> protocol/API between an ODP
>>
>> No, the MBUS API is just an API for message passing (think of the OSE IPC
>> API) and doesn't specify use cases or content. Just like the ODP packet
>> API doesn't specify what the content in a packet means or the format of
>> the content.
>>
>>> application and a control plane application, like TCP is the transport
>>> protocol for HTTP applications (e.g. the Web). Netlink defines exactly
>>> that - a transport protocol for configuration messages.
>>> Maxim asked about the messages - should applications define the message
>>> format and/or the message content? Wouldn't it be an easier task for the
>>> application to define only the content and let ODP define a format?
>>
>> How can you define a format when you don't know what the messages are
>> used for and what data needs to be transferred? Why should the MBUS API
>> or implementations care about the message format? It's just payload and
>> none of their business.
>>
>> If you want to, you can specify formats for specific purposes, e.g. reuse
>> Netlink formats for the functions that Netlink supports. Some ODP
>> applications may use this, others not (because they use some other
>> protocol or they implement some other functionality).
>>
>>> Reliability could be an issue but the Netlink spec says how applications
>>> can create reliable protocols:
>>>
>>>    One could create a reliable protocol between an FEC and a CPC by
>>>    using the combination of sequence numbers, ACKs, and retransmit
>>>    timers. Both sequence numbers and ACKs are provided by Netlink;
>>>    timers are provided by Linux.
>>>
>> And you could do the same in ODP, but I prefer not to; this adds a level
>> of complexity to the application code I do not want. Perhaps the actual
>> MBUS implementation has to do this, but then hidden from the
>> applications. Just like TCP reliability and ordering etc. are hidden from
>> the applications that just do read and write.
>>
>>>    One could create a heartbeat protocol between the FEC and CPC by
>>>    using the ECHO flags and the NLMSG_NOOP message.
>>>
>>> On 21 May 2015 at 16:23, Ola Liljedahl <[email protected]> wrote:
>>>
>>>> On 21 May 2015 at 15:05, Alexandru Badicioiu
>>>> <[email protected]> wrote:
>>>>
>>>>> I was referring to the Netlink protocol in itself, as a model for ODP
>>>>> MBUS (or IPC).
>>>>
>>>> Isn't the Netlink protocol what the endpoints send between them? This
>>>> is not specified by the ODP IPC/MBUS API; applications can define or
>>>> re-use whatever protocol they like.
>>>> The protocol definition is heavily dependent on what you actually use
>>>> the IPC for, and we shouldn't force ODP users to use some specific
>>>> predefined protocol.
>>>>
>>>> Also, the "wire protocol" is left undefined; this is up to the
>>>> implementation to define and each platform can have its own definition.
>>>>
>>>> And Netlink isn't even reliable. I know that creates problems, e.g. it
>>>> is impossible to get a clean and complete snapshot of e.g. the routing
>>>> table.
>>>>
>>>>>    The interaction between the FEC and the CPC, in the Netlink
>>>>>    context, defines a protocol. Netlink provides mechanisms for the
>>>>>    CPC (residing in user space) and the FEC (residing in kernel space)
>>>>>    to have their own protocol definition -- *kernel space and user
>>>>>    space just mean different protection domains*. Therefore, a wire
>>>>>    protocol is needed to communicate. The wire protocol is normally
>>>>>    provided by some privileged service that is able to copy between
>>>>>    multiple protection domains. We will refer to this service as the
>>>>>    Netlink service. The Netlink service can also be encapsulated in a
>>>>>    different transport layer, if the CPC executes on a different node
>>>>>    than the FEC. The FEC and CPC, using Netlink mechanisms, may choose
>>>>>    to define a reliable protocol between each other. By default,
>>>>>    however, Netlink provides an unreliable communication.
>>>>>
>>>>>    Note that the FEC and CPC can both live in the same memory
>>>>>    protection domain and use the connect() system call to create a
>>>>>    path to the peer and talk to each other. We will not discuss this
>>>>>    mechanism further other than to say that it is available.
>>>>>    Throughout this document, we will refer interchangeably to the FEC
>>>>>    to mean kernel space and the CPC to mean user space. This
>>>>>    denomination is not meant, however, to restrict the two components
>>>>>    to these protection domains or to the same compute node.
>>>>>
>>>>> On 21 May 2015 at 15:55, Ola Liljedahl <[email protected]> wrote:
>>>>>
>>>>>> On 21 May 2015 at 13:22, Alexandru Badicioiu
>>>>>> <[email protected]> wrote:
>>>>>> > Hi,
>>>>>> > would the Netlink protocol (https://tools.ietf.org/html/rfc3549)
>>>>>> > fit the purpose of ODP IPC (within a single OS instance)?
>>>>>> I interpret this as a question whether Netlink would be suitable as
>>>>>> an implementation of the ODP IPC (now called message bus because
>>>>>> "IPC" is so contended and imbued with different meanings).
>>>>>>
>>>>>> It is perhaps possible. Netlink seems a bit focused on intra-kernel
>>>>>> and kernel-to-user while the ODP IPC/MBUS is focused on user-to-user
>>>>>> (application-to-application).
>>>>>>
>>>>>> I see a couple of primary requirements:
>>>>>>
>>>>>> - Support communication (message exchange) between user space
>>>>>>   processes.
>>>>>> - Support arbitrary user-defined messages.
>>>>>> - Ordered, reliable delivery of messages.
>>>>>>
>>>>>> From the little I can quickly read up on Netlink, the first two
>>>>>> requirements do not seem supported. But perhaps someone with more
>>>>>> intimate knowledge of Netlink can prove me wrong. Or maybe Netlink
>>>>>> can be extended to support u2u and user-defined messages, but the
>>>>>> current specialization (e.g. specialized addressing, specialized
>>>>>> message formats) seems contrary to the goals of providing generic
>>>>>> mechanisms in the kernel that can be used for different things.
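
(As an aside, to make "arbitrary user-defined messages" a bit more
concrete: below is a rough sketch of an application-defined header,
loosely modelled on Netlink's nlmsghdr, carried as plain MBUS payload.
Everything in it is hypothetical application code; it is not part of the
proposed API, and an MBUS implementation would never look inside it.)

#include <stdint.h>

/* Hypothetical application-defined message header, loosely modelled on
 * struct nlmsghdr from RFC 3549. To MBUS this is just the first bytes
 * of an opaque payload. */
struct app_msg_hdr {
        uint32_t len;   /* total message length: header + payload */
        uint16_t type;  /* application-defined type, e.g. APP_MSG_ACK */
        uint16_t flags; /* request/ack/echo bits, as the app sees fit */
        uint32_t seq;   /* sequence number, if the application wants ACKs */
        uint32_t src;   /* application-level source identifier */
};

/* An application wanting Netlink-style reliability can echo 'seq' back
 * in an acknowledgement message and retransmit on a timer - the scheme
 * quoted from RFC 3549 earlier in this thread. */
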
>>>>>> My IPC/MBUS reference implementation for linux-generic builds upon
>>>>>> POSIX message queues. One of my issues is that I want the message
>>>>>> queue associated with a process to go away when the process goes
>>>>>> away. The message queues are not independent entities.
>>>>>>
>>>>>> -- Ola
>>>>>>
>>>>>> > Thanks,
>>>>>> > Alex
>>>>>> >
>>>>>> > On 21 May 2015 at 14:12, Ola Liljedahl <[email protected]> wrote:
>>>>>> >>
>>>>>> >> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
>>>>>> >> <[email protected]> wrote:
>>>>>> >> >
>>>>>> >> >> -----Original Message-----
>>>>>> >> >> From: lng-odp [mailto:[email protected]] On Behalf
>>>>>> >> >> Of ext Ola Liljedahl
>>>>>> >> >> Sent: Tuesday, May 19, 2015 1:04 AM
>>>>>> >> >> To: [email protected]
>>>>>> >> >> Subject: [lng-odp] [RFC] Add ipc.h
>>>>>> >> >>
>>>>>> >> >> As promised, here is my first attempt at a standalone API for
>>>>>> >> >> IPC - inter-process communication in a shared-nothing
>>>>>> >> >> architecture (message passing between processes which do not
>>>>>> >> >> share memory).
>>>>>> >> >>
>>>>>> >> >> Currently all definitions are in the file ipc.h but it is
>>>>>> >> >> possible to break out some message/event related definitions
>>>>>> >> >> (everything from odp_ipc_sender) into a separate file
>>>>>> >> >> message.h. This would mimic the packet_io.h/packet.h
>>>>>> >> >> separation.
>>>>>> >> >>
>>>>>> >> >> The semantics of message passing are that sending a message to
>>>>>> >> >> an endpoint will always look like it succeeds. The appearance
>>>>>> >> >> of endpoints is explicitly notified through user-defined
>>>>>> >> >> messages specified in the odp_ipc_resolve() call. Similarly,
>>>>>> >> >> the disappearance (e.g. death or otherwise lost connection) is
>>>>>> >> >> also explicitly notified through user-defined messages
>>>>>> >> >> specified in the odp_ipc_monitor() call. The send call does
>>>>>> >> >> not fail because the addressed endpoint has disappeared.
>>>>>> >> >>
>>>>>> >> >> Messages (from endpoint A to endpoint B) are delivered in
>>>>>> >> >> order. If message N sent to an endpoint is delivered, then all
>>>>>> >> >> messages <N have also been delivered. Message delivery does
>>>>>> >> >> not guarantee actual processing by the
>>>>>> >> >
>>>>>> >> > Ordered delivery is an OK requirement, but "all messages <N
>>>>>> >> > have also been delivered" in practice means lossless delivery
>>>>>> >> > (== retries, retransmission windows, etc.). Lossy vs. lossless
>>>>>> >> > link should be a configuration option.
>>>>>> >> I am just targeting internal communication which I expect to be
>>>>>> >> reliable. There isn't any physical "link" involved. If an
>>>>>> >> implementation chooses to use some unreliable media, then it will
>>>>>> >> need to take some countermeasures. Any loss of a message could be
>>>>>> >> detected using sequence numbers (and timeouts) and handled by
>>>>>> >> (temporary) disconnection (so that no more messages will be
>>>>>> >> delivered should one go missing).
>>>>>> >>
>>>>>> >> I am OK with adding the lossless/lossy configuration to the API
>>>>>> >> as long as the lossless option is always implemented. Is this a
>>>>>> >> configuration when creating the local IPC endpoint or when
>>>>>> >> sending a message to another endpoint?
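
(To illustrate the countermeasures mentioned just above: a minimal sketch
of how an implementation running over unreliable media might detect
message loss per peer with sequence numbers and turn it into a
disconnection, rather than exposing lossy delivery to the application.
All names below are made up for illustration; nothing here is part of the
proposed API.)

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer receive state inside an MBUS implementation
 * that happens to run over an unreliable transport. */
struct peer_state {
        uint32_t next_seq;  /* next expected sequence number from this peer */
        bool     connected; /* false => stop delivering, report lost connection */
};

/* Called for every frame received from 'peer'; returns true if the
 * message may be delivered to the endpoint's input queue. */
static bool accept_msg(struct peer_state *peer, uint32_t seq)
{
        if (!peer->connected)
                return false;
        if (seq != peer->next_seq) {
                /* A gap means something was lost: deliver nothing more
                 * from this peer, so ordering still holds, and let users
                 * of odp_ipc_monitor() see a lost-connection
                 * notification instead. */
                peer->connected = false;
                return false;
        }
        peer->next_seq++;
        return true;
}
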
>>>>>> >> >
>>>>>> >> > Also, what does "delivered" mean?
>>>>>> >> >
>>>>>> >> > Message:
>>>>>> >> > - transmitted successfully over the link?
>>>>>> >> > - is now under control of the remote node (post office)?
>>>>>> >> > - delivered into application input queue?
>>>>>> >> Probably this one, but I am not sure the exact definition
>>>>>> >> matters: "has been delivered" or "will eventually be delivered
>>>>>> >> unless the connection to the destination is lost". Maybe there
>>>>>> >> is a better word than "delivered"?
>>>>>> >>
>>>>>> >> "Made available into the destination (recipient) address space"?
>>>>>> >>
>>>>>> >> > - has been dequeued from application queue?
>>>>>> >> >
>>>>>> >> >> recipient. End-to-end acknowledgements (using messages) should
>>>>>> >> >> be used if this guarantee is important to the user.
>>>>>> >> >>
>>>>>> >> >> IPC endpoints can be seen as interfaces (taps) to an internal
>>>>>> >> >> reliable multidrop network where each endpoint has a unique
>>>>>> >> >> address which is only valid for the lifetime of the endpoint.
>>>>>> >> >> I.e. if an endpoint is destroyed and then recreated (with the
>>>>>> >> >> same name), the new endpoint will have a new address
>>>>>> >> >> (eventually endpoint addresses will have to be recycled, but
>>>>>> >> >> not for a very long time). Endpoint names do not necessarily
>>>>>> >> >> have to be unique.
>>>>>> >> >
>>>>>> >> > How widely are these addresses unique: inside one VM, multiple
>>>>>> >> > VMs under the same host, multiple devices on a LAN (VLAN), ...?
>>>>>> >> Currently, the scope of the name and address space is defined by
>>>>>> >> the implementation. Perhaps we should define it? My current
>>>>>> >> interest is within an OS instance (bare metal or virtualised).
>>>>>> >> Between different OS instances, I expect something based on IP to
>>>>>> >> be used (because you don't know where those different OS/VM
>>>>>> >> instances will be deployed, so you need topology-independent
>>>>>> >> addressing).
>>>>>> >>
>>>>>> >> Based on other feedback, I have dropped the contended usage of
>>>>>> >> "IPC" and now call it "message bus" (MBUS).
>>>>>> >>
>>>>>> >> "MBUS endpoints can be seen as interfaces (taps) to an
>>>>>> >> OS-internal reliable multidrop network"...
>>>>>> >>
>>>>>> >> >> Signed-off-by: Ola Liljedahl <[email protected]>
>>>>>> >> >> ---
>>>>>> >> >> (This document/code contribution attached is provided under
>>>>>> >> >> the terms of agreement LES-LTM-21309)
>>>>>> >> >>
>>>>>> >> >> +/**
>>>>>> >> >> + * Create IPC endpoint
>>>>>> >> >> + *
>>>>>> >> >> + * @param name Name of local IPC endpoint
>>>>>> >> >> + * @param pool Pool for incoming messages
>>>>>> >> >> + *
>>>>>> >> >> + * @return IPC handle on success
>>>>>> >> >> + * @retval ODP_IPC_INVALID on failure and errno set
>>>>>> >> >> + */
>>>>>> >> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>>>>>> >> >
>>>>>> >> > This creates (implicitly) the local endpoint address.
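
(For reference, a minimal usage sketch of the call quoted above. The pool
is assumed to have been created elsewhere and the endpoint name is
arbitrary; as noted, the endpoint's address is implicit and peers learn
it via odp_ipc_resolve()/odp_ipc_sender().)

#include <stdio.h>

/* Sketch only: create a local MBUS/IPC endpoint named "fec-ctrl".
 * 'msg_pool' is assumed to be an existing odp_pool_t for incoming
 * messages. */
static odp_ipc_t create_endpoint(odp_pool_t msg_pool)
{
        odp_ipc_t ipc = odp_ipc_create("fec-ctrl", msg_pool);

        if (ipc == ODP_IPC_INVALID)
                perror("odp_ipc_create"); /* errno set on failure per the RFC */
        return ipc;
}
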
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Set the default input queue for an IPC endpoint
>>>>>> >> >> + *
>>>>>> >> >> + * @param ipc   IPC handle
>>>>>> >> >> + * @param queue Queue handle
>>>>>> >> >> + *
>>>>>> >> >> + * @retval 0 on success
>>>>>> >> >> + * @retval <0 on failure
>>>>>> >> >> + */
>>>>>> >> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>>>>> >> >
>>>>>> >> > Multiple input queues are likely needed for different priority
>>>>>> >> > messages.
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Resolve endpoint by name
>>>>>> >> >> + *
>>>>>> >> >> + * Look up an existing or future endpoint by name.
>>>>>> >> >> + * When the endpoint exists, return the specified message
>>>>>> >> >> + * with the endpoint as the sender.
>>>>>> >> >> + *
>>>>>> >> >> + * @param ipc  IPC handle
>>>>>> >> >> + * @param name Name to resolve
>>>>>> >> >> + * @param msg  Message to return
>>>>>> >> >> + */
>>>>>> >> >> +void odp_ipc_resolve(odp_ipc_t ipc,
>>>>>> >> >> +                     const char *name,
>>>>>> >> >> +                     odp_ipc_msg_t msg);
>>>>>> >> >
>>>>>> >> > How widely are these names visible? Inside one VM, multiple VMs
>>>>>> >> > under the same host, multiple devices on a LAN (VLAN), ...?
>>>>>> >> >
>>>>>> >> > I think name service (or address resolution) is better handled
>>>>>> >> > in a middleware layer. If ODP provides unique addresses and a
>>>>>> >> > message passing mechanism, additional services can be built on
>>>>>> >> > top.
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Monitor endpoint
>>>>>> >> >> + *
>>>>>> >> >> + * Monitor an existing (potentially already dead) endpoint.
>>>>>> >> >> + * When the endpoint is dead, return the specified message
>>>>>> >> >> + * with the endpoint as the sender.
>>>>>> >> >> + *
>>>>>> >> >> + * Unrecognized or invalid endpoint addresses are treated as
>>>>>> >> >> + * dead endpoints.
>>>>>> >> >> + *
>>>>>> >> >> + * @param ipc  IPC handle
>>>>>> >> >> + * @param addr Address of monitored endpoint
>>>>>> >> >> + * @param msg  Message to return
>>>>>> >> >> + */
>>>>>> >> >> +void odp_ipc_monitor(odp_ipc_t ipc,
>>>>>> >> >> +                     const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>>>>> >> >> +                     odp_ipc_msg_t msg);
>>>>>> >> >
>>>>>> >> > Again, I'd see node health monitoring and alarms as middleware
>>>>>> >> > services.
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Send message
>>>>>> >> >> + *
>>>>>> >> >> + * Send a message to an endpoint (which may already be dead).
>>>>>> >> >> + * Message delivery is ordered and reliable. All (accepted)
>>>>>> >> >> + * messages will be delivered up to the point of endpoint
>>>>>> >> >> + * death or lost connection.
>>>>>> >> >> + * Actual reception and processing is not guaranteed (use
>>>>>> >> >> + * end-to-end acknowledgements for that).
>>>>>> >> >> + * Monitor the remote endpoint to detect death or lost
>>>>>> >> >> + * connection.
>>>>>> >> >> + *
>>>>>> >> >> + * @param ipc  IPC handle
>>>>>> >> >> + * @param msg  Message to send
>>>>>> >> >> + * @param addr Address of remote endpoint
>>>>>> >> >> + *
>>>>>> >> >> + * @retval 0 on success
>>>>>> >> >> + * @retval <0 on error
>>>>>> >> >> + */
>>>>>> >> >> +int odp_ipc_send(odp_ipc_t ipc,
>>>>>> >> >> +                 odp_ipc_msg_t msg,
>>>>>> >> >> +                 const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>> >> >
>>>>>> >> > This would be used to send a message to an address, but normal
>>>>>> >> > odp_queue_enq() could be used to circulate this event inside an
>>>>>> >> > application (ODP instance).
>>>>>> >> >
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Get address of sender (source) of message
>>>>>> >> >> + *
>>>>>> >> >> + * @param msg  Message handle
>>>>>> >> >> + * @param addr Address of sender endpoint
>>>>>> >> >> + */
>>>>>> >> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>>>>> >> >> +                    uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Message data pointer
>>>>>> >> >> + *
>>>>>> >> >> + * Return a pointer to the message data
>>>>>> >> >> + *
>>>>>> >> >> + * @param msg Message handle
>>>>>> >> >> + *
>>>>>> >> >> + * @return Pointer to the message data
>>>>>> >> >> + */
>>>>>> >> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Message data length
>>>>>> >> >> + *
>>>>>> >> >> + * Return length of the message data.
>>>>>> >> >> + *
>>>>>> >> >> + * @param msg Message handle
>>>>>> >> >> + *
>>>>>> >> >> + * @return Message length
>>>>>> >> >> + */
>>>>>> >> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>>>>> >> >> +
>>>>>> >> >> +/**
>>>>>> >> >> + * Set message length
>>>>>> >> >> + *
>>>>>> >> >> + * Set length of the message data.
>>>>>> >> >> + *
>>>>>> >> >> + * @param msg Message handle
>>>>>> >> >> + * @param len New length
>>>>>> >> >> + *
>>>>>> >> >> + * @retval 0 on success
>>>>>> >> >> + * @retval <0 on error
>>>>>> >> >> + */
>>>>>> >> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>>>>> >> >
>>>>>> >> > When the data pointer or data length is modified, push/pull
>>>>>> >> > head and push/pull tail would be the analogies from the packet
>>>>>> >> > API.
>>>>>> >> >
>>>>>> >> > -Petri
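
(Finally, to tie the calls quoted above together: a rough sketch of a
request handler that sends an end-to-end acknowledgement, as the
odp_ipc_send() documentation suggests. How events from the input queue
become odp_ipc_msg_t handles, the APP_MSG_ACK constant and struct
app_msg_hdr (from the sketch earlier in this mail) are all assumptions,
not part of the RFC; only the odp_ipc_* calls are taken from it.)

#include <stdint.h>

#define APP_MSG_ACK 2 /* hypothetical application-defined message type */

/* Hypothetical handler invoked by the application for each received
 * request message. */
static void handle_request(odp_ipc_t ipc, odp_ipc_msg_t msg)
{
        uint8_t peer[ODP_IPC_ADDR_SIZE];
        struct app_msg_hdr *hdr = odp_ipc_data(msg);

        if (odp_ipc_length(msg) < sizeof(*hdr))
                return; /* application-level sanity check */

        odp_ipc_sender(msg, peer); /* address of the requesting endpoint */

        /* ... act on the request here ... */

        /* End-to-end acknowledgement: reuse the received message as the
         * reply (assuming the application owns it at this point), shrink
         * it to just the header and echo the sequence number back. */
        hdr->type = APP_MSG_ACK;
        if (odp_ipc_reset(msg, sizeof(*hdr)) == 0)
                (void)odp_ipc_send(ipc, msg, peer);

        /* odp_ipc_send() "succeeding" only means the message was
         * accepted; whether the peer is still alive is learnt via
         * odp_ipc_monitor(). */
}
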
_______________________________________________
lng-odp mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/lng-odp
