Re: [dpdk-dev] [RFC 17.08] flow_classify: add librte_flow_classify library

Ananyev, Konstantin Wed, 17 May 2017 09:10:41 -0700

> > Hi Ferruh,
> > Please see my comments/questions below.
> > Thanks
> > Konstantin
> >
> >> +
> >> +/**
> >> + * @file
> >> + *
> >> + * RTE Flow Classify Library
> >> + *
> >> + * This library provides flow record information with some measured 
> >> properties.
> >> + *
> >> + * Application can select variety of flow types based on various flow 
> >> keys.
> >> + *
> >> + * Library only maintains flow records between 
> >> rte_flow_classify_stats_get()
> >> + * calls and with a maximum limit.
> >> + *
> >> + * Provided flow record will be linked list rte_flow_classify_stat_xxx
> >> + * structure.
> >> + *
> >> + * Library is responsible from allocating and freeing memory for flow 
> >> record
> >> + * table. Previous table freed with next rte_flow_classify_stats_get() 
> >> call and
> >> + * all tables are freed with rte_flow_classify_type_reset() or
> >> + * rte_flow_classify_type_set(x, 0). Memory for table allocated on the 
> >> fly while
> >> + * creating records.
> >> + *
> >> + * A rte_flow_classify_type_set() with a valid type will register Rx/Tx
> >> + * callbacks and start filling flow record table.
> >> + * With rte_flow_classify_stats_get(), pointer sent to caller and 
> >> meanwhile
> >> + * library continues collecting records.
> >> + *
> >> + *  Usage:
> >> + *  - application calls rte_flow_classify_type_set() for a device
> >> + *  - library creates Rx/Tx callbacks for packets and start filling flow 
> >> table
> >
> > Is it necessary to use an RX callback here?
> > Could the library provide an API like collect(port_id, input_mbuf[], pkt_num) 
> > instead?
> > Then the user would have a choice: either set up a callback or call collect() 
> > directly.
> 
> This was also a comment from Morten; I will update the RFC to use a direct API call.
> 
> >
> >> + *    for that type of flow (currently only one flow type supported)
> >> + *  - application calls rte_flow_classify_stats_get() to get pointer to 
> >> linked
> >> + *    listed flow table. Library assigns this pointer to another value 
> >> and keeps
> >> + *    collecting flow data. In next rte_flow_classify_stats_get(), 
> >> library first
> >> + *    free the previous table, and pass current table to the application, 
> >> keep
> >> + *    collecting data.
> >
> > Ok, but that means that you can't use stats_get() for the same type
> > from 2 different threads without explicit synchronization?
> 
> Correct.
> And multiple threads shouldn't be calling this API. It doesn't store
> previous flow data, so multiple threads calling it can each get only a piece
> of the information. Do you see any use case where multiple threads would call
> this API?


One example would be when you have multiple queues per port,
managed/monitored by different cores.
BTW, how are you going to collect the stats in that case?

> 
> >
> >> + *  - application calls rte_flow_classify_type_reset(), library 
> >> unregisters the
> >> + *    callbacks and free all flow table data.
> >> + *
> >> + */
> >> +
> >> +enum rte_flow_classify_type {
> >> +  RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
> >> +  RTE_FLOW_CLASSIFY_TYPE_MAX,
> >> +};
> >> +
> >> +#define RTE_FLOW_CLASSIFY_TYPE_MASK (((RTE_FLOW_CLASSIFY_TYPE_MAX - 1) << 1) - 1)
> >> +
> >> +/**
> >> + * Global configuration struct
> >> + */
> >> +struct rte_flow_classify_config {
> >> +  uint32_t type; /* bitwise enum rte_flow_classify_type values */
> >> +  void *flow_table_prev;
> >> +  uint32_t flow_table_prev_item_count;
> >> +  void *flow_table_current;
> >> +  uint32_t flow_table_current_item_count;
> >> +} rte_flow_classify_config[RTE_MAX_ETHPORTS];
> >> +
> >> +#define RTE_FLOW_CLASSIFY_STAT_MAX UINT16_MAX
> >> +
> >> +/**
> >> + * Classification stats data struct
> >> + */
> >> +struct rte_flow_classify_stat_generic {
> >> +  struct rte_flow_classify_stat_generic *next;
> >> +  uint32_t id;
> >> +  uint64_t timestamp;
> >> +
> >> +  struct ether_addr src_mac;
> >> +  struct ether_addr dst_mac;
> >> +  uint32_t src_ipv4;
> >> +  uint32_t dst_ipv4;
> >> +  uint8_t l3_protocol_id;
> >> +  uint16_t src_port;
> >> +  uint16_t dst_port;
> >> +
> >> +  uint64_t packet_count;
> >> +  uint64_t packet_size; /* bytes */
> >> +};
> >
> > Ok, so if I understood things right, for the generic type it will always 
> > classify all incoming packets by:
> > <src_mac, dst_mac, src_ipv4, dst_ipv4, l3_protocol_id, src_port, dst_port>
> > all by absolute values, and represent the results as a linked list.
> > Is that correct, or did I misunderstand your intentions here?
> 
> Correct.
> 
> > If so, then I see several disadvantages here:
> > 1) It is really hard to predict what kind of stats is required for each 
> > particular case.
> > Let's say some people would like to collect stats by <dst_mac, vlan>,
> > others by <dst_ipv4, subnet_mask>, others again by <l4 dst_port>, and so on.
> > Having just one hardcoded filter doesn't seem very flexible/usable.
> > I think you need to find a way to allow the user to define what type of filter 
> > they want to apply.
> 
> The flow type should be provided by applications, according to their needs,
> and needs to be implemented in this library. The generic one will be the
> only one implemented in the first version:
> enum rte_flow_classify_type {
>       RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
>       RTE_FLOW_CLASSIFY_TYPE_MAX,
> };
> 
> 
> App should set the type first via the API:
> rte_flow_classify_type_set(uint8_t port_id, uint32_t type);
> 
> 
> And the stats for this type will be returned. Because the returned type can
> be a different type of struct, it is returned as void:
> rte_flow_classify_stats_get(uint8_t port_id, void *stats);

I understand that, but it means that for every different filter the user wants 
to use, someone has to update the library: define a new type and write a new 
piece of code to handle it.
That seems inflexible and not extensible at all from the user's perspective.
Even HW allows some flexibility with RX filters.
Why not allow the user to specify the classification filter he/she wants for 
that particular case, the way both rte_flow and rte_acl work?
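
To illustrate what I mean by a user-specified filter, here is a minimal sketch.
All names in it (flow_key, flow_filter, flow_filter_match) are made up for this
mail and are not existing DPDK API; the value/mask pair is only similar in
spirit to the spec/mask pairs of rte_flow_item, and a real implementation
would probably want to reuse rte_flow_item itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key: the fields a packet is classified on. */
struct flow_key {
	uint32_t src_ipv4;
	uint32_t dst_ipv4;
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t  proto;
};

/*
 * Hypothetical user-defined filter. A field takes part in the match
 * only where its mask bits are set; an all-zero mask means "don't care".
 */
struct flow_filter {
	struct flow_key spec;
	struct flow_key mask;
};

/* Return true if every masked field of the key equals the filter spec. */
bool
flow_filter_match(const struct flow_filter *f, const struct flow_key *k)
{
	return ((k->src_ipv4 ^ f->spec.src_ipv4) & f->mask.src_ipv4) == 0 &&
	       ((k->dst_ipv4 ^ f->spec.dst_ipv4) & f->mask.dst_ipv4) == 0 &&
	       ((k->src_port ^ f->spec.src_port) & f->mask.src_port) == 0 &&
	       ((k->dst_port ^ f->spec.dst_port) & f->mask.dst_port) == 0 &&
	       ((k->proto ^ f->spec.proto) & f->mask.proto) == 0;
}
```

With something like this, "stats by <dst_ipv4, subnet_mask>" is just a filter
with spec.dst_ipv4 and mask.dst_ipv4 set and everything else zero, and no new
type has to be added to the library for it.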

> 
> > I think it was discussed already, but I still wonder why rte_flow_item 
> > can't be used for that approach?
> 
> 
> > 2) Even one 10G port can produce ~14M rte_flow_classify_stat_generic 
> > entries in one second
> > (if all packets have different ipv4/ports or so).
> > Accessing/retrieving items over a linked list with 14M entries doesn't 
> > sound like a good idea.
> > I'd say we need some better way to retrieve/present the collected data.
> 
> This is to keep flows, so I expect the numbers will be smaller compared to
> the packet numbers.

That was an extreme example to show how badly the selected approach could 
behave.
What I am trying to say: we need a way to collect and retrieve stats quickly 
and easily.
Let's say the user has just invoked stats_get(port=0, type=generic).
Now he is interested in the stats for one particular dst_ip only.
The only way to get them is to walk over the whole list stats_get() returned 
and examine each entry one by one.
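
Roughly, the caller ends up writing something like the sketch below. The
struct is a trimmed, renamed copy of rte_flow_classify_stat_generic from the
RFC with only the fields used here, and count_for_dst_ip() is a made-up helper
name, not part of the proposed API.

```c
#include <stddef.h>
#include <stdint.h>

/* Trimmed copy of the RFC record: only the fields this example needs. */
struct stat_generic {
	struct stat_generic *next;
	uint32_t dst_ipv4;
	uint64_t packet_count;
};

/*
 * Sum packet_count over every record with the given dst_ipv4.
 * The cost is O(n) in the length of the list, for every query.
 */
uint64_t
count_for_dst_ip(const struct stat_generic *head, uint32_t dst_ipv4)
{
	uint64_t total = 0;
	const struct stat_generic *s;

	for (s = head; s != NULL; s = s->next)
		if (s->dst_ipv4 == dst_ipv4)
			total += s->packet_count;
	return total;
}
```

Every query for a different dst_ip repeats the full walk, which is what makes
the linked list painful once the number of records grows.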

I think it would be much better to have something like:

struct rte_flow_stats {timestamp; packet_count; packet_bytes; ..};

<fill rte_flow_item (or something else) to define desired filter>

filter_id = rte_flow_stats_register(.., &rte_flow_item);
....
struct rte_flow_stats stats;
rte_flow_stats_get(..., filter_id, &stats);

That allows the user to define the flows to collect stats for.
Again, in that case you don't need to worry about when/where to destroy the 
previous version of your stats.
Of course, the open question is how to treat packets that match more than 
one flow (priority/insertion order/something else?), but I suppose we'll need 
to deal with that question anyway.
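
To make the register/get idea above a bit more concrete, here is a small
self-contained sketch. Everything in it is an assumption made for
illustration: the names (flow_stats_table, flow_stats_register,
flow_stats_update, flow_stats_get), the fixed-size table, the filter that
matches on dst_ipv4 under a mask only, and the choice of "first registered
filter wins" for overlapping flows. It leaves out the timestamp and is not
thread safe.

```c
#include <stdint.h>

#define FLOW_STATS_MAX_FILTERS 8 /* arbitrary limit for this sketch */

struct flow_stats {
	uint64_t packet_count;
	uint64_t packet_bytes;
};

/* Hypothetical filter: match on dst_ipv4 under a mask only. */
struct flow_stats_filter {
	uint32_t dst_ipv4;
	uint32_t dst_mask;
};

struct flow_stats_table {
	struct flow_stats_filter filter[FLOW_STATS_MAX_FILTERS];
	struct flow_stats stats[FLOW_STATS_MAX_FILTERS];
	int nb_filters;
};

/* Register a filter; returns its id, or -1 if the table is full. */
int
flow_stats_register(struct flow_stats_table *t,
		const struct flow_stats_filter *f)
{
	if (t->nb_filters == FLOW_STATS_MAX_FILTERS)
		return -1;
	t->filter[t->nb_filters] = *f;
	t->stats[t->nb_filters] = (struct flow_stats){ 0, 0 };
	return t->nb_filters++;
}

/* Account one packet against the first matching filter (insertion order). */
void
flow_stats_update(struct flow_stats_table *t, uint32_t dst_ipv4,
		uint32_t pkt_len)
{
	int i;

	for (i = 0; i < t->nb_filters; i++) {
		const struct flow_stats_filter *f = &t->filter[i];

		if (((dst_ipv4 ^ f->dst_ipv4) & f->dst_mask) == 0) {
			t->stats[i].packet_count++;
			t->stats[i].packet_bytes += pkt_len;
			return;
		}
	}
}

/* Copy the counters of one filter to *out; returns 0, or -1 on a bad id. */
int
flow_stats_get(const struct flow_stats_table *t, int filter_id,
		struct flow_stats *out)
{
	if (filter_id < 0 || filter_id >= t->nb_filters)
		return -1;
	*out = t->stats[filter_id];
	return 0;
}
```

The point is that the user gets a fixed-size answer for exactly the flow asked
about, and the library never has to hand over and later free a whole table.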
 
Konstantin

> It is possible to use fixed size arrays for this. But I think it is easy
> to make this switch later, and I would like to see the performance effect
> before doing it. Do you think it is OK to start like this and
> make that decision during implementation?
