On 5/17/2017 5:10 PM, Ananyev, Konstantin wrote:
>>> Hi Ferruh,
>>> Please see my comments/questions below.
>>> Thanks
>>> Konstantin
>>>
>>>> +
>>>> +/**
>>>> + * @file
>>>> + *
>>>> + * RTE Flow Classify Library
>>>> + *
>>>> + * This library provides flow record information with some measured properties.
>>>> + *
>>>> + * Application can select variety of flow types based on various flow keys.
>>>> + *
>>>> + * Library only maintains flow records between rte_flow_classify_stats_get()
>>>> + * calls and with a maximum limit.
>>>> + *
>>>> + * Provided flow record will be linked list rte_flow_classify_stat_xxx
>>>> + * structure.
>>>> + *
>>>> + * Library is responsible from allocating and freeing memory for flow record
>>>> + * table. Previous table freed with next rte_flow_classify_stats_get() call and
>>>> + * all tables are freed with rte_flow_classify_type_reset() or
>>>> + * rte_flow_classify_type_set(x, 0). Memory for table allocated on the fly while
>>>> + * creating records.
>>>> + *
>>>> + * A rte_flow_classify_type_set() with a valid type will register Rx/Tx
>>>> + * callbacks and start filling flow record table.
>>>> + * With rte_flow_classify_stats_get(), pointer sent to caller and meanwhile
>>>> + * library continues collecting records.
>>>> + *
>>>> + * Usage:
>>>> + * - application calls rte_flow_classify_type_set() for a device
>>>> + * - library creates Rx/Tx callbacks for packets and start filling flow table
>>>
>>> Does it necessary to use an RX callback here?
>>> Can library provide an API like collect(port_id, input_mbuf[], pkt_num) instead?
>>> So the user would have a choice either setup a callback or call collect() directly.
>>
>> This was also comment from Morten, I will update RFC to use direct API call.
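
A rough sketch of what such a direct call could look like, as opposed to an Rx callback. Everything in it is hypothetical: `struct pkt` is a stand-in for `struct rte_mbuf`, `flow_classify_collect()` is an illustrative name rather than an existing DPDK function, and the table sizes and the two-field key are arbitrary. The only point is that the application hands over the burst it has just received, so nothing needs to be registered on the port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 4    /* arbitrary bounds for the sketch */
#define SKETCH_MAX_FLOWS 64

/* Hypothetical stand-in for struct rte_mbuf: only what the sketch needs. */
struct pkt {
	uint32_t src_ipv4;
	uint32_t dst_ipv4;
	uint16_t len;             /* packet length in bytes */
};

struct flow_rec {
	uint32_t src_ipv4;
	uint32_t dst_ipv4;
	uint64_t packet_count;
	uint64_t packet_size;     /* bytes */
};

/* One flow table per port, owned by the library side of the sketch. */
static struct flow_table {
	struct flow_rec rec[SKETCH_MAX_FLOWS];
	uint32_t count;
} tables[SKETCH_MAX_PORTS];

/*
 * Account a burst that the application already received itself.
 * Returns the number of packets accounted; packets that would open a new
 * flow are skipped once the table is full.
 */
uint16_t
flow_classify_collect(uint8_t port_id, const struct pkt *pkts[],
		uint16_t nb_pkts)
{
	struct flow_table *t;
	uint16_t done = 0;

	assert(pkts != NULL);
	if (port_id >= SKETCH_MAX_PORTS)
		return 0;
	t = &tables[port_id];

	for (uint16_t i = 0; i < nb_pkts; i++) {
		struct flow_rec *r = NULL;

		/* Linear search keeps the sketch short. */
		for (uint32_t j = 0; j < t->count; j++) {
			if (t->rec[j].src_ipv4 == pkts[i]->src_ipv4 &&
			    t->rec[j].dst_ipv4 == pkts[i]->dst_ipv4) {
				r = &t->rec[j];
				break;
			}
		}
		if (r == NULL) {
			if (t->count == SKETCH_MAX_FLOWS)
				continue;
			r = &t->rec[t->count++];
			r->src_ipv4 = pkts[i]->src_ipv4;
			r->dst_ipv4 = pkts[i]->dst_ipv4;
			r->packet_count = 0;
			r->packet_size = 0;
		}
		r->packet_count++;
		r->packet_size += pkts[i]->len;
		done++;
	}
	return done;
}
```

An application would call this right after its own receive call, passing the same array of packet pointers; a callback-based variant could wrap the same function if both options are wanted.
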
>>
>>>
>>>> + *   for that type of flow (currently only one flow type supported)
>>>> + * - application calls rte_flow_classify_stats_get() to get pointer to linked
>>>> + *   listed flow table. Library assigns this pointer to another value and keeps
>>>> + *   collecting flow data. In next rte_flow_classify_stats_get(), library first
>>>> + *   free the previous table, and pass current table to the application, keep
>>>> + *   collecting data.
>>>
>>> Ok, but that means that you can't use stats_get() for the same type
>>> from 2 different threads without explicit synchronization?
>>
>> Correct.
>> And multiple threads shouldn't be calling this API. It doesn't store
>> previous flow data, so multiple threads calling this only can have piece
>> of information. Do you see any use case that multiple threads can call
>> this API?
>
> One example would be when you have multiple queues per port,
> managed/monitored by different cores.
> BTW, how are you going to collect the stats in that way?
>
>>
>>>
>>>> + * - application calls rte_flow_classify_type_reset(), library unregisters the
>>>> + *   callbacks and free all flow table data.
>>>> + *
>>>> + */
>>>> +
>>>> +enum rte_flow_classify_type {
>>>> +	RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
>>>> +	RTE_FLOW_CLASSIFY_TYPE_MAX,
>>>> +};
>>>> +
>>>> +#define RTE_FLOW_CLASSIFY_TYPE_MASK = (((RTE_FLOW_CLASSIFY_TYPE_MAX - 1) << 1) - 1)
>>>> +
>>>> +/**
>>>> + * Global configuration struct
>>>> + */
>>>> +struct rte_flow_classify_config {
>>>> +	uint32_t type; /* bitwise enum rte_flow_classify_type values */
>>>> +	void *flow_table_prev;
>>>> +	uint32_t flow_table_prev_item_count;
>>>> +	void *flow_table_current;
>>>> +	uint32_t flow_table_current_item_count;
>>>> +} rte_flow_classify_config[RTE_MAX_ETHPORTS];
>>>> +
>>>> +#define RTE_FLOW_CLASSIFY_STAT_MAX UINT16_MAX
>>>> +
>>>> +/**
>>>> + * Classification stats data struct
>>>> + */
>>>> +struct rte_flow_classify_stat_generic {
>>>> +	struct rte_flow_classify_stat_generic *next;
>>>> +	uint32_t id;
>>>> +	uint64_t timestamp;
>>>> +
>>>> +	struct ether_addr src_mac;
>>>> +	struct ether_addr dst_mac;
>>>> +	uint32_t src_ipv4;
>>>> +	uint32_t dst_ipv4;
>>>> +	uint8_t l3_protocol_id;
>>>> +	uint16_t src_port;
>>>> +	uint16_t dst_port;
>>>> +
>>>> +	uint64_t packet_count;
>>>> +	uint64_t packet_size; /* bytes */
>>>> +};
>>>
>>> Ok, so if I understood things right, for generic type it will always
>>> classify all incoming packets by:
>>> <src_mac, dst_mac, src_ipv4, dst_ipv4, l3_protocol_id, src_port, dst_port>
>>> all by absolute values, and represent results as a linked list.
>>> Is that correct, or I misunderstood your intentions here?
>>
>> Correct.
>>
>>> If so, then I see several disadvantages here:
>>> 1) It is really hard to predict what kind of stats is required for that
>>> particular cases.
>>> Let say some people would like to collect stat by <dst_mac, vlan>,
>>> another by <dst_ipv4,subnet_mask>, third ones by <l4 dst_port> and so on.
>>> Having just one hardcoded filter doesn't seem very flexible/usable.
>>> I think you need to find a way to allow user to define what type of filter
>>> they want to apply.
>>
>> The flow type should be provided by applications, according their needs,
>> and needs to be implemented in this library. The generic one will be the
>> only one implemented in first version:
>> enum rte_flow_classify_type {
>> 	RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
>> 	RTE_FLOW_CLASSIFY_TYPE_MAX,
>> };
>>
>>
>> App should set the type first via the API:
>> rte_flow_classify_type_set(uint8_t port_id, uint32_t type);
>>
>>
>> And the stats for this type will be returned, because returned type can
>> be different type of struct, returned as void:
>> rte_flow_classify_stats_get(uint8_t port_id, void *stats);
>
> I understand that, but it means that for every different filter user wants to use,
> someone has to update the library: define a new type and write a new piece of
> code to handle it.
> That seems not flexible and totally un-extendable from user perspective.
> Even HW allows some flexibility with RX filters.
> Why not allow user to specify a classification filter he/she wants for that
> particular case?
> In a way both rte_flow and rte_acl work?
>
>>
>>> I think it was discussed already, but I still wonder why rte_flow_item
>>> can't be used for that approach?
>>
>>
>>> 2) Even one 10G port can produce you ~14M rte_flow_classify_stat_generic
>>> entries in one second
>>> (all packets have different ipv4/ports or so).
>>> Accessing/retrieving items over linked list with 14M entries - doesn't
>>> sound like a good idea.
>>> I'd say we need some better way to retrieve/present collected data.
>>
>> This is to keep flows, so I expect the numbers will be less comparing to
>> the packet numbers.
>
> That was an extreme example to show how bad the selected approach should behave.
> What I am trying to say: we need a way to collect and retrieve stats in a
> quick and easy way.
> Let say right now user invoked stats_get(port=0, type=generic).
> Now, he is interested to get stats for particular dst_ip only.
> The only way to get it: walk over whole list stats_get() returned and examine
> each entry one by one.
>
> I think would be much better to have something like:
>
> struct rte_flow_stats {timestamp; packet_count; packet_bytes; ..};
>
> <fill rte_flow_item (or something else) to define desired filter>
>
> filter_id = rte_flow_stats_register(.., &rte_flow_item);
> ....
> struct rte_flow_stats stats;
> rte_flow_stats_get(..., filter_id, &stats);
>
> That allows user to define flows to collect stats for.
> Again in that case you don't need to worry about when/where to destroy the
> previous version of your stats.
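
To make the register/get idea quoted above a bit more concrete, here is a minimal sketch under stated assumptions. The filter is a hypothetical masked destination-IPv4 match standing in for `rte_flow_item` "or something else"; `flow_stats_register()`, `flow_stats_account()` and `flow_stats_get()` are illustrative names, not existing DPDK functions; and a packet is charged to the first registered filter that matches, which is only one possible answer to the multiple-match question the thread leaves open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_FILTERS 16   /* arbitrary bound for the sketch */

/* Hypothetical filter: match destination IPv4 under a mask. */
struct flow_filter {
	uint32_t dst_ipv4;
	uint32_t dst_mask;
};

struct flow_stats {
	uint64_t timestamp;     /* time of last matching packet, caller-supplied */
	uint64_t packet_count;
	uint64_t packet_bytes;
};

static struct {
	struct flow_filter filter;
	struct flow_stats stats;
} slots[SKETCH_MAX_FILTERS];
static int nb_slots;

/* Register a filter; returns its id, or -1 when no slot is left. */
int
flow_stats_register(const struct flow_filter *f)
{
	assert(f != NULL);
	if (nb_slots == SKETCH_MAX_FILTERS)
		return -1;
	slots[nb_slots].filter = *f;
	slots[nb_slots].stats = (struct flow_stats){0};
	return nb_slots++;
}

/*
 * Charge one packet to the first registered filter that matches
 * (insertion order is an assumption of this sketch).
 * Returns the matching filter id, or -1 if nothing matched.
 */
int
flow_stats_account(uint32_t dst_ipv4, uint16_t len, uint64_t now)
{
	for (int id = 0; id < nb_slots; id++) {
		const struct flow_filter *f = &slots[id].filter;

		if ((dst_ipv4 & f->dst_mask) != (f->dst_ipv4 & f->dst_mask))
			continue;
		slots[id].stats.timestamp = now;
		slots[id].stats.packet_count++;
		slots[id].stats.packet_bytes += len;
		return id;
	}
	return -1;
}

/* Copy the counters of one filter to the caller; returns 0 or -1. */
int
flow_stats_get(int filter_id, struct flow_stats *out)
{
	if (filter_id < 0 || filter_id >= nb_slots || out == NULL)
		return -1;
	*out = slots[filter_id].stats;
	return 0;
}
```

Because the counters are per filter and copied out on request, there is no list to walk and no previous table to free, which is the property the proposal is after.
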

Apart from the use of rte_flow, your proposal suggests that instead of:
- set key/filter
- poll collect()
- whenever the app wants, call stats_get()

we use:
- poll stats_get(key/filter);

Especially after switching from callbacks to polling, this makes sense,
because the application will already have to make continuous calls to this
library. Merging set filter/collect/stats_get into a single function saves
the library from storing/deleting stats until the app asks for them, as you
mentioned above.

So, I will update the RFC accordingly.

> Of course the open question is how to treat packets that would match more
> than one flow
> (priority/insertion order/something else?), but I suppose we'll need to deal
> with that question anyway.
>
> Konstantin
>
>> It is possible to use fixed size arrays for this. But I think it is easy
>> to make this switch later, I would like to see the performance effect
>> before doing this switch. Do you think is it OK to start like this and
>> give that decision during implementation?
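
For reference, the merged polling call discussed in this reply (poll stats_get(key/filter)) could look something like the sketch below. It is not the RFC code: `struct pkt_info`, `struct flow_key`, `struct flow_counters` and `flow_classify_poll()` are hypothetical names, and the key is an exact match on destination address and port purely to keep the example short. The counters live in application memory, so the library side keeps no table to allocate or free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed-packet view; a real version would read an rte_mbuf. */
struct pkt_info {
	uint32_t dst_ipv4;
	uint16_t dst_port;
	uint16_t len;           /* bytes */
};

/* Hypothetical key/filter: exact match on destination address and port. */
struct flow_key {
	uint32_t dst_ipv4;
	uint16_t dst_port;
};

/* Counters are owned by the application, not by the library. */
struct flow_counters {
	uint64_t packet_count;
	uint64_t packet_bytes;
};

/*
 * Filter, collect and report in one polled call: add every packet of the
 * burst that matches the key to the caller's counters.
 * Returns the number of matching packets in this burst.
 */
uint16_t
flow_classify_poll(const struct flow_key *key, const struct pkt_info *pkts,
		uint16_t nb_pkts, struct flow_counters *counters)
{
	uint16_t matched = 0;

	assert(key != NULL && pkts != NULL && counters != NULL);
	for (uint16_t i = 0; i < nb_pkts; i++) {
		if (pkts[i].dst_ipv4 != key->dst_ipv4 ||
		    pkts[i].dst_port != key->dst_port)
			continue;
		counters->packet_count++;
		counters->packet_bytes += pkts[i].len;
		matched++;
	}
	return matched;
}
```

Since each caller passes its own counters, two cores polling different queues of the same port can each keep a private copy and sum them later, without the library needing any locking.
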