Hi Xiang,

Sorry for the delay in responding (I was busy with the 19.11 proposal
deadline). Please see inline.

> Reply to Xiang's queries in main thread:
>
> Hi all,
>
> Some questions regarding APIs. Could you please give more insights?
>
> 1) rte_regex_ops
> a) rsp_flags
> These two flags RTE_REGEX_OPS_RSP_PMI_SOJ_F and
> RTE_REGEX_OPS_RSP_PMI_EOJ_F are used for cross buffer scan.
> RTE_REGEX_OPS_RSP_PMI_EOJ_F tells whether we have a partial match
> at the end of current buffer after scan.
> What's the purpose of having RTE_REGEX_OPS_RSP_PMI_SOJ_F?
>
> [Jerin] Since we need three states to represent partial match buffer,
> RTE_REGEX_OPS_RSP_PMI_SOJ_F to represent start of the buffer,
> intermediate buffers with no flag, and end of the buffer with
> RTE_REGEX_OPS_RSP_PMI_EOJ_F.
> [Xiang] How could a user leverage these flags for matching? Suppose a large
> buffer is divided into multiple chunks. Will RTE_REGEX_OPS_RSP_PMI_SOJ_F
> cause an early quit once it isn't set after scan the first chunk. Similarly,
> RTE_REGEX_OPS_RSP_PMI_EOJ tells a user whether to stop matching future
> buffers after finish the last chunk?

Let me describe it with an example. Assume:

1) struct rte_regex_dev_info::max_payload_size is set to 1024.
2) rte_regex_dev_config::dev_cfg_flags is configured with
   RTE_REGEX_DEV_CFG_CROSS_BUFFER_SCAN_F.
3) The device is programmed to match the pattern "hello\s+world".
4) The user enqueues an op with struct rte_regex_ops::buf_addr pointing to
   the following "data" and struct rte_regex_ops::scan_size = 1024:

   data[0..1021] = data that does not contain the hello world pattern
   data[1022] = 'h'
   data[1023] = 'e'

5) The user enqueues an op with struct rte_regex_ops::buf_addr pointing to
   the following "data" and struct rte_regex_ops::scan_size = 9:

   data[0] = 'l'
   data[1] = 'l'
   data[2] = 'o'
   data[3] = ' '
   data[4] = 'w'
   data[5] = 'o'
   data[6] = 'r'
   data[7] = 'l'
   data[8] = 'd'

If so:

The response to 4) will be RTE_REGEX_OPS_RSP_PMI_SOJ_F in
rte_regex_ops::rsp_flags on dequeue, where rte_regex_match::offset is 1022
and len is 2.

The response to 5) will be RTE_REGEX_OPS_RSP_PMI_EOJ_F in
rte_regex_ops::rsp_flags on dequeue, where rte_regex_match::offset is 0
and len is 9.

>
> RTE_REGEX_OPS_RSP_MAX_PREFIX_F: This looks like a definition for a
> specific hardware implementation. I am wondering what this PREFIX refers
> to:)?
>
> [Jerin] Yes. Looks like it is for hardware specific implementation.
> Introduced rte_regex_dev_attr_set/get functions to make it portable and
> to add new implementation specific fields.
> For example, if a rule is /ABCDEF.*XYZ/, ABCD is considered the prefix,
> and EF.*XYZ is considered the factor. The prefix is a literal string,
> while the factor can contain complex regular expression constructs. As
> a result, rule matching occurs in two stages: prefix matching and factor
> matching.
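
For the cross-buffer example above, the application-side handling on dequeue
could look roughly like the sketch below. It only tracks the three states
(start, intermediate, end) per stream. The flag values are placeholders for
illustration, and xbuf_state / xbuf_next_state() are hypothetical helper names,
not part of the proposed API. It also assumes a response carries at most one
of the two flags; a buffer where one partial match ends and another begins is
not handled.

```c
#include <stdint.h>

/* Placeholder values for illustration only; the real definitions would
 * come from the proposed regex device header. */
#define RTE_REGEX_OPS_RSP_PMI_SOJ_F (1u << 0)
#define RTE_REGEX_OPS_RSP_PMI_EOJ_F (1u << 1)

/* Hypothetical per-stream state kept by the application. */
enum xbuf_state {
	XBUF_IDLE,     /* no partial match in flight */
	XBUF_PARTIAL,  /* partial match started, not yet finished */
	XBUF_COMPLETE  /* partial match finished in the last buffer */
};

/* Advance the per-stream state using rsp_flags of one dequeued op. */
static inline enum xbuf_state
xbuf_next_state(enum xbuf_state cur, uint32_t rsp_flags)
{
	/* SOJ: a partial match begins at the tail of this buffer. */
	if (rsp_flags & RTE_REGEX_OPS_RSP_PMI_SOJ_F)
		return XBUF_PARTIAL;

	if (cur == XBUF_PARTIAL) {
		/* EOJ: the partial match ends in this buffer. */
		if (rsp_flags & RTE_REGEX_OPS_RSP_PMI_EOJ_F)
			return XBUF_COMPLETE;
		/* No flag: intermediate buffer, match still open. */
		return XBUF_PARTIAL;
	}

	/* Nothing in flight and nothing started. */
	return XBUF_IDLE;
}
```

With the two enqueues from the example, the state would go from XBUF_IDLE to
XBUF_PARTIAL after 4) and to XBUF_COMPLETE after 5). So SOJ is not an early
quit signal; it tells the application to keep feeding the following chunks of
the same stream.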
>
> b) user_id or user_ptr
> Under what kind of circumstances should an application pass value into
> these variables for enqueue and dequeue operations?
>
> [Jerin] Just like rte_crypto_ops, struct rte_regex_ops also allocated using
> mempool normally, on enqueue, user can specify user_id if needed in order
> to identify the op on dequeue. The use case could be to store the sequence
> number from application POV or storing the mbuf ptr in which pattern is
> requested etc.
>
>
> 2) rte_regex_match
> a) offset; /**< Starting Byte Position for matched rule. */ and
> uint16_t len; /**< Length of match in bytes */
> Looks like the matching offset is defined as *starting matching offset*
> instead of *end matching offset*, e.g. report the offset of "a" instead
> of "c" for pattern "abc".
> If so, this makes it hard to integrate software regex libraries such as
> Hyperscan and RE2 as they only report *end matching offset* without length
> of match.
> Although Hyperscan has API for *starting matching offset*, it only
> delivers partial syntax support. So I think we have to define *end of
> matching offset* for software solutions.
>
> [Jerin] I understand the hyperscan's HS_FLAG_SOM_LEFTMOST tradeoffs. I
> thought application would need always the length of the match.
> Probably we will see how other HW implementation (from Mellanox) etc. We
> will try to abstract it, probably we can make it as function of "user
> requested".
> [Xiang] Yes, it will be good to make it per user request. At least from
> Hyperscan user's point of view, start of match and match length are not
> mandatory.

OK. I think we can introduce RTE_REGEX_DEV_CFG_MATCH_AS_START in device
configure. Since offset + len == end, we can introduce the following generic
inline function.

	/* Return type is assumed here; it should match the type of offset. */
	static inline uint32_t
	rte_regex_match_end(const struct rte_regex_match *match)
	{
		return match->offset + match->len;
	}

Example: the pattern to match is "hello\s+world" and the data is as follows:

   data[4] = 'h'
   data[5] = 'e'
   data[6] = 'l'
   data[7] = 'l'
   data[8] = 'o'
   data[9] = ' '
   data[10] = 'w'
   data[11] = 'o'
   data[12] = 'r'
   data[13] = 'l'
   data[14] = 'd'

If the device is configured with RTE_REGEX_DEV_CFG_MATCH_AS_START:

   match->offset returns 4
   match->len returns 11

If the device is NOT configured with RTE_REGEX_DEV_CFG_MATCH_AS_START, the
driver MAY return the following (in the Hyperscan case):

   match->offset returns 0
   match->len returns 11 + 4

In both cases (irrespective of the flag, to make the application's life easy)
rte_regex_match_end() would return 15.

If the application requests MATCH_AS_START, the driver can return
match->offset = 4 and match->len = 11, i.e. set HS_FLAG_SOM_LEFTMOST in the
Hyperscan driver. Either way, the application should use
rte_regex_match_end() to find the end of the match, so that it works in all
cases.

Is that OK?

>
> 3) rte_regex_rule_db_update()
> Does this mean we can dynamically add or delete rules for an already
> generated database without recompile from scratch for hardware Regex
> implementation?
> If so, this isn't possible for software solutions as they don't support
> dynamic database update and require recompile.
>
> [Jerin] rte_regex_rule_db_update() internally it would call recompile
> function for both HW and SW.
> See rte_regex_dev_config::rule_db in rte_regex_dev_configure() for
> precompiled rule database case.
> [Xiang] OK, sounds like we have to save the original rule-set for the device
> in order to do recompile. I see both ADD and REMOVE operators from
> rte_regex_rule.
> For rules with REMOVE operator, what's the expected behavior to handle
> them for the old rule-set? Do we need to go through the old rule-set and
> remove corresponding rules before doing recompile?

Yes.
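
On the REMOVE handling: a driver that keeps the original rule-set around could
apply the update list to it before recompiling, roughly as sketched below.
This is only an illustration under assumptions. struct sketch_rule, its fields
and sketch_apply_updates() are hypothetical stand-ins and not the proposed
struct rte_regex_rule layout; it assumes each rule is identified by a numeric
rule id, and the recompile step itself is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; the real rule structure and
 * operator names in the proposal may differ. */
enum sketch_rule_op { SKETCH_RULE_OP_ADD, SKETCH_RULE_OP_REMOVE };

struct sketch_rule {
	enum sketch_rule_op op;
	uint32_t rule_id;       /* assumed unique identifier of the rule */
	const char *pcre_rule;  /* pattern text, unused for REMOVE */
};

/*
 * Apply `updates` to the saved rule-set `saved`, which currently holds `n`
 * rules and has room for `cap`. Returns the new rule count, or -1 if an
 * ADD does not fit. The caller would recompile the database afterwards.
 */
static int
sketch_apply_updates(struct sketch_rule *saved, size_t n, size_t cap,
		     const struct sketch_rule *updates, size_t nb_updates)
{
	for (size_t i = 0; i < nb_updates; i++) {
		const struct sketch_rule *u = &updates[i];
		size_t j = 0;

		/* Look for an existing rule with the same id. */
		while (j < n && saved[j].rule_id != u->rule_id)
			j++;

		if (u->op == SKETCH_RULE_OP_REMOVE) {
			/* Drop the rule by moving the last one into its slot;
			 * removing an unknown id is silently ignored. */
			if (j < n)
				saved[j] = saved[--n];
		} else if (j < n) {
			saved[j] = *u;          /* ADD of a known id replaces it */
		} else {
			if (n == cap)
				return -1;      /* no room for the new rule */
			saved[n++] = *u;
		}
	}
	return (int)n;
}
```

The swap-with-last removal does not preserve rule order, which is fine only if
the compiler does not depend on ordering; an implementation that does would
need to shift the remaining entries instead.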