SolidWallOfCode commented on pull request #8235: URL: https://github.com/apache/trafficserver/pull/8235#issuecomment-907891665
Interestingly, I just put explicit regular expression substitution support into TxnBox last week. Some observations from that work:

Rather than passing around something that is only a vector of regular expression capture groups, you should evolve toward passing around a transaction context object. I suspect there are likely to be more things in the future you will want generally available to evaluations. TxnBox uses the class "Context" (see [here](https://github.com/SolidWallOfCode/txn_box/blob/master/plugin/include/txn_box/Context.h)). Inside that class is support for the active regular expression captures, starting at around [line 513](https://github.com/SolidWallOfCode/txn_box/blob/master/plugin/include/txn_box/Context.h#L513). The context contains the match results in `_rxp_active`. There is also `_rxp_working`, which allows attempts to be made at regular expression matches without disturbing the current results; on success, the two are swapped. In addition, a string view of the original string for the match is stored (`_rxp_src`). The key point is that when a regular expression match is tried and succeeds, a view of the original string and the match data are cached in the context.

The code to retrieve them is [here](https://github.com/SolidWallOfCode/txn_box/blob/master/plugin/src/Context.cc#L508). `ArgPack` is a helper class that is invoked to handle numeric extractors in a string (such as your "$3"). The capture group is extracted from the original string as a string view, using `std::string_view::substr` on the source string with offsets from the match data. The overall result is that no copying or allocation is needed to apply the regular expression and extract the capture groups. No `vector` or `std::string` is initialized. Because TxnBox supports an arbitrary number of capture groups, its logic is a bit more complex, but you could put an ovector such as `int _rxp_active[OVECOUNT]` as a member of the context.

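
To make the working/active swap described above concrete, here is a minimal sketch. It is not TxnBox's code: `MatchContext`, `try_match`, `group`, and `kMaxGroups` are hypothetical names, and `std::regex` stands in for PCRE only so the example is self-contained (unlike the PCRE approach, `std::regex` does allocate internally). Only the member names `_rxp_active`, `_rxp_working`, and `_rxp_src` are borrowed from the description.

```cpp
#include <algorithm>
#include <array>
#include <regex>
#include <string_view>
#include <utility>

// Hypothetical illustration of the active/working pattern; not TxnBox's implementation.
class MatchContext {
  static constexpr int kMaxGroups = 10;           // assumed fixed limit for this sketch
  using Offsets = std::array<int, 2 * kMaxGroups>; // (start, end) offset pair per group

  Offsets _rxp_active{};      // offsets from the last successful match
  Offsets _rxp_working{};     // scratch space for the match being attempted
  int _rxp_count = 0;         // number of groups in the active match (group 0 included)
  std::string_view _rxp_src;  // view of the string the active match was run against

public:
  // Attempt a match. The active results are replaced only on success.
  bool try_match(std::string_view src, const std::regex& rx) {
    std::cmatch m;
    if (!std::regex_search(src.data(), src.data() + src.size(), m, rx)) {
      return false; // failure: _rxp_active and _rxp_src are left untouched
    }
    const int n = std::min(static_cast<int>(m.size()), kMaxGroups);
    for (int i = 0; i < n; ++i) {
      _rxp_working[2 * i]     = static_cast<int>(m.position(i));
      _rxp_working[2 * i + 1] = static_cast<int>(m.position(i) + m.length(i));
    }
    std::swap(_rxp_active, _rxp_working); // success: working becomes active
    _rxp_count = n;
    _rxp_src   = src;
    return true;
  }

  // View of capture group idx in the source string; empty if idx is out of range.
  std::string_view group(int idx) const {
    if (idx < 0 || idx >= _rxp_count) {
      return {};
    }
    const int start = _rxp_active[2 * idx];
    return _rxp_src.substr(start, _rxp_active[2 * idx + 1] - start);
  }
};
```

After a successful `try_match`, `group(1)` returns a view into the caller's string rather than a copy, so that string must outlive the context's use of it. A later failed `try_match` leaves the earlier groups retrievable.
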
If you're concerned about the lifetime of the original string, you could copy that (once) into a `std::string` in the context. Because `header_rewrite` doesn't have recursive config structures, you have no concerns about state preservation across nested comparisons, which means you probably don't need to deal with working vs. active match data. Rather than the `ArgPack` and `bwprint`, you could have a method like `std::string_view group(int idx)` which returns a view of the capture group. Overall it might look something like

```
class Context {
protected:
  int _rxp_active[OVECOUNT];  // ovector: (start, end) offset pair per capture group
  std::string_view _rxp_src;  // the string the match was run against
public:
  std::string_view rxp_group(int idx) {
    idx *= 2;  // each group occupies two slots in the ovector
    return _rxp_src.substr(_rxp_active[idx], _rxp_active[idx+1] - _rxp_active[idx]);
  }
  // Plus appropriate constructors, etc.
};
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
