SolidWallOfCode commented on pull request #8235:
URL: https://github.com/apache/trafficserver/pull/8235#issuecomment-907891665


Interestingly, I just added explicit regular expression substitution support to TxnBox last week. Some observations from that work:
   
Rather than passing around something that is only a vector of regular expression capture groups, you should evolve toward passing around a transaction context object. I suspect there will be more things in the future that you want generally available to evaluations. TxnBox uses the class `Context` (see [here](https://github.com/SolidWallOfCode/txn_box/blob/master/plugin/include/txn_box/Context.h)). Inside that class is support for the active regular expression captures, starting at around [line 513](https://github.com/SolidWallOfCode/txn_box/blob/master/plugin/include/txn_box/Context.h#L513). The context holds the match results in `_rxp_active`. There is also `_rxp_working`, which allows regular expression matches to be attempted without disturbing the current results; on success, the two are swapped. In addition, a string view of the original string for the match is stored (`_rxp_src`). The key point is that when a regular expression match is tried and succeeds, a view of the original string and the match data are cached in the context.
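A minimal sketch of the working/active idea, assuming `std::regex` in place of the PCRE calls TxnBox actually makes. The class `RxpContext` and its `try_match` and `group` methods are hypothetical names for illustration, not TxnBox's API. A match is attempted into the working results, and only on success are they swapped into the active slot, so a failed attempt leaves the previous captures intact.

```cpp
#include <cstddef>
#include <regex>
#include <string_view>
#include <utility>

// Hypothetical context illustrating "working vs. active" match data.
// Uses std::regex as a stand-in for PCRE.
class RxpContext {
public:
  using Match = std::match_results<std::string_view::const_iterator>;

  // Try a match into the working slot. On failure the active captures
  // (and the cached source view) are left untouched.
  bool try_match(const std::regex &rx, std::string_view src) {
    if (!std::regex_search(src.begin(), src.end(), _working, rx)) {
      return false;
    }
    std::swap(_active, _working); // Promote the new results without copying.
    _src = src;                   // Cache the string the results refer to.
    return true;
  }

  // View of capture group idx inside the cached source; empty if the group
  // is out of range or did not participate in the match.
  std::string_view group(std::size_t idx) const {
    if (idx >= _active.size() || !_active[idx].matched) {
      return {};
    }
    return _src.substr(static_cast<std::size_t>(_active.position(idx)),
                       static_cast<std::size_t>(_active.length(idx)));
  }

private:
  Match _active;         // Results visible to evaluations.
  Match _working;        // Scratch space for trial matches.
  std::string_view _src; // Source of the active match; must outlive its use.
};
```

Swapping rather than copying keeps promotion cheap, but the cached view is only valid while the caller's source string is alive.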
   
The code to retrieve them is [here](https://github.com/SolidWallOfCode/txn_box/blob/master/plugin/src/Context.cc#L508). `ArgPack` is a helper class that is invoked to handle numeric extractors in a string (such as your "$3"). The key point is that the capture group is extracted from the original string as a string view, using `std::string_view::substr` on the source string with offsets from the match data.
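A rough sketch of the same zero-copy extraction applied to a `$N` template, assuming a PCRE-style `ovector` in which group `n` occupies offsets `2n` (start) and `2n+1` (end), with negative offsets for unset groups. `capture` and `expand` are hypothetical helpers, not `ArgPack`, and only single-digit references are handled. Each capture is a `std::string_view` into the source, so the only allocation is the output string itself.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical: view of capture group idx, given `count` captured pairs
// in a PCRE-style ovector. Returns an empty view for unset/out-of-range groups.
std::string_view capture(std::string_view src, const int *ovector, int count, int idx) {
  if (idx < 0 || idx >= count) {
    return {};
  }
  int start = ovector[2 * idx];
  int end = ovector[2 * idx + 1];
  if (start < 0 || end < start) {
    return {};
  }
  return src.substr(static_cast<std::size_t>(start), static_cast<std::size_t>(end - start));
}

// Hypothetical: expand "$N" (single digit) references in tmpl. Captures are
// never copied into temporaries; they are appended straight from the source.
std::string expand(std::string_view tmpl, std::string_view src, const int *ovector, int count) {
  std::string out;
  for (std::size_t i = 0; i < tmpl.size(); ++i) {
    if (tmpl[i] == '$' && i + 1 < tmpl.size() &&
        std::isdigit(static_cast<unsigned char>(tmpl[i + 1]))) {
      out.append(capture(src, ovector, count, tmpl[i + 1] - '0'));
      ++i; // Skip the digit just consumed.
    } else {
      out.push_back(tmpl[i]);
    }
  }
  return out;
}
```

For example, with source `"GET /a/b HTTP/1.1"` and the offsets a pattern like `^(\S+) (\S+) (\S+)$` would produce, the template `"$3 $1"` expands to `"HTTP/1.1 GET"`.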
   
The overall result is that no copying or allocation is needed to apply the regular expression and extract the capture groups. Neither a `vector` nor a `std::string` is initialized. Because TxnBox supports an arbitrary number of capture groups, its logic is a bit more complex, but you could put `int _rxp_active[OVECOUNT]` in the context as a member. If you're concerned about the lifetime of the original string, you could copy it (once) into a `std::string` in the context. Because `header_rewrite` doesn't have recursive config structures, you don't need to preserve state across nested comparisons, which means you probably don't need to deal with working vs. active match data. Rather than `ArgPack` and `bwprint`, you could have a method like `std::string_view group(int idx)` that returns a view of the capture group.
   
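Overall it might look something like:
```
#include <string_view>

// OVECOUNT is assumed to be the constant header_rewrite already uses to size its ovector.
class Context {
protected:
  int _rxp_active[OVECOUNT]; // Match offsets: group n is [_rxp_active[2n], _rxp_active[2n+1]).
  std::string_view _rxp_src; // The string those offsets refer to.
public:
  std::string_view rxp_group(int idx) const {
    idx *= 2;
    if (idx < 0 || idx + 1 >= OVECOUNT || _rxp_active[idx] < 0) {
      return {}; // Out of range, or the group did not participate in the match.
    }
    return _rxp_src.substr(_rxp_active[idx], _rxp_active[idx + 1] - _rxp_active[idx]);
  }
  // Plus appropriate constructors, etc.
};
```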
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
