On 1/8/24 1:37 PM, Yann Ylavic wrote:
> On Mon, Jan 8, 2024 at 10:49 AM Ruediger Pluem <rpl...@apache.org> wrote:
>>
>> On 1/5/24 3:08 PM, Yann Ylavic wrote:
>>>
>>> process_regexp.diff
>>>
>>> Index: modules/metadata/mod_headers.c
>>> ===================================================================
>>> --- modules/metadata/mod_headers.c (revision 1914804)
>>> +++ modules/metadata/mod_headers.c (working copy)
>>> @@ -622,41 +622,70 @@ static char* process_tags(header_entry *hdr, reque
>>> }
>>> return str ? str : "";
>>> }
>>> -static const char *process_regexp(header_entry *hdr, const char *value,
>>> - request_rec *r)
>>> +
>>> +static const char *process_regexp_rec(header_entry *hdr, const char *value,
>>> + request_rec *r, ap_regmatch_t
>>> pmatch[],
>>> + int flags)
>>> {
>>> - ap_regmatch_t pmatch[AP_MAX_REG_MATCH];
>>> - const char *subs;
>>> - const char *remainder;
>>> char *ret;
>>> - int diffsz;
>>> - if (ap_regexec(hdr->regex, value, AP_MAX_REG_MATCH, pmatch, 0)) {
>>> + const char *subs, *remainder;
>>> + apr_size_t val_len, subs_len, rem_len;
>>> + apr_size_t match_start, match_end;
>>> +
>>> + val_len = strlen(value);
>>> + if (ap_regexec_len(hdr->regex, value, val_len,
>>> + AP_MAX_REG_MATCH, pmatch,
>>> + flags)) {
>>> /* no match, nothing to do */
>>> return value;
>>> }
>>> +
>>> /* Process tags in the input string rather than the resulting
>>> * substitution to avoid surprises
>>> */
>>> - subs = ap_pregsub(r->pool, process_tags(hdr, r), value,
>>> AP_MAX_REG_MATCH, pmatch);
>>> + subs = ap_pregsub(r->pool, process_tags(hdr, r), value,
>>> + AP_MAX_REG_MATCH, pmatch);
>>> if (subs == NULL)
>>> return NULL;
>>> - diffsz = strlen(subs) - (pmatch[0].rm_eo - pmatch[0].rm_so);
>>> + subs_len = strlen(subs);
>>> +
>>> + ap_assert(pmatch[0].rm_so >= 0 && pmatch[0].rm_so <= pmatch[0].rm_eo);
>>> + match_start = pmatch[0].rm_so;
>>> + match_end = pmatch[0].rm_eo;
>>> if (hdr->action == hdr_edit) {
>>> - remainder = value + pmatch[0].rm_eo;
>>> + remainder = value + match_end;
>>> }
>>> else { /* recurse to edit multiple matches if applicable */
>>> - remainder = process_regexp(hdr, value + pmatch[0].rm_eo, r);
>>> - if (remainder == NULL)
>>> - return NULL;
>>> - diffsz += strlen(remainder) - strlen(value + pmatch[0].rm_eo);
>>> + if (match_end == 0 && val_len) {
>>> + /* advance on empty match to avoid infinite recursion */
>>> + match_start = match_end = 1;
>>
>> Not debating that
>>
>> Header edit* headername ^ prefix
>>
>> should be be
>>
>> Header edit headername ^ prefix
>>
>> but wouldn't that do the wrong thing and add the prefix *after* the first
>> character of the header value?
>
> Yes correct, this patch won't set the skipped char at the right place finally.
> New v2 attached.
>
>>
>>> + }
>>> + if (match_end < val_len) {
>>> + remainder = process_regexp_rec(hdr, value + match_end,
>>
>> Shouldn't we increase match_end just here by 1 if match_end == 0 && val_len ?
>
> See v2, match_end is not used below anyway.
>
>
>>
>>> + r, pmatch, AP_REG_NOTBOL);
>
> As noted in v2 we have an issue here by "losing" the beginning of the
> value on recursion:
> /* XXX: recursing by using AP_REG_NOTBOL (because we are not at ^
> * anymore) and then "losing" the beginning of the string is not
> * always correct. Say we match "(?<=a)ba" against "ababa", on
> * recursion ap_regexec_len() will not know that the second "b" is
> * preceded by "a" thus not match. We'd need a new ap_regexec_ex()
> * that can take match_end as an offset to fix this..
> */
>
> Not sure how far we should go with this patch..
I think things do not get worse in this respect because of this patch but only
improve
in the sense that an infinite recursion is avoided.
Hence +1 on the patch.
Regards
Rüdiger