bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))

Mark H Weaver Sat, 16 Jun 2018 21:39:01 -0700

Hi Maxim,

Maxim Cournoyer <[email protected]> writes:


> Mark H Weaver <[email protected]> writes:
>
>> With the changes suggested above, I would have no objection to pushing
>> this to core-updates.  However, it occurs to me that we could handle the
>> NUL case in a better way:
>>
>> Since the C regex functions that we use cannot handle NUL bytes, we
>> could use a different code point to represent NUL during those
>> operations.  We could choose a code point from one of the Unicode
>> Private Use Areas <https://en.wikipedia.org/wiki/Private_Use_Areas> that
>> does not occur in the string.
>>
>> Let NUL* be the code point which will represent NUL bytes.  First
>> replace all NULs with NUL*s, then perform the substitutions, and finally
>> replace all ALT*s with NULs before writing to the output.
>
> Do I understand this transformation as NULs -> NUL*s and back from NUL*s
> -> NULs correctly? I'm not sure how NUL*s became ALT*s in your explanation.

Sorry, it's a typo.  Where I wrote "ALT*s", I meant to write "NUL*s".

>> What do you think?
>
> It raises the complexity level a bit for something which doesn't seem to
> be a very common scenario,

FWIW, I agree that it's not a common scenario, and it's not entirely
clear that it was worth the time I spent on it, or the added complexity.
On the other hand, I would dislike having a basic API like 'substitute*'
be subtly broken in this way.

> but otherwise seems a very elegant
> workaround. It seems to me that your implementation is already pretty
> complete. I'll try write a test for validating it and report back.

Sounds good.  Thank you!

      Mark

bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))

Reply via email to