Hi Maxim,

Maxim Cournoyer <[email protected]> writes:

> Mark H Weaver <[email protected]> writes:
>
>> With the changes suggested above, I would have no objection to pushing
>> this to core-updates.  However, it occurs to me that we could handle the
>> NUL case in a better way:
>>
>> Since the C regex functions that we use cannot handle NUL bytes, we
>> could use a different code point to represent NUL during those
>> operations.  We could choose a code point from one of the Unicode
>> Private Use Areas <https://en.wikipedia.org/wiki/Private_Use_Areas> that
>> does not occur in the string.
>>
>> Let NUL* be the code point which will represent NUL bytes.  First
>> replace all NULs with NUL*s, then perform the substitutions, and finally
>> replace all ALT*s with NULs before writing to the output.
>
> Do I understand this transformation as NULs -> NUL*s and back from NUL*s
> -> NULs correctly?  I'm not sure how NUL*s became ALT*s in your explanation.

Sorry, it's a typo.  Where I wrote "ALT*s", I meant to write "NUL*s".

>> What do you think?
>
> It raises the complexity level a bit for something which doesn't seem to
> be a very common scenario,

FWIW, I agree that it's not a common scenario, and it's not entirely
clear that it was worth the time I spent on it, or the added complexity.
On the other hand, I would dislike having a basic API like 'substitute*'
be subtly broken in this way.

> but otherwise seems a very elegant
> workaround.  It seems to me that your implementation is already pretty
> complete.  I'll try write a test for validating it and report back.

Sounds good.  Thank you!

      Mark
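
For reference, the NUL -> NUL* -> NUL round trip discussed above can be sketched as follows. This is only an illustration in Python, not the Guile code under review: the function names `pick_placeholder` and `substitute_with_nul` are hypothetical, and the choice of the Basic Multilingual Plane Private Use Area (U+E000..U+F8FF) is an assumption. Python's `re` module copes with NUL on its own, so the detour here only demonstrates the idea.

```python
import re

# Assumption: search only the BMP Private Use Area, U+E000..U+F8FF.
PUA_START, PUA_END = 0xE000, 0xF8FF


def pick_placeholder(text):
    """Return a Private Use Area character that does not occur in text."""
    present = set(text)
    for code_point in range(PUA_START, PUA_END + 1):
        candidate = chr(code_point)
        if candidate not in present:
            return candidate
    raise ValueError("no unused Private Use Area code point available")


def substitute_with_nul(text, pattern, replacement):
    """Apply a regex substitution while hiding NULs from the regex engine."""
    placeholder = pick_placeholder(text)            # this is NUL*
    masked = text.replace("\0", placeholder)        # NULs -> NUL*s
    substituted = re.sub(pattern, replacement, masked)
    return substituted.replace(placeholder, "\0")   # NUL*s -> NULs


if __name__ == "__main__":
    original = "path=/old/bin\0path=/old/lib\0"
    print(repr(substitute_with_nul(original, r"/old/", "/new/")))
```

The placeholder is chosen per input string, so text that already contains private-use characters is left intact. The sketch assumes the pattern and replacement contain neither NUL nor the chosen placeholder.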
