Re: Overload str.replace to take a Map?

Isiah Meadows Fri, 18 May 2018 23:18:07 -0700

Here's what I'd prefer instead: overload `String.prototype.replace` to
take non-callable objects, as sugar for this:


```js
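const old = Function.call.bind(Function.call, String.prototype.replace)
String.prototype.replace = function (pattern, replacement) {
    // Object form: a single non-null object that is not itself a pattern
    // (no `Symbol.replace` method, which also rules out RegExps).
    if (
        replacement == null && pattern != null &&
        typeof pattern === "object" && pattern[Symbol.replace] == null
    ) {
        const source = Object.keys(pattern)
            .map(key => old(key, /[\\^$*+?.()|[\]{}]/g, "\\$&"))
            .join("|")
        // The `g` flag replaces every occurrence, not just the first.
        return old(this, new RegExp(source, "g"), m => pattern[m])
    } else {
        return old(this, pattern, replacement)
    }
}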
```

This would cover about 99% of my use for something like this, with
less runtime overhead (that of not needing to check for and
potentially match multiple regular expressions at runtime) and better
static analyzability (you only need to check it's an object literal or
constant frozen object, not that its argument is the result of the
built-in `Map` call). It's exceptionally difficult to optimize for
this unless you know everything's a string, but most cases where I had
to pass a callback that wasn't super complex looked a lot like this:

```js
// What I use:
function escapeHTML(str) {
    return str.replace(/["'&<>]/g, m => {
        switch (m) {
        case '"': return "&#34;"
        case "'": return "&#39;"
        case "&": return "&amp;"
        case "<": return "&lt;"
        case ">": return "&gt;"
        default: throw new TypeError("unreachable")
        }
    })
}

// What it could be
function escapeHTML(str) {
    return str.replace({
        '"': "&#34;",
        "'": "&#39;",
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
    })
}
```
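
To try the object form without touching the prototype, here is a minimal userland sketch. The names `replaceWithTable` and `escapeRegExp` are hypothetical (not proposed API), and it assumes own enumerable string keys with longer keys tried first. It makes a single pass over the input, so replacement output is never scanned again.

```js
// Hypothetical helper names; a sketch of the proposed semantics, not spec text.
function escapeRegExp(source) {
    return source.replace(/[\\^$*+?.()|[\]{}]/g, "\\$&")
}

function replaceWithTable(str, table) {
    // Longer keys first, so "ab" wins over "a" at the same position.
    const keys = Object.keys(table).sort((a, b) => b.length - a.length)
    if (keys.length === 0) return str
    const re = new RegExp(keys.map(escapeRegExp).join("|"), "g")
    // A function replacer inserts its result literally, so "$&" and friends
    // in table values are not treated as replacement patterns.
    return str.replace(re, m => table[m])
}

const escaped = replaceWithTable(`<a href="x">Tom & Jerry's</a>`, {
    '"': "&#34;",
    "'": "&#39;",
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
})
// escaped === "&lt;a href=&#34;x&#34;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;"

// Single pass: the "2" produced by the first entry is not replaced again.
const once = replaceWithTable("1", { "1": "2", "2": "3" })
// once === "2"
```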

And yes, this enables optimizations engines couldn't easily produce
otherwise. In this instance, an engine could find that the object is
static with only single-character entries, and it could replace the
call to a fast-path one that relies on a cheap lookup table instead
(Unicode replacement would be similar, except you'd need an extra
layer of indirection with astrals to avoid blowing up memory when
generating these tables):

```js
// Original
function escapeHTML(str) {
    return str.replace({
        '"': "&#34;",
        "'": "&#39;",
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
    })
}

// Not real JS, but think of it as how an engine might implement this. The
// implementation of the runtime function `ReplaceWithLookupTable` is omitted
// for brevity, but you could imagine how it could be implemented, given the
// pseudo-TS signature:
//
// ```ts
// declare function %ReplaceWithLookupTable(
//     str: string,
//     table: string[]
// ): string
// ```
function escapeHTML(str) {
    static {
        // A zero-initialized array with 2^16 entries (U+0000-U+FFFF), except
        // for the object's members. This takes up to about 70K per instance,
        // but these are *far* more often called than created.
        const _lookup_escapeHTML = %calloc(65536)

        _lookup_escapeHTML[34] = "&#34;"
        _lookup_escapeHTML[38] = "&amp;"
        _lookup_escapeHTML[39] = "&#39;"
        _lookup_escapeHTML[60] = "&lt;"
        _lookup_escapeHTML[62] = "&gt;"
    }

    return %ReplaceWithLookupTable(str, _lookup_escapeHTML)
}
```
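
Since the runtime function is omitted above, here is one way it could look in plain JS. This is only a sketch: `buildLookupTable` and `replaceWithLookupTable` are hypothetical userland stand-ins for the engine internals, and it assumes every key is a single UTF-16 code unit and every value is a string.

```js
// Hypothetical userland stand-ins for the engine-internal pieces.
function buildLookupTable(entries) {
    // One slot per UTF-16 code unit (U+0000-U+FFFF). Unset slots read as
    // `undefined`; a real engine would likely use a dense native array.
    const table = new Array(65536)
    for (const key of Object.keys(entries)) {
        if (key.length !== 1) throw new RangeError("single code unit keys only")
        table[key.charCodeAt(0)] = entries[key]
    }
    return table
}

function replaceWithLookupTable(str, table) {
    let result = ""
    let start = 0 // start of the pending run of unreplaced code units
    for (let i = 0; i < str.length; i++) {
        const replacement = table[str.charCodeAt(i)]
        if (replacement !== undefined) {
            // Flush the untouched run, then append the replacement.
            result += str.slice(start, i) + replacement
            start = i + 1
        }
    }
    // Nothing matched: hand back the original string without copying.
    return start === 0 ? str : result + str.slice(start)
}

const htmlTable = buildLookupTable({
    '"': "&#34;",
    "'": "&#39;",
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
})
const out = replaceWithLookupTable(`a < b && "c" > 'd'`, htmlTable)
// out === "a &lt; b &amp;&amp; &#34;c&#34; &gt; &#39;d&#39;"
```

Astral characters pass through untouched here, because neither of their surrogate halves has a table entry.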

Likewise, similar but more restrained optimizations could be
performed on objects with multi-character keys, since they can be reduced
to a simple search trie. (These can be built in even the general case
if the strings are large enough to merit it - small ropes are pretty
cheap to create.)
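
As a rough illustration of the trie idea, here is a minimal sketch in plain JS. The names `buildTrie` and `replaceWithTrie` are hypothetical, the trie is keyed by UTF-16 code unit, values are assumed to be strings, and the longest matching key wins at each position (an assumption; the thread doesn't settle that).

```js
// Hypothetical helper names; longest match wins at each position.
function buildTrie(entries) {
    const root = { children: new Map(), value: undefined }
    for (const key of Object.keys(entries)) {
        if (key === "") continue // an empty key could never consume input
        let node = root
        for (let i = 0; i < key.length; i++) {
            const unit = key.charCodeAt(i)
            let next = node.children.get(unit)
            if (next === undefined) {
                next = { children: new Map(), value: undefined }
                node.children.set(unit, next)
            }
            node = next
        }
        node.value = entries[key]
    }
    return root
}

function replaceWithTrie(str, root) {
    let result = ""
    let i = 0
    while (i < str.length) {
        let node = root
        let matchEnd = -1
        let matchValue
        // Walk as deep as the input allows, remembering the longest key seen.
        for (let j = i; j < str.length; j++) {
            node = node.children.get(str.charCodeAt(j))
            if (node === undefined) break
            if (node.value !== undefined) {
                matchEnd = j + 1
                matchValue = node.value
            }
        }
        if (matchEnd === -1) {
            result += str[i] // no key starts here; copy one code unit
            i++
        } else {
            result += matchValue
            i = matchEnd
        }
    }
    return result
}

const entities = buildTrie({ "&amp;": "&", "&lt;": "<", "&gt;": ">", "&": "?" })
const decoded = replaceWithTrie("1 &lt; 2 &amp;&amp; x & y", entities)
// decoded === "1 < 2 && x ? y"
```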

For what it's worth, there's precedent here in Ruby, which has support
for `Hash`es as `String#gsub` parameters which work similarly.

-----

Isiah Meadows
[email protected]
www.isiahmeadows.com


On Fri, May 18, 2018 at 1:01 PM, Logan Smyth <[email protected]> wrote:
>> It wouldn't necessarily break existing API, since String.prototype.replace
>> currently accepts only RegExp or strings.
>
> Not quite accurate. It accepts anything with a `Symbol.replace` property, or
> a string.
>
> Given that, what you're describing can be implemented as
> ```
> Map.prototype[Symbol.replace] = function(str) {
>   for(const [key, value] of this) {
>     str = str.replace(key, value);
>   }
>   return str;
> };
> ```
>
>> I don't know if the ECMAScript spec mandates preserving a particular order
>> to a Map's elements.
>
> It does, so you're good there.
>
>> Detecting collisions between matching regular expressions or strings.
>
> I think this would be my primary concern, but not so much ordering as
> expectations. Like if you did
> ```
> "1".replace(new Map([
>   ['1', '2'],
>   ['2', '3'],
> ]));
> ```
> is the result `2` or `3`? `3` seems surprising to me, at least in the
> general sense, because there was no `2` in the original input, but it's also
> hard to see how you'd spec the behavior to avoid that if general regex
> replacement is supported.
>
> On Fri, May 18, 2018 at 9:47 AM, Alex Vincent <[email protected]> wrote:
>>
>> Reading [1] in the digests, I think there might actually be an API
>> improvement that is doable.
>>
>> Suppose the String.prototype.replace API allowed passing in a single
>> argument, a Map instance where the keys were strings or regular expressions
>> and the values were replacement strings or functions.
>>
>> Advantages:
>> * Shorthand - instead of writing str.replace(a, b).replace(c,
>> d).replace(e, f)... you get str.replace(regExpMap)
>> * Reusable - the same regular expression/string map could be used for
>> several strings (assuming of course the user didn't just abstract the call
>> into a separate function)
>> * Modifiable on demand - developers could easily add new regular
>> expression matches to the map object, or remove them
>> * It wouldn't necessarily break existing API, since
>> String.prototype.replace currently accepts only RegExp or strings.
>>
>> Disadvantages / reasons not to do it:
>> * Detecting collisions between matching regular expressions or strings.
>> If two regular expressions match the same string, or a regular expression
>> and a search string match, the expected results may vary because a Map's
>> elements might not be consistently ordered.  I don't know if the ECMAScript
>> spec mandates preserving a particular order to a Map's elements.
>>   - if we preserve the same chaining capability
>> (str.replace(map1).replace(map2)...), this might not be a big problem.
>>
>> The question is, how often do people chain replace calls together?
>>
>> * It's not particularly hard to chain several replace calls together.
>> It's just verbose, which might not be a high enough burden to overcome for
>> adding API.
>>
>> That's my two cents for the day.  Thoughts?
>>
>> [1] https://esdiscuss.org/topic/adding-map-directly-to-string-prototype
>>
>> --
>> "The first step in confirming there is a bug in someone else's work is
>> confirming there are no bugs in your own."
>> -- Alexander J. Vincent, June 30, 2001
>>
>> _______________________________________________
>> es-discuss mailing list
>> [email protected]
>> https://mail.mozilla.org/listinfo/es-discuss
>>