Re: Overload str.replace to take a Map?

Isiah Meadows Sun, 20 May 2018 03:33:13 -0700

@Mathias

My particular `escapeHTML` example *could* be written like that (and it *is*
somewhat in the prose). But you're right that in the prose, I did bring up
the potential for things like `str.replace({cheese: "cake", ham: "eggs"})`.
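
As a rough sketch of what that object form could desugar to in userland (the
helper name `replaceFromTable` is hypothetical, and it assumes every key is a
literal string rather than a pattern):

```js
// Hypothetical helper, not part of any proposal text: a single-pass
// replacement driven by a plain object whose keys are literal strings.
function replaceFromTable(str, table) {
    const keys = Object.keys(table)
    if (keys.length === 0) return str
    // Longest keys first, so "ham" cannot shadow "hamburger".
    keys.sort((a, b) => b.length - a.length)
    const pattern = keys
        // Escape regex metacharacters so each key matches literally.
        .map(key => key.replace(/[\\^$*+?.()|[\]{}]/g, "\\$&"))
        .join("|")
    // A callback is used so "$" in replacement values is not interpreted.
    return str.replace(new RegExp(pattern, "g"), m => table[m])
}

const result = replaceFromTable("ham and cheese", {cheese: "cake", ham: "eggs"})
console.log(result) // "eggs and cake"
```

Because the input is scanned once, a replacement value is never re-matched
against another key.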


@Kai

Have you ever tried writing an HTML template system on the front end? This
*will* almost inevitably come up, and most of my use cases for this are on
the front end itself, handling various scenarios.
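
To make that concrete, here is a minimal sketch of the kind of front-end
templating code where this comes up. The `html` tag and the `HTML_ESCAPES`
table are hypothetical names for illustration; the escape table mirrors the
`escapeHTML` example quoted below.

```js
// Hypothetical escape table, mirroring the quoted `escapeHTML` example.
const HTML_ESCAPES = {
    '"': "&#34;",
    "'": "&#39;",
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
}

function escapeHTML(str) {
    return String(str).replace(/["'&<>]/g, m => HTML_ESCAPES[m])
}

// Hypothetical template tag: literal parts pass through unchanged,
// interpolated values are escaped.
function html(strings, ...values) {
    let out = strings[0]
    for (let i = 0; i < values.length; i++) {
        out += escapeHTML(values[i]) + strings[i + 1]
    }
    return out
}

const name = '<b>"Tom" & Jerry</b>'
console.log(html`<p>Hello, ${name}!</p>`)
// <p>Hello, &lt;b&gt;&#34;Tom&#34; &amp; Jerry&lt;/b&gt;!</p>
```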

@Cyril

And every single one of those patterns is going to need to be compiled and
executed, and compiling and interpreting regular expressions is definitely
not quick, especially when you can nest Kleene stars. (See:
https://en.wikipedia.org/wiki/Regular_expression#Implementations_and_running_times)
That's why I'm against it - we don't need to complicate this proposal with
that mess.
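
A small illustration of the nested-star cost, assuming a backtracking regex
engine (which is what mainstream JS engines use by default). The pattern and
the `timeMatch` helper are made up for this sketch; timings will vary by
machine, so none are shown.

```js
// Nested quantifier: on a backtracking engine, a failing match explores
// exponentially many ways to split the run of "a"s between the two "+"s.
const nested = /^(a+)+$/

function timeMatch(n) {
    const input = "a".repeat(n) + "b" // trailing "b" forces the match to fail
    const start = Date.now()
    const matched = nested.test(input)
    return {n, matched, ms: Date.now() - start}
}

// Kept small on purpose; each extra "a" roughly doubles the work.
for (const n of [10, 15, 20]) {
    console.log(timeMatch(n))
}
```

A table of literal string keys has no such worst case, since there is nothing
to backtrack over.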

-----

Isiah Meadows
[email protected]
www.isiahmeadows.com

On Sat, May 19, 2018 at 7:04 PM, Mathias Bynens <[email protected]> wrote:

> Hey Kai, you’re oversimplifying. Your solution works for a single Unicode
> symbol (corresponding to a single code point) but falls apart as soon as
> you need to match multiple symbols of possibly varying length, like in the
> `escapeHtml` example.
>
> On Sat, May 19, 2018 at 8:43 AM, kai zhu <[email protected]> wrote:
>
>> again, you backend-engineers are making something more complicated than
>> needs be, when simple, throwaway glue-code will suffice.  agree with
>> jordan, this feature is a needless cross-cut of String.prototype.replace.
>>
>> ```
>> /*jslint
>>     node: true
>> */
>> 'use strict';
>> var dict;
>> dict = {
>>     '$': '^',
>>     '1': '2',
>>     '<': '&lt;',
>>     '🍌': '🍑',
>>     '-': '_',
>>     ']': '@'
>> };
>> // output: "test🍐🍑_^^[22@ &lt;foo>"
>> console.log('test🍐🍌-$$[11] <foo>'.replace((/[\S\s]/gu), function
>> (character) {
>>     return dict.hasOwnProperty(character)
>>         ? dict[character]
>>         : character;
>> }));
>> ```
>>
>> kai zhu
>> [email protected]
>>
>>
>>
>> On 19 May 2018, at 4:08 PM, Cyril Auburtin <[email protected]>
>> wrote:
>>
>> You can also have a
>>
>> ```js
>> var replacer = replacements => {
>>   const re = new RegExp(replacements.map(([k,_,escaped=k]) =>
>> escaped).join('|'), 'gu');
>>   const replaceMap = new Map(replacements);
>>   return s => s.replace(re, w => replaceMap.get(w));
>> }
>> var replace = replacer([['$', '^', String.raw`\$`], ['1', '2'], ['<',
>> '&lt;'], ['🍌', '🍑'], ['-', '_'], [']', '@', String.raw`\]`]]);
>> replace('test🍐🍌-$$[11] <foo>') // "test🍐🍑_^^[22@ &lt;foo>"
>> ```
>> but escaping quickly gets messy to work with
>>
>> On Sat, May 19, 2018 at 08:17, Isiah Meadows <[email protected]>
>> wrote:
>>
>>> Here's what I'd prefer instead: overload `String.prototype.replace` to
>>> take non-callable objects, as sugar for this:
>>>
>>> ```js
>>> const old = Function.call.bind(Function.call, String.prototype.replace)
>>> String.prototype.replace = function (regexp, object) {
>>>     // Only plain lookup objects take the new path; RegExps and anything
>>>     // else with a `Symbol.replace` method keep the existing behavior.
>>>     if (object == null && regexp != null && typeof regexp === "object" &&
>>>             !(Symbol.replace in regexp)) {
>>>         const re = new RegExp(
>>>             Object.keys(regexp)
>>>             .map(key => old(key, /[\\^$*+?.()|[\]{}]/g, '\\$&'))
>>>             .join("|"),
>>>             "g"
>>>         )
>>>         return old(this, re, m => regexp[m])
>>>     } else {
>>>         return old(this, regexp, object)
>>>     }
>>> }
>>> ```
>>>
>>> This would cover about 99% of my use for something like this, with
>>> less runtime overhead (that of not needing to check for and
>>> potentially match multiple regular expressions at runtime) and better
>>> static analyzability (you only need to check it's an object literal or
>>> constant frozen object, not that its argument is the result of the
>>> built-in `Map` call). It's exceptionally difficult to optimize for
>>> this unless you know everything's a string, but most cases where I had
>>> to pass a callback that wasn't super complex looked a lot like this:
>>>
>>> ```js
>>> // What I use:
>>> function escapeHTML(str) {
>>>     return str.replace(/["'&<>]/g, m => {
>>>         switch (m) {
>>>         case '"': return "&#34;"
>>>         case "'": return "&#39;"
>>>         case "&": return "&amp;"
>>>         case "<": return "&lt;"
>>>         case ">": return "&gt;"
>>>         default: throw new TypeError("unreachable")
>>>         }
>>>     })
>>> }
>>>
>>> // What it could be
>>> function escapeHTML(str) {
>>>     return str.replace({
>>>         '"': "&#34;",
>>>         "'": "&#39;",
>>>         "&": "&amp;",
>>>         "<": "&lt;",
>>>         ">": "&gt;",
>>>     })
>>> }
>>> ```
>>>
>>> And yes, this enables optimizations engines couldn't easily produce
>>> otherwise. In this instance, an engine could find that the object is
>>> static with only single-character entries, and it could replace the
>>> call to a fast-path one that relies on a cheap lookup table instead
>>> (Unicode replacement would be similar, except you'd need an extra
>>> layer of indirection with astrals to avoid blowing up memory when
>>> generating these tables):
>>>
>>> ```js
>>> // Original
>>> function escapeHTML(str) {
>>>     return str.replace({
>>>         '"': "&#34;",
>>>         "'": "&#39;",
>>>         "&": "&amp;",
>>>         "<": "&lt;",
>>>         ">": "&gt;",
>>>     })
>>> }
>>>
>>> // Not real JS, but think of it as how an engine might implement this.
>>> // The implementation of the runtime function `ReplaceWithLookupTable`
>>> // is omitted for brevity, but you could imagine how it could be
>>> // implemented, given the pseudo-TS signature:
>>> //
>>> // ```ts
>>> // declare function %ReplaceWithLookupTable(
>>> //     str: string,
>>> //     table: string[]
>>> // ): string
>>> // ```
>>> function escapeHTML(str) {
>>>     static {
>>>         // A zero-initialized array with 2^16 entries (U+0000-U+FFFF),
>>>         // except for the object's members. This takes up to about 70K
>>>         // per instance, but these are *far* more often called than
>>>         // created.
>>>         const _lookup_escapeHTML = %calloc(65536)
>>>
>>>         _lookup_escapeHTML[34] = "&#34;"
>>>         _lookup_escapeHTML[38] = "&amp;"
>>>         _lookup_escapeHTML[39] = "&#39;"
>>>         _lookup_escapeHTML[60] = "&lt;"
>>>         _lookup_escapeHTML[62] = "&gt;"
>>>     }
>>>
>>>     return %ReplaceWithLookupTable(str, _lookup_escapeHTML)
>>> }
>>> ```
>>>
>>> Likewise, similar, but more restrained, optimizations could be
>>> performed on objects with multibyte strings, since they can be reduced
>>> to a simple search trie. (These can be built in even the general case
>>> if the strings are large enough to merit it - small ropes are pretty
>>> cheap to create.)
>>>
>>> For what it's worth, there's precedent here in Ruby, which has support
>>> for `Hash`es as `String#gsub` parameters which work similarly.
>>>
>>> -----
>>>
>>> Isiah Meadows
>>> [email protected]
>>> www.isiahmeadows.com
>>>
>>>
>>> On Fri, May 18, 2018 at 1:01 PM, Logan Smyth <[email protected]>
>>> wrote:
>>> >> It wouldn't necessarily break existing API, since
>>> >> String.prototype.replace currently accepts only RegExp or strings.
>>> >
>>> > Not quite accurate. It accepts anything with a `Symbol.replace`
>>> > property, or a string.
>>> >
>>> > Given that, what you're describing can be implemented as
>>> > ```
>>> > Map.prototype[Symbol.replace] = function(str) {
>>> >   for(const [key, value] of this) {
>>> >     str = str.replace(key, value);
>>> >   }
>>> >   return str;
>>> > };
>>> > ```
>>> >
>>> >> I don't know if the ECMAScript spec mandates preserving a particular
>>> >> order to a Map's elements.
>>> >
>>> > It does, so you're good there.
>>> >
>>> >> Detecting collisions between matching regular expressions or strings.
>>> >
>>> > I think this would be my primary concern, but not so much ordering as
>>> > expectations. Like if you did
>>> > ```
>>> > "1".replace(new Map([
>>> >   ['1', '2'],
>>> >   ['2', '3'],
>>> > ]));
>>> > ```
>>> > is the result `2` or `3`? `3` seems surprising to me, at least in the
>>> > general sense, because there was no `2` in the original input, but
>>> > it's also hard to see how you'd spec the behavior to avoid that if
>>> > general regex replacement is supported.
>>> >
>>> > On Fri, May 18, 2018 at 9:47 AM, Alex Vincent <[email protected]>
>>> > wrote:
>>> >>
>>> >> Reading [1] in the digests, I think there might actually be an API
>>> >> improvement that is doable.
>>> >>
>>> >> Suppose the String.prototype.replace API allowed passing in a single
>>> >> argument, a Map instance where the keys were strings or regular
>>> >> expressions and the values were replacement strings or functions.
>>> >>
>>> >> Advantages:
>>> >> * Shorthand - instead of writing str.replace(a, b).replace(c,
>>> >> d).replace(e, f)... you get str.replace(regExpMap)
>>> >> * Reusable - the same regular expression/string map could be used for
>>> >> several strings (assuming of course the user didn't just abstract the
>>> >> call into a separate function)
>>> >> * Modifiable on demand - developers could easily add new regular
>>> >> expression matches to the map object, or remove them
>>> >> * It wouldn't necessarily break existing API, since
>>> >> String.prototype.replace currently accepts only RegExp or strings.
>>> >>
>>> >> Disadvantages / reasons not to do it:
>>> >> * Detecting collisions between matching regular expressions or
>>> >> strings. If two regular expressions match the same string, or a
>>> >> regular expression and a search string match, the expected results
>>> >> may vary because a Map's elements might not be consistently ordered.
>>> >> I don't know if the ECMAScript spec mandates preserving a particular
>>> >> order to a Map's elements.
>>> >>   - if we preserve the same chaining capability
>>> >> (str.replace(map1).replace(map2)...), this might not be a big
>>> >> problem.
>>> >>
>>> >> The question is, how often do people chain replace calls together?
>>> >>
>>> >> * It's not particularly hard to chain several replace calls together.
>>> >> It's just verbose, which might not be a high enough burden to
>>> >> overcome for adding API.
>>> >>
>>> >> That's my two cents for the day.  Thoughts?
>>> >>
>>> >> [1] https://esdiscuss.org/topic/adding-map-directly-to-string-prototype
>>> >>
>>> >> --
>>> >> "The first step in confirming there is a bug in someone else's work is
>>> >> confirming there are no bugs in your own."
>>> >> -- Alexander J. Vincent, June 30, 2001
>>> >>
>>> >> _______________________________________________
>>> >> es-discuss mailing list
>>> >> [email protected]
>>> >> https://mail.mozilla.org/listinfo/es-discuss
>>> >>