Re: Overload str.replace to take a Map?

Mathias Bynens Sat, 19 May 2018 16:05:25 -0700

Hey Kai, you’re oversimplifying. Your solution works for a single Unicode
symbol (corresponding to a single code point) but falls apart as soon as
you need to match multiple symbols of possibly varying length, like in the
`escapeHtml` example.
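To make Mathias's point concrete: with a per-code-point pattern like `/[\S\s]/gu`, the callback only ever sees one symbol, so a key such as `"&lt;"` can never match. The sketch below shows one alternative. It assumes non-empty string keys matched literally, and `makeReplacer` is a hypothetical name, not a proposed API. It escapes every key, puts longer keys first, and joins them into a single alternation:

```js
// Hypothetical helper (not a proposed API): single-pass replacement where
// keys are non-empty strings of any length, matched literally.
function makeReplacer(table) {
  const entries = Object.entries(table);
  if (entries.length === 0) return (str) => str; // nothing to replace
  // Escape regex syntax characters so keys like "$" or "]" match literally.
  const escape = (s) => s.replace(/[\\^$*+?.()|[\]{}]/g, "\\$&");
  const pattern = entries
    .map(([key]) => key)
    // Longest keys first: an alternation takes the first branch that matches,
    // so "a|ab" would never match "ab".
    .sort((a, b) => b.length - a.length)
    .map(escape)
    .join("|");
  const re = new RegExp(pattern, "gu");
  const lookup = new Map(entries);
  // Each match is looked up once; replacement text is not rescanned.
  return (str) => str.replace(re, (match) => lookup.get(match));
}

const unescapeEntities = makeReplacer({ "&lt;": "<", "&gt;": ">", "&amp;": "&" });
unescapeEntities("a &lt;b&gt; &amp;amp;"); // "a <b> &amp;"
```

Sorting by length matters as soon as one key is a prefix of another. Without it, `{ a: "1", ab: "2" }` would turn `"ab"` into `"1b"`.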


On Sat, May 19, 2018 at 8:43 AM, kai zhu <[email protected]> wrote:

> again, you backend-engineers are making something more complicated than
> needs be, when simple, throwaway glue-code will suffice.  agree with
> jordan, this feature is a needless cross-cut of String.prototype.replace.
>
> ```
> /*jslint
>     node: true
> */
> 'use strict';
> var dict;
> dict = {
>     '$': '^',
>     '1': '2',
>     '<': '&lt;',
>     '🍌': '🍑',
>     '-': '_',
>     ']': '@'
> };
> // output: "test🍐🍑_^^[22@ &lt;foo>"
> console.log('test🍐🍌-$$[11] <foo>'.replace((/[\S\s]/gu), function (character) {
>     return dict.hasOwnProperty(character)
>         ? dict[character]
>         : character;
> }));
> ```
>
> kai zhu
> [email protected]
>
>
>
> On 19 May 2018, at 4:08 PM, Cyril Auburtin <[email protected]> wrote:
>
> You can also have a
>
> ```js
> var replacer = replacements => {
>   const re = new RegExp(
>     replacements.map(([k, _, escaped = k]) => escaped).join('|'),
>     'gu'
>   );
>   const replaceMap = new Map(replacements);
>   return s => s.replace(re, w => replaceMap.get(w));
> }
> var replace = replacer([
>   ['$', '^', String.raw`\$`], ['1', '2'], ['<', '&lt;'],
>   ['🍌', '🍑'], ['-', '_'], [']', '@', String.raw`\]`]
> ]);
> replace('test🍐🍌-$$[11] <foo>') // "test🍐🍑_^^[22@ &lt;foo>"
> ```
> but the escaping quickly gets messy to work with
>
> On Sat, May 19, 2018 at 08:17, Isiah Meadows <[email protected]> wrote:
>
>> Here's what I'd prefer instead: overload `String.prototype.replace` to
>> take non-callable objects, as sugar for this:
>>
>> ```js
>> const old = Function.call.bind(Function.call, String.prototype.replace)
>> String.prototype.replace = function (regexp, object) {
>>     if (object == null && regexp != null && typeof regexp === "object") {
>>         // Called with a single argument: `regexp` is the lookup object.
>>         const re = new RegExp(
>>             Object.keys(regexp)
>>             .map(key => old(key, /[\\^$*+?.()|[\]{}]/g, '\\$&'))
>>             .join("|"),
>>             "g" // replace every occurrence, not just the first
>>         )
>>         return old(this, re, m => regexp[m])
>>     } else {
>>         return old(this, regexp, object)
>>     }
>> }
>> ```
>>
>> This would cover about 99% of my use for something like this, with
>> less runtime overhead (that of not needing to check for and
>> potentially match multiple regular expressions at runtime) and better
>> static analyzability (you only need to check it's an object literal or
>> constant frozen object, not that its argument is the result of the
>> built-in `Map` call). It's exceptionally difficult to optimize for
>> this unless you know everything's a string, but most cases where I had
>> to pass a callback that wasn't super complex looked a lot like this:
>>
>> ```js
>> // What I use:
>> function escapeHTML(str) {
>>     return str.replace(/["'&<>]/g, m => {
>>         switch (m) {
>>         case '"': return "&#34;"
>>         case "'": return "&#39;"
>>         case "&": return "&amp;"
>>         case "<": return "&lt;"
>>         case ">": return "&gt;"
>>         default: throw new TypeError("unreachable")
>>         }
>>     })
>> }
>>
>> // What it could be
>> function escapeHTML(str) {
>>     return str.replace({
>>         '"': "&#34;",
>>         "'": "&#39;",
>>         "&": "&amp;",
>>         "<": "&lt;",
>>         ">": "&gt;",
>>     })
>> }
>> ```
>>
>> And yes, this enables optimizations engines couldn't easily produce
>> otherwise. In this instance, an engine could find that the object is
>> static with only single-character entries, and it could replace the
>> call with a fast-path one that relies on a cheap lookup table instead
>> (Unicode replacement would be similar, except you'd need an extra
>> layer of indirection with astrals to avoid blowing up memory when
>> generating these tables):
>>
>> ```js
>> // Original
>> function escapeHTML(str) {
>>     return str.replace({
>>         '"': "&#34;",
>>         "'": "&#39;",
>>         "&": "&amp;",
>>         "<": "&lt;",
>>         ">": "&gt;",
>>     })
>> }
>>
>> // Not real JS, but think of it as how an engine might implement this. The
>> // implementation of the runtime function `ReplaceWithLookupTable` is omitted
>> // for brevity, but you could imagine how it could be implemented, given the
>> // pseudo-TS signature:
>> //
>> // ```ts
>> // declare function %ReplaceWithLookupTable(
>> //     str: string,
>> //     table: string[]
>> // ): string
>> // ```
>> function escapeHTML(str) {
>>     static {
>>         // A zero-initialized array with 2^16 entries (U+0000-U+FFFF), except
>>         // for the object's members. This takes up to about 70K per instance,
>>         // but these are *far* more often called than created.
>>         const _lookup_escapeHTML = %calloc(65536)
>>
>>         _lookup_escapeHTML[34] = "&#34;"
>>         _lookup_escapeHTML[38] = "&amp;"
>>         _lookup_escapeHTML[39] = "&#39;"
>>         _lookup_escapeHTML[60] = "&lt;"
>>         _lookup_escapeHTML[62] = "&gt;"
>>     }
>>
>>     return %ReplaceWithLookupTable(str, _lookup_escapeHTML)
>> }
>> ```
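The lookup-table fast path can be approximated in today's JavaScript to see what it does. This is a rough userland sketch, not how any engine implements it. `buildLookupTable` and `replaceWithLookupTable` are made-up names, keys are assumed to be single UTF-16 code units (BMP characters only), and a `null`-filled array stands in for the `%calloc`-ed table:

```js
// Hypothetical userland stand-in for the engine-internal
// %ReplaceWithLookupTable: one table slot per UTF-16 code unit (BMP only).
function buildLookupTable(replacements) {
  const table = new Array(65536).fill(null);
  for (const [key, value] of Object.entries(replacements)) {
    if (key.length !== 1) {
      throw new RangeError(`key ${JSON.stringify(key)} is not a single code unit`);
    }
    table[key.charCodeAt(0)] = value;
  }
  return table;
}

function replaceWithLookupTable(str, table) {
  let out = "";
  let start = 0; // beginning of the current run of unchanged characters
  for (let i = 0; i < str.length; i++) {
    const replacement = table[str.charCodeAt(i)];
    if (replacement !== null) {
      // Flush the unchanged run, then append the replacement.
      out += str.slice(start, i) + replacement;
      start = i + 1;
    }
  }
  return out + str.slice(start);
}

// Built once at load time, then reused on every call.
const escapeHTMLTable = buildLookupTable({
  '"': "&#34;",
  "&": "&amp;",
  "'": "&#39;",
  "<": "&lt;",
  ">": "&gt;",
});

function escapeHTML(str) {
  return replaceWithLookupTable(str, escapeHTMLTable);
}
```

The table is built once and reused on every call, which is the trade-off the `static` block above relies on.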
>>
>> Likewise, similar but more restrained optimizations could be performed
>> on objects with multi-character keys, since they can be reduced to a
>> simple search trie. (These can be built even in the general case if the
>> strings are large enough to merit it; small ropes are pretty cheap to
>> create.)
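The search-trie idea for multi-character keys can be sketched the same way. Again, this is only an illustration under simple assumptions: non-empty string keys, leftmost-longest matching in one left-to-right pass, and the hypothetical names `buildTrie` and `replaceWithTrie`. It is not a description of what an engine would do:

```js
// Hypothetical trie-based replacer: leftmost-longest match, single pass.
function buildTrie(replacements) {
  const root = { children: new Map(), value: undefined };
  for (const [key, value] of Object.entries(replacements)) {
    if (key === "") throw new RangeError("empty keys are not supported");
    let node = root;
    for (let i = 0; i < key.length; i++) {
      const unit = key[i]; // UTF-16 code unit
      if (!node.children.has(unit)) {
        node.children.set(unit, { children: new Map(), value: undefined });
      }
      node = node.children.get(unit);
    }
    node.value = value; // marks the end of a key
  }
  return root;
}

function replaceWithTrie(str, root) {
  let out = "";
  let i = 0;
  while (i < str.length) {
    // Walk the trie as far as the input allows, remembering the longest key seen.
    let node = root;
    let matchEnd = -1;
    let matchValue;
    for (let j = i; j < str.length; j++) {
      node = node.children.get(str[j]);
      if (node === undefined) break;
      if (node.value !== undefined) {
        matchEnd = j + 1;
        matchValue = node.value;
      }
    }
    if (matchEnd === -1) {
      out += str[i]; // no key starts here; copy one code unit
      i += 1;
    } else {
      out += matchValue;
      i = matchEnd;
    }
  }
  return out;
}

const entities = buildTrie({ "&lt;": "<", "&gt;": ">", "&amp;": "&" });
replaceWithTrie("a &lt;b&gt; &amp;amp;", entities); // "a <b> &amp;"
```

Both functions walk keys and input by UTF-16 code unit, so astral keys such as `"🍌"` work without special handling.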
>>
>> For what it's worth, there's precedent here in Ruby, which has support
>> for `Hash`es as `String#gsub` parameters which work similarly.
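For comparison, Ruby's version takes the pattern and the table separately. `'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*')` returns `"h3ll*"`, and, as far as I know, a match with no entry in the hash is replaced with an empty string. A rough JavaScript analogue might look like the following, where `gsubHash` is a hypothetical helper name:

```js
// Hypothetical helper approximating Ruby's String#gsub(pattern, hash):
// each match is looked up in `table`; matches with no entry become "".
function gsubHash(str, pattern, table) {
  // Force a global regex so every match is visited, as gsub does.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  return str.replace(re, (match) =>
    Object.prototype.hasOwnProperty.call(table, match) ? String(table[match]) : ""
  );
}

gsubHash("hello", /[eo]/, { e: 3, o: "*" }); // "h3ll*"
```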
>>
>> -----
>>
>> Isiah Meadows
>> [email protected]
>> www.isiahmeadows.com
>>
>>
>> On Fri, May 18, 2018 at 1:01 PM, Logan Smyth <[email protected]> wrote:
>> >> It wouldn't necessarily break existing API, since String.prototype.replace
>> >> currently accepts only RegExp or strings.
>> >
>> > Not quite accurate. It accepts anything with a `Symbol.replace` property,
>> > or a string.
>> >
>> > Given that, what you're describing can be implemented as
>> > ```
>> > Map.prototype[Symbol.replace] = function(str) {
>> >   for(const [key, value] of this) {
>> >     str = str.replace(key, value);
>> >   }
>> >   return str;
>> > };
>> > ```
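Patching `Map.prototype` affects every `Map` in the program. A less invasive variant of the same idea puts the `Symbol.replace` method on a subclass only; `ReplacementMap` here is a hypothetical name. `String.prototype.replace` then delegates to it whenever an instance is passed as the first argument, and the missing second argument simply arrives as `undefined`:

```js
// Hypothetical subclass: only instances of ReplacementMap opt in to the
// Symbol.replace protocol, so plain Maps are left untouched.
class ReplacementMap extends Map {
  [Symbol.replace](str) {
    // Same sequential semantics as the Map.prototype version above: each
    // entry is applied, in insertion order, to the previous result.
    let result = String(str);
    for (const [pattern, replacement] of this) {
      result = result.replace(pattern, replacement);
    }
    return result;
  }
}

const fixups = new ReplacementMap([
  [/-/g, "_"],
  ["<", "&lt;"],
]);

"a-b-c <foo>".replace(fixups); // "a_b_c &lt;foo>"
```

As with `String.prototype.replace` itself, a plain string key only replaces its first occurrence. That is why the example uses `/-/g` for the hyphens.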
>> >
>> >> I don't know if the ECMAScript spec mandates preserving a particular order
>> >> to a Map's elements.
>> >
>> > It does, so you're good there.
>> >
>> >> Detecting collisions between matching regular expressions or strings.
>> >
>> > I think this would be my primary concern, but not so much ordering as
>> > expectations. Like if you did
>> > ```
>> > "1".replace(new Map([
>> >   ['1', '2'],
>> >   ['2', '3'],
>> > ]));
>> > ```
>> > is the result `2` or `3`? `3` seems surprising to me, at least in the
>> > general sense, because there was no `2` in the original input, but it's also
>> > hard to see how you'd spec the behavior to avoid that if general regex
>> > replacement is supported.
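The two readings of the example can be compared side by side. This sketch assumes string keys only. `chained` mimics the sequential `Map.prototype[Symbol.replace]` above. `singlePass` (a hypothetical name) scans once with an alternation of escaped keys, so text produced by one rule is never matched by another:

```js
// The example as a Map of string keys to string values.
const rules = new Map([
  ["1", "2"],
  ["2", "3"],
]);

// Sequential: each rule runs on the output of the previous one
// (string keys replace only their first occurrence, as in the original).
function chained(str, map) {
  let result = str;
  for (const [key, value] of map) {
    result = result.replace(key, value);
  }
  return result;
}

// Single pass (hypothetical): one scan with an alternation of escaped keys,
// longest first; replacement text is never matched again.
function singlePass(str, map) {
  if (map.size === 0) return str;
  const escape = (s) => s.replace(/[\\^$*+?.()|[\]{}]/g, "\\$&");
  const keys = [...map.keys()].sort((a, b) => b.length - a.length);
  const re = new RegExp(keys.map(escape).join("|"), "g");
  return str.replace(re, (match) => map.get(match));
}

chained("1", rules);    // "3"
singlePass("1", rules); // "2"
```

The single-pass version is this simple only because every key is a plain string. With arbitrary regex keys, there is no equally obvious way to combine them into one scan.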
>> >
>> > On Fri, May 18, 2018 at 9:47 AM, Alex Vincent <[email protected]> wrote:
>> >>
>> >> Reading [1] in the digests, I think there might actually be an API
>> >> improvement that is doable.
>> >>
>> >> Suppose the String.prototype.replace API allowed passing in a single
>> >> argument, a Map instance where the keys were strings or regular expressions
>> >> and the values were replacement strings or functions.
>> >>
>> >> Advantages:
>> >> * Shorthand - instead of writing str.replace(a, b).replace(c,
>> >> d).replace(e, f)... you get str.replace(regExpMap)
>> >> * Reusable - the same regular expression/string map could be used for
>> >> several strings (assuming of course the user didn't just abstract the call
>> >> into a separate function)
>> >> * Modifiable on demand - developers could easily add new regular
>> >> expression matches to the map object, or remove them
>> >> * It wouldn't necessarily break existing API, since
>> >> String.prototype.replace currently accepts only RegExp or strings.
>> >>
>> >> Disadvantages / reasons not to do it:
>> >> * Detecting collisions between matching regular expressions or strings.
>> >> If two regular expressions match the same string, or a regular expression
>> >> and a search string match, the expected results may vary because a Map's
>> >> elements might not be consistently ordered.  I don't know if the ECMAScript
>> >> spec mandates preserving a particular order to a Map's elements.
>> >>   - if we preserve the same chaining capability
>> >> (str.replace(map1).replace(map2)...), this might not be a big problem.
>> >>
>> >> The question is, how often do people chain replace calls together?
>> >>
>> >> * It's not particularly hard to chain several replace calls together.
>> >> It's just verbose, which might not be a high enough burden to overcome for
>> >> adding API.
>> >>
>> >> That's my two cents for the day.  Thoughts?
>> >>
>> >> [1] https://esdiscuss.org/topic/adding-map-directly-to-string-prototype
>> >>
>> >> --
>> >> "The first step in confirming there is a bug in someone else's work is
>> >> confirming there are no bugs in your own."
>> >> -- Alexander J. Vincent, June 30, 2001
>> >>
>> >> _______________________________________________
>> >> es-discuss mailing list
>> >> [email protected]
>> >> https://mail.mozilla.org/listinfo/es-discuss
>> >>
>> >
>> >
>> >
>>
>
>
>
>
>
