Hi,

I've pushed an initial change: https://gerrit.cloudera.org/#/c/8900/

The change contains only the essential feature:
- Function name: regexp_escape
- Takes a string as an input parameter and returns the escaped string.
- Escapes the following special characters: ".*\\+?^[](){}$!=:-#\n\r\t\v "
  (The double quotes are not part of the set; they are there only so that
  the trailing space stays visible.)

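To illustrate the intended behavior, here is a minimal Python sketch, not the code in the change itself. It assumes that "\\" in the list above stands for a single backslash, that \n, \r, \t and \v stand for the corresponding whitespace characters, and that escaping means prefixing the character with a backslash.

```python
import re

# Characters to escape, as listed in the message. Assumption: "\\" denotes a
# single backslash and \n, \r, \t, \v denote the whitespace characters.
SPECIAL_CHARS = set(".*\\+?^[](){}$!=:-#\n\r\t\v ")


def regexp_escape(pattern: str) -> str:
    """Return pattern with every special character prefixed by a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in pattern)


sample = "a.b*c (1+1=2)"
escaped = regexp_escape(sample)
# Used as a regex, the escaped string should match only the literal input.
assert re.fullmatch(escaped, sample) is not None
assert re.fullmatch(escaped, "aXb*c (1+1=2)") is None
```

The check at the end uses Python's re module only as a stand-in; the regex engine behind the builtin may treat some escapes differently.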
Best regards,
Jinchul

2017-12-19 11:12 GMT+09:00 Jin Chul Kim <[email protected]>:

> Hi,
>
> I would like to discuss some issues before taking the ticket, which asks
> for a new builtin function (e.g. string regex_escape(string_pattern)). The
> purpose of the function is to escape a set of special characters by
> replacing them in the string pattern with their escaped forms.
>
> 1. Define candidates of escaped characters
> When I researched escaping in other languages, I found that each language
> has its own differences and features.
>
> We should settle on our own set of escaped characters. Here is a summary
> of the discussion linked below:
>
> - Perl: Escapes every character that is not alphanumeric (i.e. not in
>   [A-Za-z_0-9]).
> - PHP: Escapes the following special characters: . \ + * ? [ ^ ] $ ( ) { }
>   = ! < > | : -
> - Python: Same as Perl's approach, but the underscore character is no
>   longer escaped since version 3.3.
> - Ruby: Escapes the following special characters: [ ] { } ( ) | - * . \ ?
>   + ^ $ #
>   Ruby escapes the comment character (#), but does not escape
>   context-sensitive characters (: <).
> - Java: A different approach. Java treats the text "as if it were a
>   literal pattern" by wrapping it in "\Q" and "\E".
> - C#: Escapes the following special characters: \ * + ? | { [ ( ) ^ $ . #
>   and whitespace.
>   C# does not escape ] and }.
>
> See the discussion if you want more details:
> https://github.com/benjamingr/RegExp.escape/blob/master/data/other_languages/discussions.md
>
> 2. Built-in function name
> The reporter proposed "regex_escape". I think the function name is
> intuitive and self-explanatory. Please suggest a better name if you have
> one.
>
> 3. Signature of the built-in function
> Do we have to extend the function signature? I guess a user may want to
> pass a set of customized characters.
>
> regex_escape(string_pattern, [delimiter])
>
> delimiter
>   := "^[A-Za-z0-9]"
>    | "[.\?\[^()\]{}=!<>|:-]"
>
> "^[A-Za-z0-9]" means "escapes non-alphanumeric characters"
> "[.\?\[^()\]{}=!<>|:-]" means "escapes the specified characters"
> In delimiter, the following characters should be escaped: []
>
> Best regards,
> Jinchul
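
The two-argument signature proposed in point 3 of the quoted message could behave roughly as in the Python sketch below. This is only an illustration under several assumptions: the literal "^[A-Za-z0-9]" is treated as a special value meaning "escape every non-alphanumeric character", any other delimiter is treated as a regex character class, and the special value is used as the default because the proposal does not state one.

```python
import re

# Special value from the proposal: escape everything that is not in
# [A-Za-z0-9]. Using it as the default is an assumption of this sketch.
NON_ALNUM = "^[A-Za-z0-9]"


def regex_escape(pattern: str, delimiter: str = NON_ALNUM) -> str:
    """Prefix each character selected by delimiter with a backslash."""
    if delimiter == NON_ALNUM:
        def should_escape(ch: str) -> bool:
            return not (ch.isascii() and ch.isalnum())
    else:
        # Assumption: delimiter is a character class such as "[.\?\[\]]".
        char_class = re.compile(delimiter)

        def should_escape(ch: str) -> bool:
            return char_class.fullmatch(ch) is not None
    return "".join("\\" + ch if should_escape(ch) else ch for ch in pattern)


custom = r"[.\?\[^()\]{}=!<>|:-]"
print(regex_escape("a_b.c"))           # default: non-alphanumerics escaped
print(regex_escape("x[1]#y", custom))  # only characters in the custom class
```

With the custom class, "#" is left alone because it is not listed, while "[" and "]" are escaped because they appear (escaped) inside the class.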
