Re: A grammar for quoted strings with escaped chars

Christopher Layne Mon, 22 Sep 2014 01:01:29 -0700

I've posted some things previously on this topic - but in short, you don't 
really need to use events to do this. It's possible to do it in a 
semi-straightforward fashion without a lot of jumping through hoops (just a 
bunch of rules).


Here are some grammar fragments demonstrating what I'm talking about. They
handle single-quoted, double-quoted, and "quote-like" parsing (e.g. q%%, q||),
efficiently handling simple quoted strings that have no escape sequences
but falling back to an escape-aware mode when they're present.

my ($dsl, $grammar) = 
<<'===================================================================================================';

:default ::= action => [values]
lexeme default = latm => 1

[...]

# Normal, bare, unquoted
value           ::= value_n
value_n         ::= valword_n

# Quoted but not escaped                                        # reassemble action
value           ::= value_qd                                    action => val_qd
                  | value_qs                                    action => val_qs
                  | value_ql0                                   action => val_ql0
                  | value_ql1                                   action => val_ql1
value_qd        ::= valword_qd
value_qs        ::= valword_qs
value_ql0       ::= valword_ql0
value_ql1       ::= valword_ql1

# Quoted and escaped                                            # reassemble action
value           ::= (g_quote_d) value_eqd (g_quote_d)           action => val_eqd
                  | (g_quote_s) value_eqs (g_quote_s)           action => val_eqs
                  | (g_quote_ls0) value_eql0 (g_quote_le0)      action => val_eql0
                  | (g_quote_ls1) value_eql1 (g_quote_le1)      action => val_eql1
value_eqd       ::= valword_eqd*
value_eqs       ::= valword_eqs*
value_eql0      ::= valword_eql0*
value_eql1      ::= valword_eql1*

# Normal, bare, unquoted
valword_n         ~ valword_n_c
valword_n_c       ~ [\w_\@:.\/\*-]+

# Quoted but not escaped
valword_qd        ~ quote_d valword_qd_c quote_d
valword_qs        ~ quote_s valword_qs_c quote_s
valword_ql0       ~ quote_ls0 valword_ql0_c quote_le0
valword_ql1       ~ quote_ls1 valword_ql1_c quote_le1
valword_qd_c      ~ [^"\\]*
valword_qs_c      ~ [^'\\]*
valword_ql0_c     ~ [^|\\]*
valword_ql1_c     ~ [^%\\]*

# Quoted and escaped
valword_eqd       ~ valword_eqd_c
valword_eqs       ~ valword_eqs_c
valword_eql0      ~ valword_eql0_c
valword_eql1      ~ valword_eql1_c
valword_eqd_c     ~ [^"] | whitespace | escape ["]
valword_eqs_c     ~ [^'] | whitespace | escape [']
valword_eql0_c    ~ [^|] | whitespace | escape [|]
valword_eql1_c    ~ [^%] | whitespace | escape [%]

# These do translation, but cannot be enabled yet as the expectation is no
# translation.
# valword_eqd     ~ [^\a\b\e\f\r\n\t\\"] | whitespace | escape valword_esc
# valword_eqs     ~ [^\a\b\e\f\r\n\t\\'] | whitespace | escape valword_esc
# valword_esc     ~ [abefrnt\\"']

# The same base lexemes cannot be directly used by both the lexer and grammar
# *at the same time*.
# Work around it by providing wrapper lexeme rules for the grammar which end up
# at the same terminal.
g_quote_d         ~ quote_d
g_quote_s         ~ quote_s
g_quote_ls0       ~ quote_ls0
g_quote_le0       ~ quote_le0
g_quote_ls1       ~ quote_ls1
g_quote_le1       ~ quote_le1

quote_d           ~ ["]
quote_s           ~ [']
quote_ls0         ~ 'q|'
quote_le0         ~ '|'
quote_ls1         ~ 'q%'
quote_le1         ~ '%'
escape            ~ '\'

:discard          ~ whitespace
whitespace        ~ [\s]+
===================================================================================================

# Deescaping table
my $xtab = {
         'eqd' => { q(\") => qq(") },
         'eqs' => { q(\') => qq(') },
        'eql0' => { q(\|) => qq(|) },
        'eql1' => { q(\%) => qq(%) },

#       # Not presently used.
#       'eqx'  => {
#               q(\a)   => qq(\a),
#               q(\b)   => qq(\b),
#               q(\e)   => qq(\e),
#               q(\f)   => qq(\f),
#               q(\n)   => qq(\n),
#               q(\r)   => qq(\r),
#               q(\t)   => qq(\t),
#               q(\")   => qq("),
#               q(\')   => qq('),
#               q(\\\\) => qq(\\),
#       },
};

# Deescaping functions
sub val_eqd  { return [ join '', map +($xtab->{'eqd'}{$_} || $_), @{$_[1]} ] }
sub val_eqs  { return [ join '', map +($xtab->{'eqs'}{$_} || $_), @{$_[1]} ] }
sub val_eql0 { return [ join '', map +($xtab->{'eql0'}{$_} || $_), @{$_[1]} ] }
sub val_eql1 { return [ join '', map +($xtab->{'eql1'}{$_} || $_), @{$_[1]} ] }
#sub val_eqx  { return [ join '', map +($xtab->{'eqx'}{$_} || $_), @{$_[1]} ] }

# Dequoting functions
sub val_qd  { return [ substr($_[1]->[0], 1, -1) ] }
sub val_qs  { return [ substr($_[1]->[0], 1, -1) ] }
sub val_ql0 { return [ substr($_[1]->[0], 2, -1) ] }
sub val_ql1 { return [ substr($_[1]->[0], 2, -1) ] }
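
To make the calling convention of these helpers concrete, here is a minimal
sketch that runs without Marpa. It copies three of the functions above and
calls them by hand. The argument shapes are an assumption on my part: the
first argument stands in for the per-parse object, and the second is the
array ref that the [values] default action would hand over (one whole lexeme
for the unescaped case, a list of small lexemes for the escaped case). The
sample token lists are made up for illustration, not captured from a real
parse.

```perl
use strict;
use warnings;

# Copies of helpers from the listing above, so this sketch is self-contained.
my $xtab = { 'eqd' => { q(\") => qq(") } };
sub val_qd  { return [ substr($_[1]->[0], 1, -1) ] }
sub val_ql0 { return [ substr($_[1]->[0], 2, -1) ] }
sub val_eqd { return [ join '', map +($xtab->{'eqd'}{$_} || $_), @{$_[1]} ] }

# Assumed input: the unescaped path sees the whole quoted lexeme, quotes included.
my $plain = val_qd(undef, [ q("hello world") ]);

# Assumed input: the quote-like opener 'q|' is two characters, hence offset 2.
my $qlike = val_ql0(undef, [ q(q|a b|) ]);

# Assumed input: the escaped path sees the lexemes between the hidden quotes;
# an escaped quote arrives as one two-character token and is mapped via $xtab.
my $escaped = val_eqd(undef, [ 's', 'a', 'y', ' ', q(\"), 'h', 'i', q(\") ]);

print "$plain->[0]\n";      # hello world
print "$qlike->[0]\n";      # a b
print "$escaped->[0]\n";    # say "hi"
```

The split mirrors the grammar: the dequoting helpers only trim delimiters
from a single lexeme, while the deescaping helpers rebuild the string token
by token.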



The "deescape anything back to its original form" table isn't used in the
above; it's simply commented out. The approach would be the same.
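
As a rough sketch of what "the same approach" could look like with the
translating table switched on, the snippet below uncomments a few entries of
'eqx' and pushes a hand-built token list through val_eqx. It runs without
Marpa; the token list is hypothetical and only assumes that each escape
sequence arrives as a single two-character token, as the commented-out
valword_esc rules suggest.

```perl
use strict;
use warnings;

# A trimmed copy of the commented-out 'eqx' table.
my $xtab = {
    'eqx' => {
        q(\n)   => qq(\n),    # backslash + n  -> newline
        q(\t)   => qq(\t),    # backslash + t  -> tab
        q(\")   => qq("),     # backslash + "  -> "
        q(\\\\) => qq(\\),    # two backslashes -> one backslash
    },
};

# Same shape as the other deescaping functions.
sub val_eqx { return [ join '', map +($xtab->{'eqx'}{$_} || $_), @{$_[1]} ] }

# Hypothetical token list standing in for the lexer's output.
my $tokens = [ 'a', q(\t), 'b', q(\\\\), 'c', q(\n) ];
my $out    = val_eqx(undef, $tokens);

# Print the character codes so the control characters are visible.
printf "%vd\n", $out->[0];    # 97.9.98.92.99.10
```

Tokens with no table entry pass through unchanged, so ordinary characters
need no special handling.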

-cl

On Sep 22, 2014, at 0012 PT, Ron Savage <[email protected]> wrote:

> I've developed a grammar (with help from various people of course) for quoted 
> strings: http://scsys.co.uk:8002/424926
> 
> Requirements:
> 
> o Strings must be quoted
> 
> o Strings are either single or double quoted
> 
> o The escape character is \
> 
> o If the string is single quoted, internal single quotes must be escaped
> 
> o If the string is double quoted, internal double quotes must be escaped
> 
> o Any other character may be escaped
> 
> o If a character is escaped, the escape character is preserved in the output
> 
> o Empty strings are accepted
