I've posted some things previously on this topic - but in short, you don't
really need to use events to do this. It's possible to do it in a
semi-straightforward fashion without a lot of jumping through hoops (just a
bunch of rules).
Here are some grammar fragments demonstrating what I'm talking about. They
handle single-quoted, double-quoted, and "quote-like" strings (e.g. q%%, q||),
efficiently handling simple quoted strings that have no escape sequences but
falling back to an escape-aware mode when escapes are present.
my ($dsl, $grammar) =
<<'===================================================================================================';
:default ::= action => [values]
lexeme default = latm => 1
[...]
# Normal, bare, unquoted
value ::= value_n
value_n ::= valword_n
# Quoted but not escaped # reassemble action
value ::= value_qd action => val_qd
| value_qs action => val_qs
| value_ql0 action => val_ql0
| value_ql1 action => val_ql1
value_qd ::= valword_qd
value_qs ::= valword_qs
value_ql0 ::= valword_ql0
value_ql1 ::= valword_ql1
# Quoted and escaped # reassemble action
value ::= (g_quote_d) value_eqd (g_quote_d) action => val_eqd
| (g_quote_s) value_eqs (g_quote_s) action => val_eqs
| (g_quote_ls0) value_eql0 (g_quote_le0) action => val_eql0
| (g_quote_ls1) value_eql1 (g_quote_le1) action => val_eql1
value_eqd ::= valword_eqd*
value_eqs ::= valword_eqs*
value_eql0 ::= valword_eql0*
value_eql1 ::= valword_eql1*
# Normal, bare, unquoted
valword_n ~ valword_n_c
valword_n_c ~ [\w_\@:.\/\*-]+
# Quoted but not escaped
valword_qd ~ quote_d valword_qd_c quote_d
valword_qs ~ quote_s valword_qs_c quote_s
valword_ql0 ~ quote_ls0 valword_ql0_c quote_le0
valword_ql1 ~ quote_ls1 valword_ql1_c quote_le1
valword_qd_c ~ [^"\\]*
valword_qs_c ~ [^'\\]*
valword_ql0_c ~ [^|\\]*
valword_ql1_c ~ [^%\\]*
# Quoted and escaped
valword_eqd ~ valword_eqd_c
valword_eqs ~ valword_eqs_c
valword_eql0 ~ valword_eql0_c
valword_eql1 ~ valword_eql1_c
valword_eqd_c ~ [^"] | whitespace | escape ["]
valword_eqs_c ~ [^'] | whitespace | escape [']
valword_eql0_c ~ [^|] | whitespace | escape [|]
valword_eql1_c ~ [^%] | whitespace | escape [%]
# These do translation, but cannot be enabled yet as the expectation is no
# translation.
# valword_eqd ~ [^\a\b\e\f\r\n\t\\"] | whitespace | escape valword_esc
# valword_eqs ~ [^\a\b\e\f\r\n\t\\'] | whitespace | escape valword_esc
# valword_esc ~ [abefrnt\\"']
# The same base lexemes cannot be directly used by both the lexer and grammar
# *at the same time*.
# Work around it by providing wrapper lexeme rules for the grammar which end up
# at the same terminal.
g_quote_d ~ quote_d
g_quote_s ~ quote_s
g_quote_ls0 ~ quote_ls0
g_quote_le0 ~ quote_le0
g_quote_ls1 ~ quote_ls1
g_quote_le1 ~ quote_le1
quote_d ~ ["]
quote_s ~ [']
quote_ls0 ~ 'q|'
quote_le0 ~ '|'
quote_ls1 ~ 'q%'
quote_le1 ~ '%'
escape ~ '\'
:discard ~ whitespace
whitespace ~ [\s]+
===================================================================================================
# Deescaping table
my $xtab = {
'eqd' => { q(\") => qq(") },
'eqs' => { q(\') => qq(') },
'eql0' => { q(\|) => qq(|) },
'eql1' => { q(\%) => qq(%) },
# # Not presently used.
# 'eqx' => {
# q(\a) => qq(\a),
# q(\b) => qq(\b),
# q(\e) => qq(\e),
# q(\f) => qq(\f),
# q(\n) => qq(\n),
# q(\r) => qq(\r),
# q(\t) => qq(\t),
# q(\") => qq("),
# q(\') => qq('),
# q(\\\\) => qq(\\),
# },
};
# Deescaping functions
sub val_eqd { return [ join '', map +($xtab->{'eqd'}{$_} || $_), @{$_[1]} ] }
sub val_eqs { return [ join '', map +($xtab->{'eqs'}{$_} || $_), @{$_[1]} ] }
sub val_eql0 { return [ join '', map +($xtab->{'eql0'}{$_} || $_), @{$_[1]} ] }
sub val_eql1 { return [ join '', map +($xtab->{'eql1'}{$_} || $_), @{$_[1]} ] }
#sub val_eqx { return [ join '', map +($xtab->{'eqx'}{$_} || $_), @{$_[1]} ] }
# Dequoting functions
sub val_qd { return [ substr($_[1]->[0], 1, -1) ] }
sub val_qs { return [ substr($_[1]->[0], 1, -1) ] }
sub val_ql0 { return [ substr($_[1]->[0], 2, -1) ] }
sub val_ql1 { return [ substr($_[1]->[0], 2, -1) ] }
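
To show what the two kinds of actions do, here is a small sketch that calls a few of them by hand, outside of Marpa. It assumes the argument shapes implied by the [values] default action: the dequoting actions get a one-element array holding the whole quoted lexeme, and the deescaping actions get the list of valword_eq* lexemes found between the quotes. The sample inputs are made up, and the subs are copied here only so the snippet runs on its own.

```perl
use strict;
use warnings;

# Copies of one table entry and three actions from above.
my $xtab = { 'eqd' => { q(\") => qq(") } };
sub val_eqd { return [ join '', map +($xtab->{'eqd'}{$_} || $_), @{$_[1]} ] }
sub val_qd  { return [ substr($_[1]->[0], 1, -1) ] }
sub val_ql0 { return [ substr($_[1]->[0], 2, -1) ] }

# $_[0] is the per-parse object; these actions ignore it, so pass undef.

# Fast path: the whole string, quotes included, arrives as one lexeme.
my $fast = val_qd(undef, [ q("hello world") ]);

# Quote-like fast path: strip the two-character opener and the closer.
my $ql = val_ql0(undef, [ q(q|pipe quoted|) ]);

# Escape-aware path: one lexeme per character or escape pair (assumed
# shape); the quotes were already dropped by the (g_quote_d) wrappers.
my $slow = val_eqd(undef, [ 's', 'a', 'y', ' ', q(\"), 'h', 'i', q(\") ]);

print "$fast->[0]\n";   # hello world
print "$ql->[0]\n";     # pipe quoted
print "$slow->[0]\n";   # say "hi"
```

The fast path costs one substr per string. The fallback pays for a lookup per lexeme, and only strings that actually contain a backslash end up there.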
The "deescape anything back to its original form" table isn't used in the
above; it's simply commented out. The approach would be the same.
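
A rough sketch of that translating variant, run by hand rather than through Marpa. The %eqx table and val_eqx are adapted from the commented-out 'eqx' entry and val_eqx line; the input list is a made-up example, and it assumes the lexer hands each escape pair over as a single lexeme, as the commented-out valword_esc rules would.

```perl
use strict;
use warnings;

# Hypothetical full translation table, adapted from the commented-out
# 'eqx' entry: the key is the two-character escape as it appears in the
# input, the value is the character it stands for.
my %eqx = (
    q(\a) => qq(\a), q(\b) => qq(\b), q(\e) => qq(\e), q(\f) => qq(\f),
    q(\n) => qq(\n), q(\r) => qq(\r), q(\t) => qq(\t),
    q(\") => qq("),  q(\') => qq('),
    q(\\\\) => qq(\\),   # backslash-backslash becomes a single backslash
);

# Same shape as the other deescaping actions: translate known escapes,
# pass everything else through unchanged, then reassemble.
sub val_eqx { return [ join '', map +($eqx{$_} || $_), @{$_[1]} ] }

# Made-up lexeme list: a, \t, b, \\, c, \n
my $out = val_eqx(undef, [ 'a', q(\t), 'b', q(\\\\), 'c', q(\n) ]);

# Print the character codes, since the result has control characters.
printf "%vd\n", $out->[0];   # 97.9.98.92.99.10
```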
-cl
On Sep 22, 2014, at 0012 PT, Ron Savage <[email protected]> wrote:
> I've developed a grammar (with help from various people of course) for quoted
> strings: http://scsys.co.uk:8002/424926
>
> Requirements:
>
> o Strings must be quoted
>
> o Strings are either single or double quoted
>
> o The escape character is \
>
> o If the string is single quoted, internal single quotes must be escaped
>
> o If the string is double quoted, internal double quotes must be escaped
>
> o Any other character may be escaped
>
> o If a character is escaped, the escape character is preserved in the output
>
> o Empty strings are accepted