On Fri, Nov 4, 2016 at 5:38 AM, Sven M. Hallberg <pe...@khjk.org> wrote:
> For one thing, it's harder to write a parser now that reuses an existing
> JSON framework. Before you could do this:
>
> 1) Full recognition by an automaton autogenerated from a CFG.
>    (Also yay decidable equivalence.)
> 2) Interpretation by existing JSON parser.
> 3) Simple visitor pattern on result to convert tagged strings to their
>    native representations.

My understanding is parsers like Hammer can still handle these cases in one pass (I think?). Would love to know!

Some quick BNF describing <member> and <tagged-string> according to:

https://tjson.org/spec/#rfc.section.2.1

<member> ::= <tagged-string> <name-separator> <value>
<tagged-string> ::= '"' *<char> ':' <tag> '"'

Unfortunately I don't have a well-defined grammar for <value>, as my current definitions are somewhat entangled with the ABNF definition of JSON in RFC 7159. I should definitely produce a full grammar! But you can imagine it as being a sort of toplevel symbol.

Parsing and typechecking TJSON in one pass would involve obtaining the parse tree for the LHS of a particular nonterminal and passing it to the pushdown automaton parsing the RHS, as a sort of parametric argument alongside the remaining unconsumed tokens.

At each frame of the stack, the pushdown automaton continues on its way towards the terminals, but you unwrap one layer of the parse tree parameter and pass it down with the next frame the automaton pushes, so long as the type signature is for a non-scalar value.

When the pushdown automaton has reached the terminals and has almost finished extracting a node of the parse tree, before returning the parsed node we call a small guard/validation function that takes two nodes of the parse tree as arguments: one is the type signature for the current node, and the other is the parsed value.
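To make that a bit more concrete, here is a rough Python sketch of the idea, not a real TJSON implementation. It is a hand-written recursive descent parser rather than a generated pushdown automaton, and it assumes a simplified, hypothetical tag set ("s" string, "i" integer carried as a JSON string, "f" float, "O" object, "A<tag>" array). The names parse_tjson, parse_value, guard and TJSONError are made up for illustration. The tag parsed from the LHS is passed as a parameter to the parser for the RHS, one layer is unwrapped per nesting level, and a guard runs just before a scalar node is returned.

```python
import json
import re

_DECODER = json.JSONDecoder()
_INT_RE = re.compile(r"-?(0|[1-9][0-9]*)")


class TJSONError(ValueError):
    """Raised when the input does not match its own type signatures."""


def skip_ws(text, pos):
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1
    return pos


def expect(text, pos, ch):
    pos = skip_ws(text, pos)
    if text[pos:pos + 1] != ch:
        raise TJSONError(f"expected {ch!r} at offset {pos}")
    return pos + 1


def parse_scalar(text, pos):
    """Parse one plain JSON value at pos; returns (value, new_pos)."""
    try:
        return _DECODER.raw_decode(text, skip_ws(text, pos))
    except json.JSONDecodeError as exc:
        raise TJSONError(str(exc)) from exc


def guard(tag, value):
    """Postcondition: check a parsed scalar against its type signature."""
    if tag == "s" and isinstance(value, str):
        return value
    if tag == "i" and isinstance(value, str) and _INT_RE.fullmatch(value):
        return int(value)
    if (tag == "f" and isinstance(value, (int, float))
            and not isinstance(value, bool)):
        return float(value)
    raise TJSONError(f"value {value!r} does not match tag {tag!r}")


def parse_value(text, pos, tag):
    """Parse the RHS, parameterized by the tag taken from the LHS."""
    if tag == "O":
        return parse_object(text, pos)
    if tag.startswith("A<") and tag.endswith(">"):
        return parse_array(text, pos, tag[2:-1])  # unwrap one layer
    value, pos = parse_scalar(text, pos)
    return guard(tag, value), pos


def parse_array(text, pos, inner_tag):
    pos = skip_ws(text, expect(text, pos, "["))
    items = []
    if text[pos:pos + 1] == "]":
        return items, pos + 1
    while True:
        item, pos = parse_value(text, pos, inner_tag)
        items.append(item)
        pos = skip_ws(text, pos)
        if text[pos:pos + 1] == "]":
            return items, pos + 1
        pos = expect(text, pos, ",")


def parse_object(text, pos):
    pos = skip_ws(text, expect(text, pos, "{"))
    result = {}
    if text[pos:pos + 1] == "}":
        return result, pos + 1
    while True:
        tagged, pos = parse_scalar(text, pos)  # LHS: <tagged-string>
        if not isinstance(tagged, str) or ":" not in tagged:
            raise TJSONError(f"member name {tagged!r} has no tag")
        name, _, tag = tagged.rpartition(":")
        if name in result:
            raise TJSONError(f"duplicate member name {name!r}")
        pos = expect(text, pos, ":")  # <name-separator>
        result[name], pos = parse_value(text, pos, tag)  # RHS: <value>
        pos = skip_ws(text, pos)
        if text[pos:pos + 1] == "}":
            return result, pos + 1
        pos = expect(text, pos, ",")


def parse_tjson(text):
    value, pos = parse_object(text, 0)
    if skip_ws(text, pos) != len(text):
        raise TJSONError("trailing data after top-level object")
    return value
```

With these assumptions, parse_tjson('{"count:i": "42"}') yields a native integer under the key "count", while '{"count:i": 42}' is rejected by the guard because the "i" tag expects a string-encoded integer.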
A tl;dr version:

- For a particular nonterminal, I want to have a "parameterized" pushdown automaton that uses LHS to assist parsing RHS, by passing the parse result for LHS to the parser for RHS
- I want to add what are effectively "postconditions" to that pushdown automaton which use something approaching boolean algebra to ensure the result is valid

This sounds context-sensitive to me, I guess. But even if it is, all it's doing is using type information on LHS to enrich the parsing of RHS. Certainly there's ample precedent for doing that sort of thing in the innumerable statically typed languages out there? If it's context-sensitive, it seems like a very boring kind of context sensitivity. But IANAL (I Am Not A Linguist).

These are exactly the kind of cases I think parser combinator libraries are made for. If not, making a second pass to typecheck the parse tree doesn't seem so bad either.

There's a completely different approach I'll be using in the Ruby implementation. It's a bit wacky, but I think it works out.

--
Tony Arcieri