Re: [julia-users] PEG Parser

John Myles White Sun, 06 Jul 2014 18:09:07 -0700

Thanks for looking into this, Abe. It’s too bad that the CSV parser is so much 
slower than the hand-crafted one. PEG still seems like a great tool for tasks where 
maximum performance isn’t as important.


 — John

On Jul 4, 2014, at 11:09 AM, Abe Schneider wrote:

> I got sidetracked by a couple of other things, but the parser is now updated 
> with a bunch of bug fixes. I have preliminary CSV and graphdot parsers (very 
> reduced from the full grammars). I'm starting to put together some more 
> comprehensive tests.
> 
> As for speed, parsing CSV is much slower than with DataFrames. I've spent 
> time trying to optimize things, but I suspect a large part of the speed 
> issue is the overhead of function calls. I also suspect it will be hard to 
> come close to the speed of DataFrames, as its code looks fairly optimized 
> for reading just CSV files.
> 
> After a few more tests are written, I'm getting ready to officially call it 
> version 0.1.
> 
> On Thursday, June 5, 2014 6:56:37 AM UTC-4, Abe Schneider wrote:
> I also forgot to push the changes last night.
> 
> On Wednesday, June 4, 2014 11:01:33 PM UTC-4, Abe Schneider wrote:
> After playing around with a bunch of alternatives, I think I've come up with 
> decent action semantics:
> 
> @transform <name> begin
>  <label> = <action>
> end
> 
> For example, a simple graph grammar might look like:
> 
> @grammar nodetest begin
>   start = +node_def
>   node_def = node_label + node_name + lbrace + data + rbrace
>   node_name = string_value + space
> 
>   data = *(line + semicolon)
>   line = string_value + space
>   string_value = r"[_a-zA-Z][_a-zA-Z0-9]*"
> 
>   lbrace = "{" + space
>   rbrace = "}" + space
>   semicolon = ";" + space
>   node_label = "node" + space
>   space = r"[ \t\n]*"
> end
> 
> with its actions to create some data structure:
> 
> type MyNode
>   name
>   values
> 
>   function MyNode(name, values)
>     new(name, values)
>   end
> end
> 
> 
> with:
> @transform tograph begin
>   # ignore these
>   lbrace = nothing
>   rbrace = nothing
>   semicolon = nothing
>   node_label = nothing
>   space = nothing
> 
>   # special action so we don't have to define every label
>   default = children
> 
>   string_value = node.value
>   value = node.value
>   line = children
>   data = MyNode("", children)
>   node_def = begin
>     local name = children[1]
>     local cnode = children[2]
>     cnode.name = name
>     return cnode
>   end
> end
> 
> and finally, to apply the transform:
> 
> (ast, pos, error) = parse(nodetest, data)
> result = apply(tograph, ast)
> println(result)    # {MyNode("foo",{"a","b"}),MyNode("bar",{"c","d"})}
> 
> The magic in '@transform' basically just creates the dictionary like before, 
> but automatically wraps the expression on the RHS in an anonymous function 
> (node, children) -> expr.
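A minimal sketch of what such a macro could expand to, assuming only what is described above: each `label = expr` line in the block becomes a `"label" => (node, children) -> expr` entry in a `Dict`. The macro name `@transform_sketch` and the `demo` usage are hypothetical; this is not PEGParser's actual implementation.

```julia
# Hypothetical @transform-style macro (not PEGParser's code):
# each `label = expr` line becomes `"label" => (node, children) -> expr`,
# and all pairs are collected into a Dict bound to `name`.
macro transform_sketch(name, block)
    entries = Expr[]
    for ex in block.args
        # Skip line-number nodes and anything that isn't `label = expr`.
        ex isa Expr && ex.head == :(=) || continue
        label = string(ex.args[1])
        body = ex.args[2]
        push!(entries, :($label => ((node, children) -> $body)))
    end
    # esc() so `node`, `children` and any names used in the actions refer
    # to the caller's scope instead of being renamed by macro hygiene.
    return esc(:($name = Dict{String,Function}($(entries...))))
end

# Usage:
@transform_sketch demo begin
    space = nothing                # ignore whitespace
    word = uppercase(node.value)   # uses the matched text
    line = length(children)        # uses the child values
end
```

Escaping the whole expansion is the simplest choice here, since the action bodies are written by the caller and expect `node` and `children` to be plain argument names.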
> 
> I'm currently looking for a better name than 'children', as it's potentially 
> confusing and misleading: it actually holds the values of the child nodes (as 
> opposed to node.children). Maybe cvalues?
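To illustrate that distinction, here is a sketch of a bottom-up apply in which `children` holds the already-transformed values of the child nodes. The `Node` type, `apply_sketch`, the fallback to the `default` entry, and dropping `nothing` results are assumptions inferred from the example above (where `node_def` indexes `children[1]` and `children[2]` once `node_label`, `lbrace`, etc. are ignored), not PEGParser's code. The tree shape for `data` is also hand-built for illustration.

```julia
# Hypothetical AST node type (PEGParser's own node type may differ).
struct Node
    name::String
    value::String
    children::Vector{Node}
end

# Bottom-up transform. The `children` passed to each action holds the
# already-transformed values of the child nodes, not node.children.
# Assumptions: labels without an entry use actions["default"] (so this
# sketch requires one), and `nothing` results are dropped.
function apply_sketch(actions::Dict{String,Function}, node::Node)
    cvalues = Any[]
    for c in node.children
        v = apply_sketch(actions, c)
        v === nothing || push!(cvalues, v)
    end
    f = haskey(actions, node.name) ? actions[node.name] : actions["default"]
    return f(node, cvalues)
end

leaf(name, value) = Node(name, value, Node[])
actions = Dict{String,Function}(
    "space"        => ((node, children) -> nothing),
    "semicolon"    => ((node, children) -> nothing),
    "string_value" => ((node, children) -> node.value),
    "default"      => ((node, children) -> children),
)

# Hand-built tree for "a; b;" (line = string_value + space).
line(v) = Node("line", "", [leaf("string_value", v), leaf("space", " ")])
data = Node("data", "", [line("a"), leaf("semicolon", ";"),
                         line("b"), leaf("semicolon", ";")])
result = apply_sketch(actions, data)   # Any[Any["a"], Any["b"]]
```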
> 
> On Sunday, May 25, 2014 10:28:45 PM UTC-4, Abe Schneider wrote:
> I wrote a quick PEG Parser for Julia with Packrat capabilities:
> 
> https://github.com/abeschneider/PEGParser
> 
> It's a first draft and needs a ton of work, testing, etc., but if this is of 
> interest to anyone else, here is a quick description.
> 
> Grammars can be defined using most of the standard EBNF syntax. For example, 
> a simple math grammar can be defined as:
> 
> @grammar mathgrammar begin
>   start = expr
>   number = r"([0-9]+)"
>   expr = (term + op1 + expr) | term
>   term = (factor + op2 + term) | factor
>   factor = number | pfactor
>   pfactor = ('(' + expr + ')')
>   op1 = '+' | '-'
>   op2 = '*' | '/'
> end
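For readers new to PEGs: `+` is sequence and `|` is ordered choice (the first alternative that matches wins; the next one is tried from the same position only if it fails). The hand-rolled sketch below encodes the same grammar with plain closures to make those semantics concrete. `lit`, `num`, `seq`, `choice`, and `evaluate` are hypothetical helpers, not part of PEGParser, and the input is assumed to be ASCII with no whitespace.

```julia
# Hypothetical PEG combinators (not PEGParser's implementation).
# A parser takes (input, pos) and returns (value, nextpos) or `nothing`.
lit(t::String) = (s, pos) -> startswith(SubString(s, pos), t) ? (t, pos + ncodeunits(t)) : nothing

# number = r"([0-9]+)"
function num(s, pos)
    m = match(r"^[0-9]+", SubString(s, pos))
    return m === nothing ? nothing : (parse(Float64, m.match), pos + ncodeunits(m.match))
end

# a + b + c: all parts must match in order; their values are collected.
function seq(ps...)
    return (s, pos) -> begin
        vals = Any[]
        for p in ps
            r = p(s, pos)
            r === nothing && return nothing
            push!(vals, r[1])
            pos = r[2]
        end
        return (vals, pos)
    end
end

# a | b: ordered choice; b is tried (from the same pos) only if a fails.
function choice(ps...)
    return (s, pos) -> begin
        for p in ps
            r = p(s, pos)
            r === nothing || return r
        end
        return nothing
    end
end

# The math grammar, rule for rule.
expr(s, pos)    = choice(seq(term, op1, expr), term)(s, pos)
term(s, pos)    = choice(seq(factor, op2, term), factor)(s, pos)
factor(s, pos)  = choice(num, pfactor)(s, pos)
pfactor(s, pos) = seq(lit("("), expr, lit(")"))(s, pos)
op1(s, pos)     = choice(lit("+"), lit("-"))(s, pos)
op2(s, pos)     = choice(lit("*"), lit("/"))(s, pos)

# A separate pass turns the nested vectors into a number.
evaluate(x::Float64) = x
function evaluate(v::Vector{Any})
    v[1] == "(" && return evaluate(v[2])   # pfactor: "(" expr ")"
    a, op, b = evaluate(v[1]), v[2], evaluate(v[3])
    return op == "+" ? a + b : op == "-" ? a - b : op == "*" ? a * b : a / b
end

ast, stop = expr("5*(2-6)", 1)   # stop == 8: the whole input was consumed
evaluate(ast)                    # -20.0
```

Note that when the first alternative of `expr` or `term` fails, the same `term` or `factor` is parsed again from the same position; that repeated work is what packrat caching avoids.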
> 
> 
> To parse a string with the grammar:
> 
> (node, pos, error) = parse(mathgrammar, "5*(2-6)")
> 
> This will create an AST which can then be transformed to a value. Currently 
> this is accomplished by doing:
> 
> math = Dict()
> math["number"] = (node, children) -> float(node.value)
> math["expr"] = (node, children) ->
>     length(children) == 1 ? children : eval(Expr(:call, children[2], children[1], children[3]))
> math["factor"] = (node, children) -> children
> math["pfactor"] = (node, children) -> children[2]
> math["term"] = (node, children) ->
>     length(children) == 1 ? children : eval(Expr(:call, children[2], children[1], children[3]))
> math["op1"] = (node, children) -> symbol(node.value)
> math["op2"] = (node, children) -> symbol(node.value)
> 
> Ideally, I would like to simplify this to using multi-dispatch on symbols 
> (see previous post), but for now this is the easiest way to define actions 
> based on node attributes.
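As a rough sketch of the dispatch idea in current Julia, each action can be a method selected by `Val(node.name)`, with a generic `Val` method acting as the default. A small operator table also stands in for the `eval(Expr(:call, ...))` calls. `MNode`, `action`, `OPS`, and the `:binop`/`:op` rule names are illustrative assumptions, and the tree is built by hand rather than by the parser.

```julia
# Hypothetical AST node; the rule name is stored as a Symbol.
struct MNode
    name::Symbol
    value::String
    children::Vector{MNode}
end

# Operator table used instead of eval(Expr(:call, ...)).
const OPS = Dict{Symbol,Function}(:+ => +, :- => -, :* => *, :/ => /)

# One method per rule; the generic Val method plays the role of a default.
action(::Val{:number}, node, cv) = parse(Float64, node.value)
action(::Val{:op}, node, cv)     = Symbol(node.value)
action(::Val{:binop}, node, cv)  = OPS[cv[2]](cv[1], cv[3])
action(::Val, node, cv)          = cv

# Bottom-up: transform the children first, then dispatch on the rule name.
# Val(node.name) is resolved at run time, so this buys organization, not speed.
evaluate(node::MNode) = action(Val(node.name), node, Any[evaluate(c) for c in node.children])

# Hand-built tree for 5*(2-6), with the parentheses already folded away.
numleaf(v) = MNode(:number, v, MNode[])
binnode(a, op, b) = MNode(:binop, "", [a, MNode(:op, op, MNode[]), b])
tree = binnode(numleaf("5"), "*", binnode(numleaf("2"), "-", numleaf("6")))
evaluate(tree)   # -20.0
```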
> 
> Finally, to transform the tree:
> 
> result = transform(math, node)  # will give the value of -20
> 
> Originally I was going to attach the transforms to the rules themselves 
> (similar to boost::spirit). However, there were two reasons for not doing 
> this:
> 
> 1. To implement the packrat part of the parser, I needed to cache the
>    results, which meant building an AST anyway.
> 2. It's nice to be able to apply different transforms to the same grammar
>    (e.g. you may want to transform the result into HTML, LaTeX, etc.).
> 
> The downside of the separation is that it adds some more complexity to the 
> process.
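The packrat caching behind the first reason above can be sketched as memoizing each rule's result under `(rule, position)`, so re-trying a rule at the same position after backtracking becomes a dictionary lookup. `Packrat`, `memo`, `digit`, and `sum_rule` (for a toy grammar `sum = (digit + "+" + sum) | digit`) are hypothetical names; PEGParser's actual cache may be organized differently.

```julia
# Hypothetical packrat sketch (not PEGParser's implementation).
mutable struct Packrat
    input::String
    memo::Dict{Tuple{Symbol,Int},Any}  # (rule, pos) => (value, nextpos) or nothing
    calls::Int                         # number of rule bodies actually run
end
Packrat(input::String) = Packrat(input, Dict{Tuple{Symbol,Int},Any}(), 0)

# Run `body` only the first time `rule` is tried at `pos`; afterwards
# return the cached result (cached failures included).
function memo(body, p::Packrat, rule::Symbol, pos::Int)
    key = (rule, pos)
    haskey(p.memo, key) && return p.memo[key]
    p.calls += 1
    result = body()
    p.memo[key] = result
    return result
end

# digit = r"[0-9]"  (ASCII input assumed)
function digit(p::Packrat, pos::Int)
    memo(p, :digit, pos) do
        if pos <= ncodeunits(p.input) && isdigit(p.input[pos])
            (p.input[pos] - '0', pos + 1)
        else
            nothing
        end
    end
end

# sum = (digit + "+" + sum) | digit
function sum_rule(p::Packrat, pos::Int)
    memo(p, :sum, pos) do
        a = digit(p, pos)
        if a !== nothing && a[2] <= ncodeunits(p.input) && p.input[a[2]] == '+'
            b = sum_rule(p, a[2] + 1)
            b !== nothing && return (a[1] + b[1], b[2])
        end
        # Second alternative: backtrack to `pos`; digit(p, pos) is a cache hit.
        return digit(p, pos)
    end
end

p = Packrat("1+2+3")
parsed = sum_rule(p, 1)   # (6, 6): value 6, next position 6
```

Because each (rule, position) pair is evaluated at most once, packrat parsing runs in time linear in the input length for a fixed grammar, at the cost of memory for the table. The cached entries are effectively the intermediate parse results, which is why caching and building an AST go hand in hand.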
>
