Thanks for looking into this, Abe. That’s too bad that the CSV parser is much slower than the hand-crafted one. PEG seems like a great tool for tasks where maximum performance isn’t as important.
— John On Jul 4, 2014, at 11:09 AM, Abe Schneider <[email protected]> wrote: > I got sidetracked by a couple of other things, but the parser is now updated > with a bunch of bug fixes. I have a preliminary CSV and graphdot parser (very > reduced from the full grammar). I'm starting to put together some more > comprehensive tests together. > > As for speed comparison to DataFrames for parsing CSV it's much slower. I've > spent time trying to optimize things, but I suspect a large part of the speed > issue is the overhead of function calls. Also, I suspect it will be hard to > come close to the speed of DataFrames as the code looks like it's fairly > optimized for reading just CSV files. > > After a few more tests are written, I'm getting ready to officially call a > version 0.1. > > On Thursday, June 5, 2014 6:56:37 AM UTC-4, Abe Schneider wrote: > I also forgot to push the changes last night. > > On Wednesday, June 4, 2014 11:01:33 PM UTC-4, Abe Schneider wrote: > After playing around with a bunch of alternatives, I think I've come up with > decent action semantics: > > @transform <name> begin > <label> = <action> > end > > For example, a simple graph grammar might look like: > > @grammar nodetest begin > start = +node_def > node_def = node_label + node_name + lbrace + data + rbrace > node_name = string_value + space > > data = *(line + semicolon) > line = string_value + space > string_value = r"[_a-zA-Z][_a-zA-Z0-9]*" > > lbrace = "{" + space > rbrace = "}" + space > semicolon = ";" + space > node_label = "node" + space > space = r"[ \t\n]*" > end > > with it's actions to create some data structure: > > type MyNode > name > values > > function MyNode(name, values) > new(name, values) > end > end > > > with: > @transform tograph begin > # ignore these > lbrace = nothing > rbrase = nothing > semicolon = nothing > node_label = nothing > space = nothing > > # special action so we don't have to define every label > default = children > > string_value = node.value > value = node.value > line = children > data = MyNode("", children) > node_def = begin > local name = children[1] > local cnode = children[2] > cnode.name = name > return cnode > end > end > > and finally, to apply the transform: > > (ast, pos, error) = parse(nodetest, data) > result = apply(tograph, ast) > println(result) # {MyNode("foo",{"a","b"}),MyNode("bar",{"c","d"})} > > The magic in '@transform' basically just creates the dictionary like before, > but automatically wraps the expression on the RHS as an anonymous function > (node, children) -> expr. > > I'm currently looking for a better name than 'children', as it's potentially > confusing and misleading. It's actually the values of the child nodes (as > opposed to node.children). Maybe cvalues? > > On Sunday, May 25, 2014 10:28:45 PM UTC-4, Abe Schneider wrote: > I wrote a quick PEG Parser for Julia with Packrat capabilities: > > https://github.com/abeschneider/PEGParser > > It's a first draft and needs a ton of work, testing, etc., but if this is of > interest to anyone else, here is a quick description. > > Grammars can be defined using most of the standard EBNF syntax. For example, > a simple math grammar can be defined as: > > @grammar mathgrammar begin > start = expr > number = r"([0-9]+)" > expr = (term + op1 + expr) | term > term = (factor + op2 + term) | factor > factor = number | pfactor > pfactor = ('(' + expr + ')') > op1 = '+' | '-' > op2 = '*' | '/' > end > > > To parse a string with the grammar: > > (node, pos, error) = parse(mathgrammar, "5*(2-6)") > > This will create an AST which can then be transformed to a value. Currently > this is accomplished by doing: > > math = Dict() > math["number"] = (node, children) -> float(node.value) > math["expr"] = (node, children) -> > length(children) == 1 ? children : eval(Expr(:call, children[2], > children[1], children[3])) > math["factor"] = (node, children) -> children > math["pfactor"] = (node, children) -> children[2] > math["term"] = (node, children) -> > length(children) == 1 ? children : eval(Expr(:call, children[2], > children[1], children[3])) > math["op1"] = (node, children) -> symbol(node.value) > math["op2"] = (node, children) -> symbol(node.value) > > Ideally, I would like to simplify this to using multi-dispatch on symbols > (see previous post), but for now this is the easiest way to define actions > based on node attributes. > > Finally, to transform the tree: > > result = transform(math, node) # will give the value of 20 > > Originally I was going to attach the transforms to the rules themselves > (similiar to boost::spirit). However, there were two reasons for not doing > this: > To implement the packrat part of the parser, I needed to cache the results > which meant building an AST anyways > It's nice to be apply to get different transforms for the same grammar (e.g. > you may want to transform the result into HTML, LaTeX, etc.) > The downside of the separation is that it adds some more complexity to the > process. >
