I've been working on the sister project to JSON Template, "JSON Pattern"
(yes, I need some non-bland names).

http://code.google.com/p/json-pattern/

JSON Template maps a dictionary -> string, and this maps a string ->
dictionary.  You can describe it roughly as annotating a big regular
expression with a (JSON) tree structure.
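
To make the string -> dictionary direction concrete, here is a rough
sketch using only Python's standard re module.  It is not JSON Pattern
syntax: the LINE regex, the parse_listing name, and the simplified
ls-style input are all made up for illustration.

```python
import re

# Hypothetical stand-in: an ordinary regex with named groups, NOT JSON Pattern syntax.
# Each match is one simplified ls-style line: mode, link count, owner, size, name.
LINE = re.compile(
    r"^(?P<mode>[-dl][-rwx]{9})\s+(?P<links>\d+)\s+(?P<owner>\S+)"
    r"\s+(?P<size>\d+)\s+(?P<name>\S+)$",
    re.MULTILINE,
)


def parse_listing(text):
    """Map a string to a dictionary: one 'files' key holding a list of records."""
    return {"files": [m.groupdict() for m in LINE.finditer(text)]}


if __name__ == "__main__":
    sample = "-rw-r--r-- 1 andy 120 notes.txt\ndrwxr-xr-x 2 andy 4096 src\n"
    print(parse_listing(sample))
```

The list under "files" is the kind of repeated, nested capture that a
single flat regex match can't express on its own.  All values are still
strings in this sketch.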

There are a bunch of things that need to be polished -- the main issue
is making the syntax as readable as possible.  I've gone through about
3 iterations there, with some more tweaks to make.  Suggestions
welcome.
 

Simple example of parsing "ls -al" output (subpattern definitions are
omitted here; the next 2 links have them):

http://chubot.org/json-pattern/test-cases/testLs_NewSyntax.html

Mini tutorial that explains the parts:

http://chubot.org/json-pattern/test-cases/testMiniTutorial.html

Parsing a big Perforce change description (from Google's open source
work; scroll to the end for the big pattern, and a nice hierarchical
structure):

http://chubot.org/json-pattern/test-cases/testFullChangeDesc_NewSyntax.html


Summary

* Like JSON Template, it's meant to be a language-independent specification
   * Can be built on top of any regex engine, particularly
     JavaScript's relatively weak one
   * API is data, rather than a procedural API
   * ~1000 lines of code, so it can be ported easily, but still powerful
   * A well-defined (and fast) execution model

* Readable syntax (still improving here).  Regular expressions are
very powerful, but hobbled by their obscure and inconsistent syntax.

* A small number of orthogonal concepts
   * Blocks (e.g. for expressing repeated capture)
   * Filters (extensible through host language)
   * Subpatterns (a pattern reuse mechanism)

* Composes with other components
   * The interpreter implements a binary operator (I think of it like ~=)
   * You can easily imagine a pipeline of text -> JSON Pattern ->
     structured data manipulation -> JSON Template -> text
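
A minimal sketch of that pipeline shape, with stand-ins for both ends:
a plain regex plays the role of JSON Pattern and Python's
string.Template plays the role of JSON Template.  The names LINE, ROW,
and pipeline, and the "name size" input format, are assumptions for
illustration only.

```python
import re
from string import Template

# Stand-in for the parsing stage: text -> list of records.
LINE = re.compile(r"(?P<name>\S+)\s+(?P<size>\d+)")

# Stand-in for the rendering stage: record -> text.
ROW = Template("$name: $size bytes")


def pipeline(text):
    """text -> structured data -> manipulation -> text."""
    # Parse: each match becomes a dictionary with a typed 'size' field.
    records = [
        {"name": m.group("name"), "size": int(m.group("size"))}
        for m in LINE.finditer(text)
    ]
    # Manipulate: ordinary data operations, here sorting largest first.
    records.sort(key=lambda r: r["size"], reverse=True)
    # Render: expand one template per record.
    return "\n".join(ROW.substitute(r) for r in records)


if __name__ == "__main__":
    print(pipeline("a.txt 10\nb.txt 300\n"))
```

The middle step only ever sees plain lists and dictionaries, so it
doesn't need to know anything about either text format.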


What does this add over regular expressions?

* The ability to capture named, hierarchical data structures.  Regular
expressions can only capture flat data, and in some engines like
JavaScript, the data can't be named.
* Can capture integers and booleans, not just strings, via filters.
* Reuse of regular expressions.  This is fairly common in practice,
e.g. when writing ad hoc lexers.
* More readable syntax, using line prefixes.
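
Two of these points, typed captures and regex reuse, can be
approximated by hand in plain Python, which gives a feel for what the
language packages up.  This is only a sketch: SUBPATTERNS, FILTERS,
and match_typed are hypothetical names, and the composition by string
formatting is my stand-in, not how JSON Pattern does it.

```python
import re

# Reusable regex fragments, composed into a larger pattern by name.
SUBPATTERNS = {"int": r"\d+", "word": r"\w+"}

PATTERN = re.compile(
    r"(?P<user>{word}) uploaded (?P<count>{int}) files? "
    r"\(public: (?P<public>yes|no)\)".format(**SUBPATTERNS)
)

# Filters turn captured strings into other JSON types.
FILTERS = {
    "count": int,
    "public": lambda s: s == "yes",
}


def match_typed(text):
    """Return a dictionary of filtered captures, or None if there is no match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Captures without a registered filter are left as strings.
    return {name: FILTERS.get(name, str)(raw) for name, raw in m.groupdict().items()}


if __name__ == "__main__":
    print(match_typed("andy uploaded 3 files (public: yes)"))
```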


Applications

* Exposing system stats from command line tools over the network, e.g.
web services for system administration
* Quick and dirty parsing of some network formats, like DNS, HTTP headers, etc.
* Parsing little languages like *itself* and JSON Template.  This
should be possible, since there are no operators with precedence and
such.


Caveats

* In most cases you wouldn't use this for HTML scraping.  For HTML
scraping, you want something that knows about the tree structure of
the document, like jQuery's selector language.


TODO

* Allow filters to stop the match by returning None
* Subpatterns can also be filters, for structure refinement!  (both
are functions from string -> JSON).  Like the "Templates as
Formatters" idea, this language turned out to be unexpectedly rich.
* Perhaps allow hooks for executing procedural code, not just filters
(Perl does this in a messy way).
* Embedding a library of patterns in "JSON Config"
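
One possible reading of the first TODO item, sketched in plain Python
rather than in JSON Pattern itself: a filter that returns None makes
the whole match fail.  The names to_port and match_with_filters, and
the host:port input, are hypothetical; nothing here is implemented in
the project yet.

```python
import re

PATTERN = re.compile(r"(?P<host>[\w.]+):(?P<port>\d+)")


def to_port(raw):
    """Filter: return the port as an int, or None to reject the match."""
    n = int(raw)
    return n if 0 < n < 65536 else None


FILTERS = {"port": to_port}


def match_with_filters(text):
    """Return filtered captures, or None if the regex or any filter rejects."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    result = {}
    for name, raw in m.groupdict().items():
        value = FILTERS.get(name, lambda s: s)(raw)
        if value is None:
            # A filter vetoed this capture, so the whole match is abandoned.
            return None
        result[name] = value
    return result


if __name__ == "__main__":
    print(match_with_filters("example.org:8080"))
    print(match_with_filters("example.org:99999"))
```

One catch with this convention is that a filter can then never
produce a JSON null as a legitimate value.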

* Need lots of docs!
* Code cleanup, test cleanup
