On May 19, 2006, at 6:35 PM, Evan Martin wrote:

For a toy project I want to parse the output of a program.  The
program runs on someone else's machine and mails me the results, so I
only have access to the output it generates.

Unfortunately, the output is intended to be human-readable, and this
makes parsing it a bit of a pain.  Here are some sample lines from its
output:

France: Army Marseilles SUPPORT Army Paris -> Burgundy.
Russia: Fleet St Petersburg (south coast) -> Gulf of Bothnia.
England:     4 Supply centers,  3 Units:  Builds   1 unit.
The next phase of 'dip' will be Movement for Fall of 1901.

I've been using Parsec and it's felt rather complicated.  For example,
a "location" is a series of words and possibly parentheses, except if
the word is SUPPORT.  And that "Supply centers" line ends up being
code filled with stuff like "char ':'; skipMany space".
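
As an illustration of the two rules just described, a minimal Parsec sketch might look like the following. It is not the parser from the toy project: the names lexeme, locWord, location, Supply and supplyLine are hypothetical, the module names are those of the current parsec package rather than the 2006 release, and the exact wording "Supply centers," and "Units:" is assumed from the single sample line above. Wrapping the whitespace handling in one helper keeps "skipMany space" out of the individual parsers.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: run a parser, then skip any spaces that follow it.
lexeme :: Parser a -> Parser a
lexeme p = p <* skipMany (char ' ')

-- One word of a location: either a parenthesised note such as
-- "(south coast)" or a run of letters that is not the keyword SUPPORT.
locWord :: Parser String
locWord = lexeme (paren <|> plain)
  where
    paren = do
      s <- between (char '(') (char ')') (many1 (noneOf ")"))
      return ("(" ++ s ++ ")")
    -- 'try' makes the parser backtrack when the word turns out to be SUPPORT,
    -- so the enclosing 'many1' stops cleanly in front of the keyword.
    plain = try $ do
      w <- many1 letter
      if w == "SUPPORT" then unexpected "keyword SUPPORT" else return w

-- A location is one or more such words, rejoined with single spaces.
location :: Parser String
location = unwords <$> many1 locWord

-- Assumed shape of a line such as
-- "England:     4 Supply centers,  3 Units:  Builds   1 unit."
data Supply = Supply { power :: String, centers :: Int, units :: Int }
  deriving (Show, Eq)

number :: Parser Int
number = lexeme (read <$> many1 digit)

symbol :: String -> Parser String
symbol = lexeme . string

-- Reads the power name and the two counts; the rest of the line is ignored.
supplyLine :: Parser Supply
supplyLine = do
  p <- many1 letter
  _ <- symbol ":"
  c <- number
  _ <- symbol "Supply centers,"
  u <- number
  _ <- symbol "Units:"
  return (Supply p c u)

main :: IO ()
main = do
  print (parse location "" "Marseilles SUPPORT Army Paris")
  print (parse location "" "St Petersburg (south coast) -> Gulf of Bothnia.")
  print (parse supplyLine "" "England:     4 Supply centers,  3 Units:  Builds   1 unit.")
```

Under these assumptions the first call stops at "Marseilles", because SUPPORT is rejected as a location word, and the second keeps "(south coast)" as part of the location and stops before the arrow.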

I actually have a separate parser that's Javascript with a bunch of
regular expressions and it's far shorter than my Haskell one, which
makes sense as munging this sort of text feels to me more like a
regexp job than a careful parsing job.

I'm considering writing a preprocessing stage in Ruby or Perl that
munges those output lines into something a bit more
"machine-readable", but before I did that I thought I'd ask here if
anyone had any pointers, hints, or better ideas.

Hi Evan,

if the text you want to parse is actually similar to natural language (some posters have suggested that it is much simpler), you may want to have a look at grammar formalisms designed for natural languages. Grammatical Framework (GF) [1] is such a formalism, where the grammars are functional programs. The GF implementation is written in Haskell, and it has an interactive mode and a Haskell API.

[Disclaimer: I participate in the development of GF]

/Björn

[1] http://www.cs.chalmers.se/~aarne/GF/

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
