(1/2 - I believe these messages didn't go to the list. Sorry if they actually did.)
I've always been a fan of using the unix text input and output to connect simple tools to achieve complex results, but I think there's a missing piece in the tool set: parsers.
There are *many* ways you could begin filling the parser void. However, given that there are so many choices, it would help a lot if you had a specific problem in mind.
I need help selecting that problem. If I could write something that could be used to digest input for a prototype natural language parser and analyser, that could be a goal, but I don't know how feasible that is. If it helps narrow things down, I think a parser that can handle complex input would be better than a fast parser. Given a complicated input whose structure is unknown, I would like my tool to help in testing candidate structures to see whether they fit and where they don't, maybe within a given processing limit.
Lex filters input using regexes, which are a subset of the patterns yacc can process. If regexes are all that you need, then take a close look at awk.
I have already learned about awk. I would like to make something that complements it, maybe a tool that turns complex input into something we could deal with in awk.
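To make that idea a bit more concrete, here is a rough sketch of the kind of filter I have in mind. It assumes, purely for illustration, that the "complex input" is a parenthesised nested list such as `(a (b c) d)`, and it flattens it into one line per leaf, with a dotted position path, a tab, and the value. The function names (`tokenize`, `parse`, `flatten`, `main`) and the output format are my own invention, not an existing tool.

```python
import sys

def tokenize(text):
    # Parentheses become their own tokens; everything else splits on whitespace.
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    # Build nested Python lists with an explicit stack of open lists.
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]

def flatten(tree, path=()):
    # Yield "path<TAB>value" for every leaf; the path records the position
    # of the leaf at each nesting level, e.g. "0.1.0".
    for index, node in enumerate(tree):
        here = path + (index,)
        if isinstance(node, list):
            yield from flatten(node, here)
        else:
            yield ".".join(map(str, here)) + "\t" + node

def main(stream=sys.stdin, out=sys.stdout):
    # Filter entry point: nested text in, flat tab-separated lines out.
    for line in flatten(parse(tokenize(stream.read()))):
        out.write(line + "\n")
```

Calling `main()` at the bottom of a script (say a hypothetical `flatten.py`) would turn it into an ordinary filter. Its output could then be piped into something like `awk -F'\t' '$1 ~ /^0\.1\./ { print $2 }'` to pick out the leaves of one sub-list, which is a job awk is already good at.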
For parsing more general than regexes can handle, it may be a good idea to look into parser combinators.
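As a minimal sketch of the idea (not Parsec's API, and with all names below made up for illustration): a parser is just a function from `(text, pos)` to either `(value, new_pos)` or `None`, and combinators build bigger parsers out of smaller ones. The example grammar, nested parenthesised words, is something a single regex cannot describe because the nesting is unbounded.

```python
# A parser takes (text, pos) and returns (value, new_pos), or None on failure.

def char(pred):
    # Match one character that satisfies the predicate.
    def parse(text, pos):
        if pos < len(text) and pred(text[pos]):
            return text[pos], pos + 1
        return None
    return parse

def seq(*parsers):
    # Run parsers one after another; fail if any of them fails.
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def alt(*parsers):
    # Try each alternative in order and return the first success.
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def many(p):
    # Apply p zero or more times; p must consume input when it succeeds.
    def parse(text, pos):
        values = []
        while True:
            result = p(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)
    return parse

def fmap(f, p):
    # Transform the value produced by p.
    def parse(text, pos):
        result = p(text, pos)
        if result is None:
            return None
        value, pos = result
        return f(value), pos
    return parse

def many1(p):
    # One or more repetitions of p.
    return fmap(lambda v: [v[0]] + v[1], seq(p, many(p)))

ws = many(char(str.isspace))

def token(p):
    # Run p, then skip any trailing whitespace.
    return fmap(lambda v: v[0], seq(p, ws))

def sym(c):
    return token(char(lambda ch: ch == c))

# Grammar:  expr := word | '(' expr* ')'
word = token(fmap("".join, many1(char(str.isalpha))))

def expr(text, pos):
    # Defined as a function so the grammar can refer to itself recursively.
    return alt(word, lst)(text, pos)

lst = fmap(lambda v: v[1], seq(sym("("), many(expr), sym(")")))

print(expr("(a (b c) d)", 0))  # (['a', ['b', 'c'], 'd'], 11)
```

The design choice here is that every piece has the same shape, so `seq`, `alt` and `many` can be nested freely, much like composing filters in a pipeline. This sketch has no error reporting, which a real tool would need.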
I know this 'parsec' library from Haskell: http://www.haskell.org/haskellwiki/Parsec But it seemed to me that when dealing with complicated problems, much of the power comes from Haskell itself. I would like to write something one could use knowing only how to use shell tools.
I did find a lot on Wikipedia, such as the Chomsky classification of grammars, LL vs. LR parsers, and alternative notations for grammars. But I could not grasp what I could use; I don't have the experience to make an informed choice.
One way to go would be to choose a paper that presents some cool idea on parsing and implement what it describes. With a well-chosen paper, this could result in a tool with clear applicability and limits.
Thanks,
Maurício