Hello Stephan,

>*************Original message*************
> From: Stephan Michels <[EMAIL PROTECTED]>
> To: Stephan Michels <[EMAIL PROTECTED]>
> Date: Wednesday, February 13, 2002, 5:18:34 PM
> Subject: text parser
> On Wed, 13 Feb 2002, Andrew Answer wrote:
>> Hello Stephan,
>>
>> is a good idea! Now I am converting many text documents to XML using
>> PHP scripts offline...
>> Some names for your parser: txt2xml (simple and clear),
> There is already a project with this name:
> http://xml.gsfc.nasa.gov/ingest_demo/txt2XML.html
>> JTF (Java Text Formatter),
> Look at JTF.org: Jewish Task Force ;-)
>> JTC (Java Text Converter).
> http://www.jtc.com/ is also taken
> Finding a name isn't as easy as I thought. :(

Hmmm... why reject a project name just because the domain name is
taken? There are more projects than domains :)

>> Also look at APTConvert
>> (http://www.xmlmind.com/aptconvert/distrib/docs/userguidetoc.html),
>> maybe this tool can help you.
> I think my project could help you.
> An example grammar looks like:
> <grammar>
>   <tokens>
>     <token tsymbol="id">
>       <concat>
>         <cc><ci min="A" max="Z"/><ci min="a" max="z"/></cc>
>         <cc minOccurs="0" maxOccurs="*">
>           <ci min="A" max="Z"/><ci min="a" max="z"/><ci min="0" max="9"/>
>           <cs content="_"/>
>         </cc>
>       </concat>
>     </token>
>     <token tsymbol="mult" assoc="right">
>       <string content="*"/>
>     </token>
>     <token tsymbol="plus" assoc="left">
>       <string content="+"/>
>     </token>
>     <token tsymbol="dopen">
>       <string content="("/>
>     </token>
>     <token tsymbol="dclose">
>       <string content=")"/>
>     </token>
>   </tokens>
>   <whitespace>
>     <cc maxOccurs="*"><cs content=" 	 "/></cc>
>   </whitespace>
>   <productions>
>     <production ntsymbol="E">
>       <ntsymbol name="E"/><tsymbol name="plus"/><ntsymbol name="E"/>
>     </production>
>     <production ntsymbol="E">
>       <ntsymbol name="E"/><tsymbol name="mult"/><ntsymbol name="E"/>
>     </production>
>     <production ntsymbol="E">
>       <tsymbol name="dopen"/><ntsymbol name="E"/><tsymbol name="dclose"/>
>     </production>
>     <production ntsymbol="E">
>       <tsymbol name="id"/>
>     </production>
>   </productions>
>   <ssymbol ntsymbol="E"/>
> </grammar>
> This grammar converts the string "A*b+c*D+(e+F)*G" to
> <E>
>   <E>
>     <E>
>       <E>
>         <id>A</id>
>       </E>
>       <mult>*</mult>
>       <E>
>         <id>b</id>
>       </E>
>     </E>
>     <plus>+</plus>
>     <E>
>       <E>
>         <id>c</id>
>       </E>
>       <mult>*</mult>
>       <E>
>         <id>D</id>
>       </E>
>     </E>
>   </E>
>   <plus>+</plus>
>   <E>
>     <E>
>       <dopen>(</dopen>
>       <E>
>         <E>
>           <id>e</id>
>         </E>
>         <plus>+</plus>
>         <E>
>           <id>F</id>
>         </E>
>       </E>
>       <dclose>)</dclose>
>     </E>
>     <mult>*</mult>
>     <E>
>       <id>G</id>
>     </E>
>   </E>
> </E>

Well-built engine! It looks like an XML parser...

Suggestions:
I have worked with byacc/flex, but I have already forgotten their
syntax. Maybe it would be better to make the DTD of your grammar more
readable? Then you could even write a stylesheet that converts a byacc
grammar into your grammar, and use it with your parser. That would be
a good test.

What about whitespace? Unlike XML, text files need to distinguish one
CR/LF from two and apply different rules to each, etc.

Maybe you could also produce one text from another (line formatting,
alignment, list formatting, etc.)? And later you could turn it into a
Generator/Transformer (but you would have to produce a SAX stream for
it to work properly, I think)...

Happy hacking!

Best regards,
 Andrew Answer
 [EMAIL PROTECTED]
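
P.S. To make the whitespace point concrete, here is a minimal sketch in Python. It is not part of Stephan's parser, and the token names "softbreak" and "parabreak" are hypothetical; the grammar above defines no such tokens. It assumes a single CR/LF continues the current paragraph, while two or more in a row start a new one.

```python
import re

# Hypothetical token names; nothing like them exists in the grammar above.
SOFT_BREAK = "softbreak"   # exactly one line terminator: same paragraph
PARA_BREAK = "parabreak"   # two or more line terminators: new paragraph

# A run of line terminators, and a single terminator (CRLF, LF, or bare CR).
_BREAK_RUN = re.compile(r"(?:\r\n|\n|\r)+")
_ONE_BREAK = re.compile(r"\r\n|\n|\r")


def tokenize_breaks(text):
    """Split text into ("text", s) tokens separated by break tokens."""
    tokens = []
    pos = 0
    for match in _BREAK_RUN.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        # Count terminators in the run to choose the break type.
        count = len(_ONE_BREAK.findall(match.group()))
        kind = SOFT_BREAK if count == 1 else PARA_BREAK
        tokens.append((kind, match.group()))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


def paragraphs(text):
    """Join soft-broken lines with a space; split on paragraph breaks."""
    result, current = [], []
    for kind, value in tokenize_breaks(text):
        if kind == "text":
            if value.strip():
                current.append(value.strip())
        elif kind == PARA_BREAK and current:
            result.append(" ".join(current))
            current = []
    if current:
        result.append(" ".join(current))
    return result
```

For example, paragraphs("line one\nline two\n\nnext para") would join the first two lines into one paragraph and keep "next para" separate. A real grammar would probably want these as two distinct tokens rather than one generic whitespace rule.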