Hi Klaus,

This is quite a coincidence, as I have recently written a CIF parser for the cctbx (Computational Crystallography Toolbox) using ANTLR. You can find a C language version of an ANTLR CIF grammar here:
http://cctbx.svn.sourceforge.net/viewvc/cctbx/trunk/iotbx/cif/cif.g?view=markup

It is somewhat convoluted because it builds the CIF model during parsing, but you should be able to strip that away and get a working CIF parser in your chosen target language (it looks like you want Java).

Thanks,
Richard

On 9 July 2010 20:10, Klaus Martinschitz <[email protected]> wrote:

> Hi ANTLR Gurus,
>
> A beginner's question.
> I want to write a compiler for the Crystallographic Information File
> format (CIF). I don't want to explain the syntax in detail, only the
> problem I am facing.
>
> The data starts with a token
>
> 'data_'
>
> followed by arbitrary characters and an EOL, e.g.
>
> data_global
>
> There is also a token
>
> 'loop_';
>
> Somewhere in my BNF I write something like
>
> DATA
>     :(('d'|'D')('a'|'A')('t'|'T')('a'|'A')'_')
>     ;
>
> LOOP
>     :
>     (('l'|'L')('o'|'O')('o'|'O')('p'|'P')'_')
>     ;
>
> dataBlockHeading
>     : (DATA NONBLANCKCHAR+ EOL)
>     ;
>
> dataItem
>     : (tag WHITESPACE value) | (LOOP loopHeader loopBody)
>     ;
>
> The first two expressions are tokens; the second two are rules. My
> problem is the following. The file starts with
>
> data_global
>
> BUT the *lo* of data_g*lo*bal is parsed as the LOOP token. How can
> this be if the parser is in the dataBlockHeading rule? The parser must
> know that the characters *lo* belong to NONBLANCKCHAR and not to LOOP,
> or?
>
> I have attached the whole syntax at the end of the file.
>
> Thanks for help
>
> Regards,
> Klaus
>
>
> grammar CIF1_1;
>
> options{
>     language=Java;
> }
>
> @lexer::header{
>     package at.netcrystals.cif_1_1.parser;
> }
>
> @parser::header{
>     package at.netcrystals.cif_1_1.parser;
> }
>
>
> DATA
>     :(('d'|'D')('a'|'A')('t'|'T')('a'|'A')'_')
>     ;
>
> LOOP
>     :
>     (('l'|'L')('o'|'O')('o'|'O')('p'|'P')'_')
>     ;
>
> fragment ORDINARYCHAR
>     : '!' | '%' | '&' | '(' | ')' | '*' | '+' | ',' | '-' | '.' |
>     '/' | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | ':' |
>     '<' | '=' | '>' | '?' | '@' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' |
>     'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' |
>     'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' | '\\' | '^' | '`' | 'a' | 'b'
>     | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n'
>     | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z'
>     | '{' | '|' | '}' | '~'
>     ;
>
>
> NONBLANCKCHAR
>     : ORDINARYCHAR | '"' | '#' | '$' | '\'' | '_' | ';' | '[' | ']'
>     ;
>
>
> WHITESPACE
>     : '\t'|' '
>     ;
>
>
> /************************************************************************************************
>  WhiteSpace and Comments
>
> ************************************************************************************************/
>
>
> EOL
>     :'\n'|'\r\n'
>     ;
>
>
> /************************************************************************************************
>  *
>  * Root
>  *
>
> ************************************************************************************************/
>
> cif
>     : (dataBlock) EOF
>     ;
>
> dataBlock
>     : (dataBlockHeading dataItems)
>     ;
>
> dataBlockHeading
>     : (DATA NONBLANCKCHAR+ EOL)
>     ;
>
>
> dataItems
>     : dataItem* EOL
>     ;
>
> dataItem
>     : (tag WHITESPACE value) | (LOOP loopHeader loopBody)
>     ;
>
> tag
>     : NONBLANCKCHAR+
>     ;
>
>
> value
>     : '.' | '?' | charString
>     ;
>
> charString
>     : singleQuotedString
>     ;
>
> singleQuotedString
>     : '\'' NONBLANCKCHAR* '\''
>     ;
>
> loopHeader
>     : ( (WHITESPACE tag)+)
>     ;
>
> loopBody
>     : value (WHITESPACE value)+
>     ;
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
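
On the question of why the *lo* in data_global ends up in LOOP: the ANTLR lexer tokenizes the character stream on its own, independently of the parser, so it does not know that the parser is inside dataBlockHeading when it reaches those characters. One possible way around this, shown here as an untested sketch, is to lex the whole heading as a single token so that the characters after 'data_' are never offered to other lexer rules. The names DATA_PREFIX and DATA_BLOCK_HEADING are made up for illustration and are not taken from the cctbx grammar; the fragment assumes the NONBLANCKCHAR and EOL rules from the quoted grammar and would replace its DATA token and dataBlockHeading rule.

```
// Sketch only, not tested against a real CIF file.
// DATA_PREFIX and DATA_BLOCK_HEADING are hypothetical names.

// A fragment never becomes a token by itself; it is only a helper
// for other lexer rules.
fragment DATA_PREFIX
    : ('d'|'D')('a'|'A')('t'|'T')('a'|'A') '_'
    ;

// The prefix and the block name are matched as one token, so "global"
// is consumed here and never compared against LOOP.
DATA_BLOCK_HEADING
    : DATA_PREFIX NONBLANCKCHAR+
    ;

// The parser rule now only has to see the single token and the line end.
dataBlockHeading
    : DATA_BLOCK_HEADING EOL
    ;
```

The parser rules that build tags and values from NONBLANCKCHAR+ would probably run into the same kind of clash and may need to be turned into lexer tokens in the same way.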
