ANTLR is overkill for binary file formats: I know of no binary file format that requires more than one (variable-length) item of lookahead for processing, nor would I expect to find one--binary formats are intentionally designed and evolved.
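As a rough sketch of what "one item of lookahead" can look like in practice, the Python below reads a stream of records where a single tag byte decides how everything after it is decoded. The tag/length/payload layout, the tag values, and the names DECODERS, read_exact, and read_records are all invented for illustration; they are not taken from any real file format or from ANTLR.

```python
import io
import struct

# Hypothetical record layout, assumed for this sketch only:
#   tag     : 1 byte, selects the payload decoder
#   length  : unsigned 16-bit big-endian payload size in bytes
#   payload : `length` bytes
DECODERS = {
    0x01: lambda b: struct.unpack(">i", b)[0],  # 32-bit signed integer
    0x02: lambda b: struct.unpack(">d", b)[0],  # IEEE 754 double
    0x03: lambda b: b.decode("utf-8"),          # UTF-8 text
}


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError on a truncated stream."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_records(stream):
    """Decode all records; the tag byte is the only lookahead needed."""
    records = []
    while True:
        head = stream.read(1)
        if not head:  # clean end of input
            break
        tag = head[0]
        (length,) = struct.unpack(">H", read_exact(stream, 2))
        payload = read_exact(stream, length)
        decoder = DECODERS.get(tag)
        if decoder is None:
            raise ValueError(f"unknown tag 0x{tag:02x}")
        records.append((tag, decoder(payload)))
    return records


# Build a small sample stream with one record of each kind.
sample = (
    b"\x01" + struct.pack(">H", 4) + struct.pack(">i", -7)
    + b"\x02" + struct.pack(">H", 8) + struct.pack(">d", 1.5)
    + b"\x03" + struct.pack(">H", 2) + b"hi"
)
records = read_records(io.BytesIO(sample))
```

The reader never backtracks: once the tag is known, the length and payload follow mechanically, which is why a plain dispatch table is enough here and no general parsing machinery is involved.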
It is fairly simple to design a language for describing binary file formats, to support encode/decode logic for the individual field items (bytes, integers of various lengths, IEEE float and double numbers, etc.), and from there to provide one or more backends for processing files. ASN.1 is an extreme example of this; when I implemented such a language, the grammar only took up two pages or so. For my language, backends included generation of C struct definitions, file reader/writer generation, and some others that I have forgotten. ANTLR makes it easy to design, implement, and extend such DSLs, but you do not need the ANTLR machinery for processing the files.

--Loring

>________________________________
>From: andreas kleiber <[email protected]>
>To: [email protected]
>Sent: Friday, September 23, 2011 9:57 AM
>Subject: Re: [antlr-interest] Binary support
>
>As there are no further posts right now, I would like to take the
>opportunity for a personal conclusion (I admit: it got longer ;-).
>
>ASN.1
>
>I took a look at ASN.1 -- This look was really quick, so I might be
>wrong on that. ASN.1 experts are welcome to correct me in this case.
>I got the feeling that in ASN.1 syntax and encoding are strongly
>coupled. I.e., ASN.1 is a human-readable notation, but you have to take
>the encodings provided. This is quite fine for protocols where you
>normally only care that the encoding is good (compact etc.), but
>not how it works in detail, because this is done automatically by
>generated code.
>
>That said, ASN.1 is not feasible, in my eyes, if you have an already
>defined file format and want to generate a parser out of such an ASN.1
>grammar.
>
>
>ANTLR and binary formats
>
>I still think that it would be great if ANTLR were enhanced to be
>able to also parse binary formats. In my eyes it's the right place and would
>make ANTLR even more unique.
>Making ANTLR fit for binary formats would involve the following changes:
>1. Enhance capabilities of input handling
>2. Enhance ANTLR grammar
>3. Enhance code generator of ANTLR
>
>For 1.: In effect, ANTLR already handles a binary file format. The
>moment ANTLR reads files in one of the four Unicode encodings (UTF-8,
>UTF-16 LE, UTF-16 BE, UTF-32), including Byte Order Mark and surrogate
>support, it lexes a binary format.
>Because I don't know ANTLR in detail, I guess the Sun/Oracle code is used
>for this, so ANTLR does it not explicitly but through the official class
>libraries. I think some work would be needed here, but if
>the Java class libraries are not flexible enough, I'm quite sure that ICU4J
>will be.
>
>2. and 3. are quite clear: The ANTLR grammar currently has no support
>for binary formats, so an extension of some sort would be needed and of course
>the code generator of ANTLR must also support this.
>
>
>The last question to discuss is: Is it possible to describe binary formats in
>a grammar?
>
>I say: Yes, for most of them, this will work. For those where it will not
>work fully, a grammar would at least make life easier (you would end up
>doing the rest using actions etc.).
>
>In a former post Ron Burk said:
>
>"Binary file formats also often just aren't directly representable by context
>free grammars. For example, a header may contain offsets of different objects,
>and the sizes of those objects may have to be inferred from the difference in
>offsets. Grammars, despite looking seductively similar because of having
>recursively nested constructs in common, aren't a great match for this domain.
>
>One could imagine useful domain-specific languages for binary file formats,
>but they might not look quite like grammar tools, and a single language might
>not be sufficient for all tasks."
>
>I agree and disagree. Whether they are context-free or not, they can be
>parsed. Binary formats have the benefit that they were designed to be
>_machine readable_ and not, like programming languages, _human readable_.
>In general this makes them easier to parse.
>
>Instead of designing domain-specific languages, I would prefer an integration
>into ANTLR, because there are also file formats out in the wild which combine
>binary data with text data -- and both need to be parsed. Having two separate
>programs is not elegant -- you would end up spending a lot of effort to put
>binary and text parsing results into one abstract syntax tree.
>
>In my opinion there are typical design patterns often used in binary formats.
>Offsets, as mentioned by Ron in the former post, are an example, as well as
>what I wrote in my first post, section "Interpretation of size":
>
>---------------------------------------------
>| header | size of next block | block | ... |
>---------------------------------------------
>
>Such patterns could be represented in an expressive syntax.
>
>I think the big issue, which makes binary files different from text files, is
>their self-referential nature: To be able to read a binary file you have to
>partially interpret it and use this information to manage the read process.
>You mostly can't decouple parsing and interpretation. But in my
>opinion this is no reason not to add such functionality to ANTLR.
>
>Andi

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
