ANTLR is overkill for binary file formats:  I know of no binary file format 
that requires more than one (variable length) item of lookahead for processing, 
nor would I expect to find one--binary formats are intentionally designed and 
evolved.
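
To make the lookahead claim concrete, here is a minimal sketch in Python. It assumes a made-up record format, not any real one: each record starts with a one-byte tag, and that tag alone decides how the rest of the record is decoded. The tag values and the names read_exact and read_records are invented for the example.

```python
import io
import struct

# Hypothetical tags for a made-up format; not taken from any real file format.
TAG_INT32 = 0x01    # followed by a 4-byte little-endian signed integer
TAG_FLOAT64 = 0x02  # followed by an 8-byte little-endian IEEE double
TAG_BYTES = 0x03    # followed by a 1-byte length, then that many raw bytes


def read_exact(stream, n):
    """Read exactly n bytes or fail; a short read means a truncated file."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_records(stream):
    """Decode records until end of input; the tag is the only lookahead used."""
    records = []
    while True:
        tag = stream.read(1)
        if not tag:  # clean end of input between records
            break
        kind = tag[0]
        if kind == TAG_INT32:
            (value,) = struct.unpack("<i", read_exact(stream, 4))
        elif kind == TAG_FLOAT64:
            (value,) = struct.unpack("<d", read_exact(stream, 8))
        elif kind == TAG_BYTES:
            # The "variable length" case: the length byte says how much to read.
            (length,) = struct.unpack("<B", read_exact(stream, 1))
            value = read_exact(stream, length)
        else:
            raise ValueError(f"unknown tag 0x{kind:02x}")
        records.append((kind, value))
    return records


sample = bytes([TAG_INT32]) + struct.pack("<i", -7) + bytes([TAG_BYTES, 3]) + b"abc"
records = read_records(io.BytesIO(sample))
```

The reader never backtracks and never peeks past the current tag, which is why no parser-generator machinery is needed for a format of this shape.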

It is fairly simple to design a language for describing binary file formats, 
to support encode/decode logic for the individual field types (bytes, integers 
of various lengths, IEEE float and double numbers, etc.), and then to provide 
one or more backends for processing files.  ASN.1 is an extreme example of 
this; when I implemented such a language, the grammar only took up two pages or 
so.  For my language, backends included generation of C struct definitions, 
file reader/writer generation, and some others that I have forgotten.  ANTLR 
makes it easy to design, implement, and extend such DSLs, but you do not need 
the ANTLR machinery for processing the files.
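
As a rough illustration only (this is not the language described above), the idea of one format description feeding several backends might look like the Python sketch below. The field list, the type names, and the functions emit_c_struct and make_reader are all invented for the example, and little-endian packed fields are assumed.

```python
import struct

# Hypothetical description of a made-up header: (field name, type name) pairs.
HEADER = [("magic", "u32"), ("version", "u16"), ("scale", "f64")]

# Type name -> (C type for the struct backend, struct-module code for the reader).
TYPES = {
    "u8": ("uint8_t", "B"),
    "u16": ("uint16_t", "H"),
    "u32": ("uint32_t", "I"),
    "f32": ("float", "f"),
    "f64": ("double", "d"),
}


def emit_c_struct(name, fields):
    """Backend 1: generate a C struct definition as text."""
    lines = [f"struct {name} {{"]
    for field, type_name in fields:
        lines.append(f"    {TYPES[type_name][0]} {field};")
    lines.append("};")
    return "\n".join(lines)


def make_reader(fields):
    """Backend 2: build a decoder for the packed little-endian layout.

    The C compiler may pad the in-memory struct, so the size returned here
    is the size in the file, not necessarily sizeof the generated struct.
    """
    packer = struct.Struct("<" + "".join(TYPES[t][1] for _, t in fields))
    names = [field for field, _ in fields]

    def read(data):
        return dict(zip(names, packer.unpack_from(data)))

    return read, packer.size


read_header, header_size = make_reader(HEADER)
sample = struct.pack("<IHd", 0xCAFEBABE, 2, 0.5)
header = read_header(sample)
c_source = emit_c_struct("header", HEADER)
```

Both backends walk the same field list, so adding a field type means one new entry in the type table rather than a change to each backend.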

--Loring



>________________________________
>From: andreas kleiber <[email protected]>
>To: [email protected]
>Sent: Friday, September 23, 2011 9:57 AM
>Subject: Re: [antlr-interest] Binary support
>
>As there are no further posts right now, I would like to take the
>opportunity for a personal conclusion (I admit: It got a longer one ;-).
>
>ASN.1
>
>I took a look at ASN.1 -- This look was really quick, so I might be
>wrong on that. ASN.1 experts are welcome to correct me in this case.
>I got the feeling that in ASN.1 syntax and encoding are strongly
>coupled. I.e. ASN.1 is human readable notation, but you have to take the
>encodings provided. This is quite fine for protocols where you're
>normally only interested that the encoding is good (compact etc.), but
>not how it works in detail, because this is done automatically by
>generated code.
>
>That said, ASN.1 is not feasible, in my eyes, if you have an already
>defined file format and want to generate a parser from an ASN.1
>grammar.
>
>
>ANTLR and binary formats
>
>I still think that it would be great if ANTLR were enhanced to be
>able to parse binary formats as well. In my eyes it's the right place and would 
>make ANTLR even more unique.
>Making ANTLR fit for binary formats would involve the following changes:
>1. Enhance capabilities of input handling
>2. Enhance ANTLR grammar
>3. Enhance code generator of ANTLR
>
>For 1.: In effect, ANTLR already handles a binary file format. The moment 
>ANTLR reads files in one of the four Unicode encodings (UTF-8, UTF-16 LE, 
>UTF-16 BE, UTF-32), including Byte Order Mark and surrogate support, it is 
>lexing a binary format.
>I don't know ANTLR in detail, so I am guessing that the Sun/Oracle code does 
>this; that is, ANTLR does it not explicitly but through the official class 
>libraries. I think some work would be needed here, but if the Java class 
>libraries are not flexible enough, I'm quite sure that ICU4J will be.
>
>2. and 3. are quite clear: The ANTLR grammar currently has no support 
>for binary formats, so an extension of some sort would be needed, and of course 
>the code generator of ANTLR must support it too.
>
>
>The last question to discuss is: Is it possible to describe binary formats in 
>a grammar?
>
>I say: Yes, for most of them, this will work. For those where it will not 
>work fully, a grammar would at least make life easier (you would end up doing 
>the rest using actions etc.). 
>
>In a former post Ron Burk said:
>
>"Binary file formats also often just aren't directly representable by context 
>free grammars. For example, a header may contain offsets of different objects, 
>and the sizes of those objects may have to be inferred from the difference in 
>offsets. Grammars, despite looking seductively similar because of having 
>recursively nested constructs in common, aren't a great match for this domain.
>
>One could imagine useful domain-specific languages for binary file formats, 
>but they might not look quite like grammar tools, and a single language might 
>not be sufficient for all tasks."
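
The offset case described in the quote can be sketched in Python as follows. The layout is made up for the example: a 2-byte object count, then that many 4-byte offsets (all little-endian), then the objects, with the last object running to the end of the file. The name read_objects is hypothetical.

```python
import struct


def read_objects(data):
    """Slice objects out of data, inferring each size from adjacent offsets."""
    (count,) = struct.unpack_from("<H", data, 0)
    offsets = list(struct.unpack_from(f"<{count}I", data, 2))
    # Object i ends where object i+1 starts; the last one ends at end of file.
    ends = offsets[1:] + [len(data)]
    objects = []
    for start, end in zip(offsets, ends):
        if not (start <= end <= len(data)):
            raise ValueError("offsets out of order or past end of file")
        objects.append(data[start:end])
    return objects


# Build a small sample file in the same made-up layout.
payloads = [b"first", b"", b"third!"]
position = 2 + 4 * len(payloads)  # the objects start right after the header
offsets = []
for payload in payloads:
    offsets.append(position)
    position += len(payload)
sample = struct.pack(f"<H{len(payloads)}I", len(payloads), *offsets) + b"".join(payloads)
objects = read_objects(sample)
```

No size field exists anywhere in this layout; the sizes come out of arithmetic on values read earlier, which is the part a context-free grammar has no direct way to state.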
>
>I agree and disagree. Whether they are context free or not, they can be 
>parsed. Binary formats have the benefit that they were designed to be 
>_machine readable_ and not, like programming languages, _human readable_. In 
>general this makes them easier to parse.
>
>Instead of designing domain-specific languages, I would prefer an integration 
>into ANTLR, because there are also file formats out in the wild which combine 
>binary data with text data, and both need to be parsed. Having two separate 
>programs is not elegant: you would end up spending a lot of effort to put the 
>binary and text parsing results into one abstract syntax tree.
>
>In my opinion there are typical design patterns often used in binary formats. 
>Offsets as mentioned by Ron in the former post are an example, as well as what 
>I wrote in my first post, section "Interpretation of size":
>
>---------------------------------------------
>| header | size of next block | block | ... |
>---------------------------------------------
>
>Such patterns could be represented in an expressive syntax.
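
Read by hand, the pattern in the diagram needs very little code. The Python sketch below assumes a made-up layout: a 4-byte header, then repeated pairs of a 4-byte big-endian size and a block of that many bytes. The header value and the name read_blocks are hypothetical.

```python
import io
import struct

MAGIC = b"DEMO"  # hypothetical 4-byte header for a made-up format


def read_blocks(stream):
    """Return the blocks of a header | size | block | size | block ... file."""
    if stream.read(4) != MAGIC:
        raise ValueError("bad header")
    blocks = []
    while True:
        size_field = stream.read(4)
        if not size_field:  # clean end of input between blocks
            break
        if len(size_field) != 4:
            raise EOFError("truncated size field")
        (size,) = struct.unpack(">I", size_field)
        # The value just decoded controls how much is read next.
        block = stream.read(size)
        if len(block) != size:
            raise EOFError("truncated block")
        blocks.append(block)
    return blocks


sample = (
    MAGIC
    + struct.pack(">I", 2) + b"hi"
    + struct.pack(">I", 0)
    + struct.pack(">I", 3) + b"abc"
)
blocks = read_blocks(io.BytesIO(sample))
```

The size field is interpreted while reading is still in progress, so decoding and control of the read are interleaved in the same loop.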
>
>I think the big issue, which makes binary files different from text files, is 
>their self-referential nature: To be able to read a binary file you have to 
>partially interpret it and use this information to manage the read process. 
>You mostly can't decouple parsing and interpretation. But this is in my 
>opinion no reason not to add such functionality to ANTLR.
>
>Andi
>
>List: http://www.antlr.org/mailman/listinfo/antlr-interest
>Unsubscribe: 
>http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
>
>

