As there are no further posts right now, I would like to take the
opportunity to give a personal conclusion (I admit it turned out rather
long ;-).

ASN.1

I took a look at ASN.1. It was a really quick look, so I might be wrong
here; ASN.1 experts are welcome to correct me. I got the feeling that
in ASN.1, syntax and encoding are strongly coupled. That is, ASN.1 is a
human-readable notation, but you have to take the encodings that come
with it. This is fine for protocols, where you normally only care that
the encoding is good (compact etc.) and not how it works in detail,
because the encoding is handled automatically by generated code.

That said, ASN.1 is in my eyes not suitable if you already have a
defined file format and want to generate a parser for it from an ASN.1
grammar.


ANTLR and binary formats

I still think it would be great if ANTLR were enhanced so that it can
also parse binary formats. In my eyes it is the right place, and it would
make ANTLR even more unique.
Making ANTLR fit for binary formats would involve the following changes:
 1. Enhance capabilities of input handling
 2. Enhance ANTLR grammar
 3. Enhance code generator of ANTLR

For 1.: In effect, ANTLR already handles a binary file format. Whenever
ANTLR reads a file in one of the four Unicode encodings (UTF-8,
UTF-16 LE, UTF-16 BE, UTF-32), including Byte Order Mark and surrogate
support, it is lexing a binary format.
I don't know ANTLR in detail, so I am guessing that the Sun/Oracle code
does this work. That is, ANTLR does not do it explicitly but through the
official class libraries. I think some work would be needed here, but if
the Java class libraries are not flexible enough, I'm quite sure that
ICU4J will be.
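
To illustrate the point that decoding text is already a small piece of
binary parsing, here is a rough sketch in Python (not ANTLR or Java code;
the function name decode_with_bom and the UTF-8 fallback are assumptions
made up for this example). It looks at the first bytes of the input,
recognises a Byte Order Mark and then decodes the rest accordingly:

```python
import codecs

# Order matters: the UTF-32 LE mark starts with the same two bytes as
# the UTF-16 LE mark, so the longer marks must be tried first.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes to text, picking the encoding from a leading BOM.

    Falls back to `default` (an assumption of this sketch) when no
    BOM is present.
    """
    for bom, encoding in BOMS:
        if data.startswith(bom):
            # Skip the mark itself and decode the remaining bytes.
            return data[len(bom):].decode(encoding)
    return data.decode(default)
```

The decoder has to interpret a few leading bytes before it knows how to
read the rest, which is the same situation as with any other binary
header.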

2. and 3. are quite clear: the ANTLR grammar currently has no support
for binary formats, so an extension of some sort would be needed, and of
course the code generator of ANTLR would have to support it as well.


The last question to discuss is: Is it possible to describe binary formats in a 
grammar?

I say: Yes, for most of them this will work. For those where it does not
work fully, a grammar would at least make life easier (you would end up
doing the rest using actions etc.).

In an earlier post Ron Burk said:

"Binary file formats also often just aren't directly representable by context 
free grammars. For example, a header may contain offsets of different objects, 
and the sizes of those objects may have to be inferred from the difference in 
offsets. Grammars, despite looking seductively similar because of having 
recursively nested constructs in common, aren't a great match for this domain.

One could imagine useful domain-specific languages for binary file formats, but 
they might not look quite like grammar tools, and a single language might not 
be sufficient for all tasks."

I agree and disagree. Whether or not they are context-free, they can be
parsed. Binary formats have the benefit that they were designed to be
_machine readable_ and not, like programming languages, _human
readable_. In general this makes them easier to parse.

Instead of designing domain-specific languages, I would prefer an
integration into ANTLR, because there are also file formats out in the
wild which combine binary data with text data, and both need to be
parsed. Having two separate programs is not elegant: it would take a lot
of effort to combine the binary and text parsing results into one
abstract syntax tree.

In my opinion there are typical design patterns that are often used in
binary formats. Offsets, as mentioned by Ron in the earlier post, are one
example; another is what I wrote in my first post, section
"Interpretation of size":

---------------------------------------------
| header | size of next block | block | ... |
---------------------------------------------

Such patterns could be represented in an expressive syntax.
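
As a rough sketch of what a generated parser for the pattern above would
have to do, here it is hand-written in Python. The concrete layout is
an assumption made up for this example: a fixed 4-byte header and a
32-bit big-endian size field in front of every block. The name
read_blocks is hypothetical as well.

```python
import struct


def read_blocks(data: bytes, header_len: int = 4):
    """Parse: header | size of next block | block | ...

    Assumed layout (not taken from any real format): a header of
    `header_len` bytes, followed by zero or more blocks, each preceded
    by its size as an unsigned 32-bit big-endian integer.
    """
    if len(data) < header_len:
        raise ValueError("truncated header")
    header = data[:header_len]
    pos = header_len
    blocks = []
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated size field")
        # The size just read decides how many bytes the next "rule"
        # consumes; this is the part a plain grammar cannot express.
        (size,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if pos + size > len(data):
            raise ValueError("truncated block")
        blocks.append(data[pos:pos + size])
        pos += size
    return header, blocks
```

In grammar terms the size field acts like a value that parameterises the
following rule, which is roughly what an expressive syntax for this
pattern would need to capture.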

I think the big issue that makes binary files different from text files
is their self-referential nature: to be able to read a binary file, you
have to partially interpret it and use this information to control the
read process. You mostly can't decouple parsing and interpretation. But
in my opinion this is no reason not to add such functionality to ANTLR.
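
The offset case Ron mentioned shows this self-reference quite well. Here
is a small sketch in Python, again with an invented layout: a 16-bit
little-endian object count, then that many 32-bit little-endian absolute
offsets in ascending order, then the object data. The size of each
object is not stored anywhere; it is inferred from the difference to the
next offset (or to the end of the file for the last object). The name
read_objects is hypothetical.

```python
import struct


def read_objects(data: bytes):
    """Read objects located via an offset table in the header.

    Assumed layout (made up for this sketch): uint16 LE count, then
    `count` uint32 LE absolute offsets in ascending order, then the
    object bytes themselves.
    """
    (count,) = struct.unpack_from("<H", data, 0)
    offsets = list(struct.unpack_from(f"<{count}I", data, 2))
    # Append the end of the data so the last object gets a size too.
    bounds = offsets + [len(data)]
    objects = []
    for start, end in zip(bounds, bounds[1:]):
        if not (start <= end <= len(data)):
            raise ValueError("offsets out of order or out of range")
        # Size of this object = next offset - this offset.
        objects.append(data[start:end])
    return objects
```

The header has to be interpreted first, and only then is it known where
the remaining parts begin and end.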

Andi