On Monday, 17 December 2012 at 04:49:46 UTC, Michel Fortin wrote:
On 2012-12-17 03:18:45 +0000, Walter Bright said:
Whether the file format is text or binary does not make any
fundamental difference.
I too expect the difference in performance to be negligible in
binary form if you maintain the same structure. But if you're
translating it to another format you can improve the structure
to make it faster.
If the file had a table of contents (TOC) of publicly visible
symbols right at the start, you could read that table of
contents alone to fill symbol tables while lazy-loading symbol
definitions from the file only when needed.
Often, most of the file beyond the TOC wouldn't be needed at
all. Having to parse and construct the syntax tree for the
whole file incurs many memory allocations in the compiler,
which you could avoid if the file were structured for
lazy-loading. With a TOC you have very little to read from disk
and very little to allocate in memory and that'll make
compilation faster.
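To make the TOC idea concrete, here is a minimal Python sketch of a TOC-first file. The byte layout, the names `write_module` and `LazyModule`, and the use of short byte strings as stand-ins for symbol definitions are all hypothetical; this is not DMD's actual format. It only illustrates reading the table up front and seeking to a definition when it is first requested.

```python
import io
import struct

# Hypothetical TOC-first layout (illustrative assumption, not a real D format):
#   u32 count
#   count x (u16 name_len, name bytes, u64 offset, u32 length)
#   definition bytes, located by the absolute offsets stored in the TOC

def write_module(out, symbols):
    """Write {name: definition_bytes} with the TOC placed before all bodies."""
    names = sorted(symbols)
    encoded = [n.encode("utf-8") for n in names]
    toc_size = 4 + sum(2 + len(e) + 8 + 4 for e in encoded)
    offset = toc_size  # first definition starts right after the TOC
    out.write(struct.pack("<I", len(names)))
    for name, enc in zip(names, encoded):
        body = symbols[name]
        out.write(struct.pack("<H", len(enc)))
        out.write(enc)
        out.write(struct.pack("<QI", offset, len(body)))
        offset += len(body)
    for name in names:
        out.write(symbols[name])


class LazyModule:
    """Reads only the TOC when opened; definitions are fetched on first use."""

    def __init__(self, f):
        self._f = f
        self._cache = {}
        self.bytes_read = 0  # tracks how much of the file was actually read
        (count,) = struct.unpack("<I", self._read(4))
        self.toc = {}
        for _ in range(count):
            (name_len,) = struct.unpack("<H", self._read(2))
            name = self._read(name_len).decode("utf-8")
            self.toc[name] = struct.unpack("<QI", self._read(12))

    def _read(self, n):
        data = self._f.read(n)
        self.bytes_read += len(data)
        return data

    def definition(self, name):
        # Seek to the body only when a definition is actually needed.
        if name not in self._cache:
            offset, length = self.toc[name]
            self._f.seek(offset)
            self._cache[name] = self._read(length)
        return self._cache[name]


buf = io.BytesIO()
write_module(buf, {
    "foo": b"int foo() { return 1; }",
    "bar": b"void bar() {}",
})
buf.seek(0)

mod = LazyModule(buf)
print(sorted(mod.toc))        # ['bar', 'foo']
print(mod.bytes_read)         # 38: only the TOC has been read so far
print(mod.definition("foo"))  # b'int foo() { return 1; }'
```

Opening the module costs one pass over the TOC; the rest of the file is touched only through `definition`, which is where the savings in disk reads and allocations would come from.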
More importantly, if you use only fully-qualified symbol names
in the translated form, then you'll be able to lazily load
privately imported modules, because they'll only be needed when
you need the actual definition of a symbol. (Template
instantiation might require loading privately imported modules
too.)
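A rough Python sketch of why fully-qualified names allow this: if every reference in a translated file reads like `lib.math.square`, the module to load can be derived from the name itself, so a private import is only opened when one of its symbols is resolved. The `LazyImporter` class, the `load_module` callback and the module contents are hypothetical, and the sketch assumes the last dotted component is the symbol name (nested symbols are ignored).

```python
class LazyImporter:
    """Loads a module only when one of its fully-qualified symbols is resolved.

    `load_module` is a hypothetical callback standing in for reading a
    translated module file; it returns {symbol_name: definition}.
    """

    def __init__(self, load_module):
        self._load_module = load_module
        self._modules = {}
        self.loaded = []  # module names in the order they were actually loaded

    def resolve(self, qualified_name):
        # Assumption: the last dotted component is the symbol and everything
        # before it is the module name.
        module, _, symbol = qualified_name.rpartition(".")
        if module not in self._modules:
            self._modules[module] = self._load_module(module)
            self.loaded.append(module)
        return self._modules[module][symbol]


# app.util privately imports lib.math; its translated definition refers to
# lib.math.square by fully-qualified name instead of through an import list.
FAKE_MODULES = {
    "app.util": {"process": "int process(int x) { return lib.math.square(x); }"},
    "lib.math": {"square": "int square(int x) { return x * x; }"},
}

importer = LazyImporter(lambda name: FAKE_MODULES[name])
importer.resolve("app.util.process")
print(importer.loaded)  # ['app.util']
importer.resolve("lib.math.square")
print(importer.loaded)  # ['app.util', 'lib.math']
```

Until something asks for the body of `lib.math.square` (for example, inlining or a template instantiation), `lib.math` is never opened.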
And then you could structure it so a whole library could fit in
one file, putting all the TOCs at the start of the same file so
it loads from disk in a single read operation (or a couple of
*sequential* reads).
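The single-file library idea could look roughly like the following Python sketch, where every module's TOC sits in one block at the front so that opening the library takes two small sequential reads (a length prefix, then the whole TOC). The JSON-encoded TOC, the `write_library`/`Library` names and the module names are illustrative assumptions only; a real format would more likely use a compact binary TOC.

```python
import io
import json
import struct

# Hypothetical single-file library layout (illustrative assumption):
#   u32 toc_len | JSON TOC covering every module | all definitions
# The TOC maps module -> symbol -> [offset, length], with offsets measured
# from the start of the definitions area.

def write_library(out, modules):
    toc, blobs, offset = {}, [], 0
    for module in sorted(modules):
        toc[module] = {}
        for symbol in sorted(modules[module]):
            body = modules[module][symbol]
            toc[module][symbol] = [offset, len(body)]
            blobs.append(body)
            offset += len(body)
    toc_bytes = json.dumps(toc).encode("utf-8")
    out.write(struct.pack("<I", len(toc_bytes)))
    out.write(toc_bytes)
    for body in blobs:
        out.write(body)


class Library:
    def __init__(self, f):
        self._f = f
        # Two sequential reads load the TOCs of every module at once.
        (toc_len,) = struct.unpack("<I", f.read(4))
        self.toc = json.loads(f.read(toc_len).decode("utf-8"))
        self._data_start = 4 + toc_len

    def modules(self):
        return sorted(self.toc)

    def definition(self, module, symbol):
        # Definitions are read lazily, one seek per requested symbol.
        offset, length = self.toc[module][symbol]
        self._f.seek(self._data_start + offset)
        return self._f.read(length)


buf = io.BytesIO()
write_library(buf, {
    "mylib.a": {"f": b"void f() {}"},
    "mylib.b": {"g": b"int g() { return 2; }"},
})
buf.seek(0)

lib = Library(buf)
print(lib.modules())                   # ['mylib.a', 'mylib.b']
print(lib.definition("mylib.b", "g"))  # b'int g() { return 2; }'
```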
I'm not sure of the speedup all this would provide, but I'd
hazard a guess that it wouldn't be so negligible when compiling
a large project incrementally.
Implementing any of this in the current front end would be a
*lot* of work, however.
Precisely. That is the correct solution and is also how [Turbo?]
Pascal units (==libs) were implemented *decades ago*.
I'd like to also emphasize the importance of using a *single*
encapsulated file. This prevents synchronization hazards that D
inherited from the broken C/C++ model.