well, in writing this I will at least try to organize my thoughts, and speak
more of "general ideas" than "implementation details" (granted, I am far
more inclined towards the latter than the former).
I will allow that probably many of those here will strongly disagree with
all I have to say, but this is acceptable to me.
but, yes, I have before made this observation about the design of many
things, and this is especially true of many programming languages and VMs:
the design is usually very centralized, and often to the exclusion of
"everything else".
The Issue of Languages
so, a person will design a language, and they will apparently think to
themselves, "well, ok, since I am designing this language, I will
throw away all of these 'arcane' details common in other languages".
now, of course, no one can really agree what they want to throw away first.
some will throw away the syntax ("don't need any of that C style crap"),
others will throw away assignment, and yet others will throw away the
ability to do anything with said language.
so, they seek out some clean and elegant design, and produce something that
usually does not expand too far outside of some niche.
now, granted, this is not always the case. for example, many people adopted
Java and the JVM, even though they have many of the traits I will be objecting
to. however, I will also note that modern Java and the modern JVM have
essentially gone in a direction away from all this, even though many of
these aspects have been retained in the core design.
it can also be noted that, as far as Java goes, it was one of the more
conservative of these languages (it had a relatively conventional
syntax and featureset, ...), and the VM had an overall fairly practical
design (i.e., it has a much lower performance and memory overhead than many
VMs).
Syntax and Generalities
so, in terms of language design, I will advocate the goal of being as
conservative as is reasonable, and also appealing to the widest community as
is reasonable.
so, for an end-user language, this almost demands an imperative OO language
with C-style syntax (i.e., a typical user of C/C++/Java/C#/JS/..., if
presented with some code, should not be left asking themselves "ok, now just
what the hell am I looking at?...").
and should they bother to learn to use the language, should not end up
"walking on broken glass", as they discover bizarre behaviors and
restrictions which the designer had found "elegant".
now, of course, this does not mean I oppose adding "new" functionality to a
language, only that it should be done in a way hopefully causing minimal
impact to more established conventions and practices.
so, for example, a C-family language can safely add many functional
features, such as closures, tail-call optimization, tail-expression return
values, ... provided they don't screw up one's ability to do "ordinary"
things.
I will use as an example, consider if it was acceptable to write:
function fib(x) if(x>2)fib(x-1)+fib(x-2) else 1;
but, at the same time, one could also write:
function fib(x)
{
    if(x<=2)return 1;
    return fib(x-1)+fib(x-2);
}
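as a point of comparison, plain C already allows something close to both styles, with the conditional operator standing in for the expression-valued "if". this is only a sketch, and the names fib_expr and fib_stmt are made up for the illustration:

```c
/* Expression style: the whole body is a single conditional expression. */
int fib_expr(int x)
{
    return (x > 2) ? fib_expr(x - 1) + fib_expr(x - 2) : 1;
}

/* Statement style: early return, mirroring the second form above. */
int fib_stmt(int x)
{
    if (x <= 2)
        return 1;
    return fib_stmt(x - 1) + fib_stmt(x - 2);
}
```

both functions compute the same values, so adding the expression form takes nothing away from people who prefer the statement form.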
now, this does not mean every language has to use C style syntax, but the
big question is one of what the language is to do. for example, most people
will neither notice nor care if the compiler IR or some specialized
micro-language has some obscure syntax and semantics filled with lots of
sharp edges, only that in this case one can't expect the "community at
large" to accept this language.
ok, and it is also worth noting that as complicated as the syntax of most
C-family languages can be to parse, by far this is actually one of the
simplest parts of the whole process.
granted, yes, a trivial dynamically-typed bytecode interpreter is fairly
easy to write compared with a C-style parser, but for a statically-typed
compiler with native-code output, the rest of the process is far more
complicated than the parser.
consider, one writes say, like, 5 kloc for the parser, but then ends up
writing another 20 or 30 for the lower compiler stages and native
codegen...
Dynamics and the Type-System
now, one can argue, "well then, why not just use dynamic types and bytecode
interpreters?..."
however, this misses that there are some rather notable good points to
static typing, which are usually considered to outweigh the costs of
implementation:
static verification, more room for optimization, ...
now, this is not meant to say that I think everyone should give up dynamic
languages in favor of good old statically-typed batch-compiled languages,
only that the advantages of static typing (even if used in the context of
optional annotations for an otherwise dynamic scripting language) will tend
to outweigh the overt complexity added to the implementation.
I guess the point to make here is that many of us who "seriously" write
code, are more concerned with things working well than about minimizing the
total lines of code that need to be written or the theoretical complexity of
the entire system.
not that any of us like wasting bunches of extra effort on
things that won't pay off, but it is usually more useful IME to have an
effective component, rather than a simple or elegant component.
and, often, we like it more when our compilers will catch and return some
obvious and stupid error (such as messing up the number of arguments,
accidentally passing the wrong value, ...) than having to wait until some
later time when such errors might cause periodic runtime typecheck failures
("BARF: foo/bar.scr:69: typecheck failed, method draw/2 not understood for
type CONS").
this is especially true considering that, many times, exceptions are not
caught, and the VM usually kills the entire app.
with full static verification, even if many parts of the app are dynamic,
many possible bugs and potentially serious errors can be detected and
reported up front, rather than surfacing at runtime. in the long run this
saves the programmer time, and greatly improves the app's quality and
end-user experience (hint: messages like the above tend to be a near
constant annoyance to users of apps written in a certain popular language,
many of which could likely have been avoided had the compiler been able to
do many obvious checks).
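to make the contrast concrete, here is a minimal sketch in C of the two paths. everything in it (method_entry, dyn_send, draw_checked) is hypothetical and not taken from any real VM. a call such as draw_checked(1) is rejected by the compiler, whereas the equivalent mistake through dyn_send is only noticed when that line actually runs:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical method table entry: name, expected argument count, handler. */
typedef struct {
    const char *name;
    int argc;
    int (*fn)(const int *args);
} method_entry;

static int draw_impl(const int *args) { return args[0] + args[1]; }

static const method_entry methods[] = {
    { "draw", 2, draw_impl },
};

/* Statically checked path: a call with the wrong number of arguments,
   such as draw_checked(1), does not compile. */
int draw_checked(int x, int y)
{
    int args[2];
    args[0] = x;
    args[1] = y;
    return draw_impl(args);
}

/* Dynamically checked path: returns 0 on success, or -1 if no method
   matches the given name and argument count. The mistake is only found
   when this call is executed. */
int dyn_send(const char *name, int argc, const int *args, int *result)
{
    size_t i;
    for (i = 0; i < sizeof methods / sizeof methods[0]; i++) {
        if (strcmp(methods[i].name, name) == 0 && methods[i].argc == argc) {
            *result = methods[i].fn(args);
            return 0;
        }
    }
    fprintf(stderr, "typecheck failed, method %s/%d not understood\n",
            name, argc);
    return -1;
}
```

if the dyn_send call with the wrong argument count sits on a rarely taken branch, it can ship unnoticed, which is the kind of failure described above.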
this does not mean, however, that I feel dynamic types are useless, rather,
there are many cases where this kind of flexibility can be damn useful, only
that, in general, I would far rather see wide use of dynamic capabilities in
an otherwise static language, than the use of type-inference in a dynamic
language (type inference, though often doing a good job at optimizing, does
a poor job at catching programmer errors).
more so, if such a language does offer dynamic types, often the compiler
proves more capable of figuring out what the types are (since they are
usually much more firmly anchored to established static types), and so can
more capably report on probable violations of type safety.
this is in part due to the fact that, most of the time, what is really
needed is static typing, and dynamic typing is needed in a sufficiently
minor number of cases that I don't really believe it is justified as the
primary type model of a language.
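one common way to get "dynamic capabilities in an otherwise static language" is a tagged variant type with explicit checks at the boundary. the following is a sketch only. dyn_value and its helpers are invented names, not the API of my framework or of any particular library:

```c
/* Hypothetical dynamically typed value inside a statically typed program. */
typedef enum { DYN_INT, DYN_REAL, DYN_STR } dyn_tag;

typedef struct {
    dyn_tag tag;
    union {
        long i;
        double r;
        const char *s;
    } u;
} dyn_value;

dyn_value dyn_from_int(long i)
{
    dyn_value v;
    v.tag = DYN_INT;
    v.u.i = i;
    return v;
}

dyn_value dyn_from_str(const char *s)
{
    dyn_value v;
    v.tag = DYN_STR;
    v.u.s = s;
    return v;
}

/* Boundary between dynamic and static code: returns 1 and stores a double
   if the value is numeric, returns 0 otherwise. */
int dyn_to_real(dyn_value v, double *out)
{
    switch (v.tag) {
    case DYN_INT:  *out = (double)v.u.i; return 1;
    case DYN_REAL: *out = v.u.r;         return 1;
    default:       return 0;
    }
}

/* Ordinary statically typed code: the compiler checks every call to it. */
double scale(double x, double factor)
{
    return x * factor;
}
```

the dynamic part stays confined to dyn_value, and everything downstream of dyn_to_real is checked by the compiler as usual.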
yes, many of us really are apes on the keyboard, typing periodic and
obviously stupid crap that we need the compiler's help to notice.
so, C, C++, Java, and C# all do a reasonably good job here IMO.
Prototypes and Class/Instance
by a similar token, I will extend this argument to Class/Instance and
Prototype OO, where even if Class/Instance is sometimes awkward and can't do
some things Prototype OO does easily, both the performance and verifiability
make C/I, in general, the more appropriate option.
granted, yes, the capabilities of prototypes are not something to be
discounted, but personally I would rather see these exist in terms of the
good old statically-typed class/instance model, than have to forsake both
for these capabilities.
as a simple example, in the object system under development for my VM, C/I
is the primary model, however, both the dynamic addition of slots, and also
object delegation, are being made available as optional features (granted
that, yes, the way I am implementing things is a little weird in some
cases).
I could attempt to elaborate more on the design and semantics of my hybrid
model if anyone cares.
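in the meantime, a rough sketch of the general idea (not my actual implementation: the struct layout and the names obj_set_slot / obj_get_slot are assumptions made up for this example) could look like this in C, with fixed fields as the primary model and slots and delegation as optional extras:

```c
#include <stdlib.h>
#include <string.h>

/* A dynamically added slot, kept in a per-object linked list. */
typedef struct dyn_slot {
    const char *name;          /* must outlive the object */
    int value;
    struct dyn_slot *next;
} dyn_slot;

typedef struct object {
    int x, y;                  /* class/instance part: fixed, static fields */
    dyn_slot *slots;           /* optional: slots added at run time */
    struct object *delegate;   /* optional: consulted when a lookup misses */
} object;

/* Add or overwrite a dynamic slot; returns 0 on success, -1 if out of memory. */
int obj_set_slot(object *o, const char *name, int value)
{
    dyn_slot *s;
    for (s = o->slots; s; s = s->next) {
        if (strcmp(s->name, name) == 0) {
            s->value = value;
            return 0;
        }
    }
    s = malloc(sizeof *s);
    if (!s)
        return -1;
    s->name = name;
    s->value = value;
    s->next = o->slots;
    o->slots = s;
    return 0;
}

/* Look up a dynamic slot, following the delegation chain (assumed to be
   acyclic). Returns 1 and stores the value if found, 0 otherwise. */
int obj_get_slot(const object *o, const char *name, int *out)
{
    for (; o; o = o->delegate) {
        const dyn_slot *s;
        for (s = o->slots; s; s = s->next) {
            if (strcmp(s->name, name) == 0) {
                *out = s->value;
                return 1;
            }
        }
    }
    return 0;
}

/* Release all dynamic slots of one object. */
void obj_clear_slots(object *o)
{
    while (o->slots) {
        dyn_slot *next = o->slots->next;
        free(o->slots);
        o->slots = next;
    }
}
```

access to x and y stays a plain, statically checked field access, and only code that actually uses slots or delegation pays for the lookup.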
Language World
a second area of concern, and also something I feel is actually a bigger
issue than that of the exact design of the language itself, is how easily it
integrates with the outside world.
it is very common in my experience for languages and VMs to be designed with
the assumption that the "entire world", as far as the developer is
concerned, is located inside the language and VM.
as a result, many of these languages are designed such that they don't
actually import or interface with the outside world in any direct
manner.
instead, it is the idea that everything to be accessed from the language/VM
either has to be written in the language in question, or be systematically
wrapped in order to be visible to the VM.
another consequence is that mixed-language codebases are usually exceedingly
awkward (very often requiring large amounts of horrible looking and brittle
boilerplate), and many such VMs are, by definition, often rather limited in
terms of the available capabilities (notable by just how often and
dramatically many new VM efforts pronounce the new-gained ability to access
OpenGL or GTK, and very often with an interface which is horribly
strained or mutilated due to the language's inability to faithfully match
the way the outside "C world" does things...).
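for those who have not had to write it, the wrapping tends to look something like the following sketch. the boxed type vm_value, the wrapper signature, and the function c_clamp (standing in for some existing C library function) are all hypothetical:

```c
/* Stand-in for an existing C function we would like to call from the VM. */
int c_clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical boxed value used inside the VM. */
typedef enum { VM_NUMBER, VM_ERROR } vm_tag;

typedef struct {
    vm_tag tag;
    double num;
} vm_value;

/* Hypothetical signature every native function must have to be visible. */
typedef vm_value (*vm_native)(const vm_value *args, int argc);

/* Hand-written wrapper: check arity and types, unbox, call, re-box.
   One of these is needed for every exported C function. */
vm_value wrap_clamp(const vm_value *args, int argc)
{
    vm_value r;
    int i;

    r.tag = VM_ERROR;
    r.num = 0.0;
    if (argc != 3)
        return r;
    for (i = 0; i < 3; i++) {
        if (args[i].tag != VM_NUMBER)
            return r;
    }
    r.tag = VM_NUMBER;
    r.num = (double)c_clamp((int)args[0].num, (int)args[1].num,
                            (int)args[2].num);
    return r;
}

/* Registration table mapping script-visible names to wrappers. */
typedef struct {
    const char *name;
    vm_native fn;
} vm_export;

const vm_export vm_exports[] = {
    { "clamp", wrap_clamp },
};
```

a three-argument function needs about twenty lines of glue here, and a real library has hundreds of such functions plus structs, callbacks, and pointers that do not map this neatly.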
the result is that many such VMs become a kind of "ivory tower",
each representing its own entire world and landscape, both structurally and
syntactically isolated from the outside world, and where the choice of
language also becomes one of associated code-world.
much effort is wasted, as each major VM/language implementor feels the need
to rebuild the world from scratch in their own "new and innovative"
language.
often, in an extreme form of this idea, the developers feel compelled to
make an implementation of the VM and compiler itself from within this
language, most often with this amounting to little more than an academic
curiosity (the version of the VM that everyone continues using typically
having been written in C or C++).
of all the major VMs, nearly every one I am aware of has these problems to
a greater or lesser extent.
yes, all this is typically inherent with implementing a VM, but I personally
believe that this is one of those things that CAN and SHOULD be battled
within the implementation of a VM framework.
we don't need to make a few of the VM's capabilities available by writing
the VM in itself. rather, I will assert, it is better to make the VM
sufficiently capable of interfacing with the outside world that it is
also capable of interfacing with its own implementation (even if it so
happens that the whole thing was written in, GASP, C or C++...).
from what I can tell, .NET is one of the few VMs which has addressed these
issues in a half-decent manner (though, IMO, .NET is still a long ways from
perfect in these regards...).
My Effort
this is actually one of the core areas I battle with in my effort, where I
am not so much trying to make a new world on top of the old one, so much as
I am trying to leverage the existing world as much as possible, while still
pushing forwards with high-level abstractions and dynamic capabilities.
this is also a major reason why I had chosen as my main language, first and
foremost, to implement a subset/superset of C99 (a few features were left
out for various reasons, but the language is sufficiently implemented that
most C code should work without problems).
I also added a few compiler extensions for features which seemed like they
could be rather nifty to have, but in general (due to concern over
maintaining freedom to compile code with either my compiler or gcc), I have
not made so much use of these features as I would have thought. since then,
more effort has been put into APIs (with a possible optimization route via
intrinsics), than into syntactic extensions (note that these extensions have
usually been added within the conformance guidelines set forth in the
standards).
as-is, the inner-world of dynamically-compiled C interfaces relatively
seamlessly with the outside world, and so, FWIW, the VM is written in its
own language.
as it also so happens, most of the language's runtime features are also
readily accessible from good old statically compiled and linked code.
the main limitation of C, however, is with C itself:
compiling C code tends to be neither the fastest nor the most conservative
with memory, due in large part to the rather large amount of stuff typically
included from various headers.
object caching is a partial fix to this problem, but does not address the
whole range of issues, and it would be nice to be able to compile scripts
without causing a notable delay and typically the need to run the GC one or
more times...
likewise, precompiled headers are unlikely to be an adequate solution, since
I am unaware of any real way to do this effectively apart from potentially
violating C's semantics (transforming the header-inclusion system more into
a kind of module-import system) and also requiring notable changes to how I
approach compiling C code (making the middle and lower stages of the
compiler aware of such a module system).
eventually though, such changes may be useful and necessary (especially if I
were to support such notions as having multiple runnable C-based
"applications" on top of the same framework, say with each having its own
partially-independent scope and world state, never mind any obvious
multiplexing issues and likely introducing many of the same kinds of
problems that plagued Windows 9x...).
so, for these reasons, both Java and JavaScript have been considered as
languages worth implementing, with the desired goal of being able to
implement both while keeping the "wall of impedance" as small as possible
(even granted that C, Java, and JS naturally operate in very different
worlds), and also I would like to do all this absent having to write lots of
ugly boilerplate (JNI or otherwise) to make everything play well together.
recently, this issue has been one of the greater areas of concern, and has
led to the development of an object system which I am trying to make readily
useful to all 3 languages: in C via a clean and hopefully not-too-awkward
API, and in the others by handling the native semantics of the languages,
as well as trying to address the issue of making code and data in one
language at least presumably accessible from the others, without too much
additional work.
granted, I have implemented JNI, since it would be a good deal
useful for trying to import an existing classpath library (either GNU
classpath or the one from Apache, but I have yet to decide conclusively at
this point).
initially in my effort I have focused exclusively on generating native code
(actually, this is backwards from most VMs, which generate bytecode first
and only later fall back to native code for sake of optimization), and
although native code offers some fairly strong incentives, it is by no means
free of issues. namely, it is generally expensive to produce (at least
from C source...), and is (in my implementation) impervious to garbage
collection (garbage collecting the C toplevel and executable code is a
notably more hairy issue than that of the heap).
so, in my implementation, with a few rare exceptions (specially crafted
executable thunks located within heap-based memory) dynamically compiled or
linked-in code (such as static libraries) will not be garbage collected...
so, for these reasons I had decided (actually a few months ago now, but I
don't have much time for coding as of late, so it is slow) to implement a
bytecoded VM.
I had decided to make use of Java-ByteCode for the VM for various reasons
(namely, that there are existing compilers that target it, and it is a
fairly "de-facto" bytecode format), but of course this means implementing
much of a JVM.
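to give a feel for why bytecode sidesteps the issues above (it is just a byte array on the heap, cheap to produce and collectable like any other data), here is a toy interpreter loop in C. it is only a sketch: run_bytecode is a made-up name, it handles four opcodes (using the opcode numbers from the JVM specification), and it ignores class files, frames, locals, and verification entirely:

```c
#include <stddef.h>

/* Opcode values as in the JVM instruction set; only this subset is handled. */
enum {
    OP_BIPUSH  = 0x10,  /* push the following signed byte as an int */
    OP_IADD    = 0x60,  /* pop two ints, push their sum */
    OP_IMUL    = 0x68,  /* pop two ints, push their product */
    OP_IRETURN = 0xAC   /* return the int on top of the stack */
};

#define STACK_MAX 16

/* Interpret code[0..len). Returns 0 and stores the result on ireturn,
   or -1 on malformed or unsupported bytecode. */
int run_bytecode(const unsigned char *code, size_t len, int *result)
{
    int stack[STACK_MAX];
    int sp = 0;
    size_t pc = 0;

    while (pc < len) {
        unsigned char op = code[pc++];
        switch (op) {
        case OP_BIPUSH:
            if (pc >= len || sp >= STACK_MAX)
                return -1;
            stack[sp++] = (signed char)code[pc++];
            break;
        case OP_IADD:
        case OP_IMUL:
            if (sp < 2)
                return -1;
            sp--;
            stack[sp - 1] = (op == OP_IADD) ? stack[sp - 1] + stack[sp]
                                            : stack[sp - 1] * stack[sp];
            break;
        case OP_IRETURN:
            if (sp < 1)
                return -1;
            *result = stack[sp - 1];
            return 0;
        default:
            return -1;  /* opcode not covered by this sketch */
        }
    }
    return -1;  /* ran off the end without an ireturn */
}
```

for example, the byte sequence 0x10 2, 0x10 3, 0x60, 0x10 4, 0x68, 0xAC pushes 2 and 3, adds them, pushes 4, multiplies, and returns the result.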
note that I decided actually to write my own JVM, rather than re-using an
existing one, in the hope of not creating yet another JVM which would create
its own "Java Island...", and also because I might want to target other
things to it (for example, C).
so, the JVM is actually more of a template for a VM, than the entire design
(my plans are actually to support a modified version of the bytecode, which
adds many features I feel necessary to "adequately" support languages such as
JS and C).
granted, yes, some C compilers do target the JVM, but they do the even less
useful thing of creating a new C island inside the JVM, which is hardly
useful, where I would want bytecoded scripts to be able to freely access
C-land, which requires more than a few extensions to be practical.
granted... yes... the JVM classpath does have a few classes which have the
needed functionality, but to implement C in such a way would lead to
pitifully horrible performance.
my goal is, however, to retain backwards compatibility with more
conventional versions of the JVM, if anything so that hopefully I can make
use of compilers like Eclipse and not have to needlessly write my own Java
compiler as well (granted... I am not certain at this point if Eclipse can
be used as a dynamic script compiler, having not looked much into it, but it
would make sense given the existing JVM supports this capability...).
it also remains as a possibility to include features for running AVM2
bytecode as well (either via recompiling the bytecode or via separate
loaders and a separate interpreter loop).
note that the JVM proper (and also likely AVM2 support) would be a small
portion of the total project complexity (since, for the most part,
everything is being based on already existing functionality from within the
framework, and what parts I am implementing for the JVM should also apply
without too much effort to the AVM, me having noticed more than a few likely
exploitable similarities between the VMs...).
all this is not likely to be too much of a problem, since, after all, unlike
most VM frameworks mine is highly decentralized (most of the components are
separate libraries, and many have little or no direct interaction with
the others and are designed to be replaceable with another component so long
as it serves a similar role/produces similar output/accepts similar
input/...).
there are actually several major components within all this that communicate
primarily via opaque representations (such as a specialized textual
representation or via XML).
or such...
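as a toy illustration of components that only know each other through a textual representation (a sketch under invented names, not the actual interfaces in my framework), each stage below takes text in and hands text out, so any stage can be swapped for another that accepts and produces the same kind of text:

```c
#include <ctype.h>
#include <string.h>

#define BUF_MAX 256

/* Hypothetical stage interface: read text from `in`, write text to `out`
   (capacity `cap`), return 0 on success or -1 on failure. */
typedef int (*stage_fn)(const char *in, char *out, size_t cap);

/* Example stage: drop everything from the first ';' onwards. */
int stage_strip_comment(const char *in, char *out, size_t cap)
{
    size_t n = strcspn(in, ";");
    if (n + 1 > cap)
        return -1;
    memcpy(out, in, n);
    out[n] = '\0';
    return 0;
}

/* Example stage: upper-case the text. */
int stage_upper(const char *in, char *out, size_t cap)
{
    size_t i, n = strlen(in);
    if (n + 1 > cap)
        return -1;
    for (i = 0; i < n; i++)
        out[i] = (char)toupper((unsigned char)in[i]);
    out[n] = '\0';
    return 0;
}

/* Run the stages in order, handing each stage's output text to the next. */
int run_pipeline(const stage_fn *stages, size_t count,
                 const char *in, char *out, size_t cap)
{
    char a[BUF_MAX], b[BUF_MAX];
    char *cur = a, *next = b, *tmp;
    size_t i;

    if (strlen(in) + 1 > sizeof a)
        return -1;
    strcpy(cur, in);
    for (i = 0; i < count; i++) {
        if (stages[i](cur, next, BUF_MAX) != 0)
            return -1;
        tmp = cur;
        cur = next;
        next = tmp;
    }
    if (strlen(cur) + 1 > cap)
        return -1;
    strcpy(out, cur);
    return 0;
}
```

the runner never looks inside a stage, so replacing stage_upper with something else only requires that the replacement honor the same text-in/text-out contract.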