Re: [fonc] Why Bytecode is a Bad Idea for Distribution

BGB Tue, 26 Jul 2011 23:19:51 -0700

On 7/26/2011 8:34 PM, David Barbour wrote:

On Tue, Jul 26, 2011 at 3:28 PM, BGB <cr88...@gmail.com> wrote:
    why do we need an HLL distribution language, rather than, say, a
    low-level distribution language, such as bytecode or a VM-level
    ASM-like format, or something resembling Forth or PostScript?...


Because:
(1) Code will often adapt to relatively 'static' conditions indicated by the host (such as user-agent, screen size, or access to a 'tilt' sensor). In these cases, higher-level code is often much easier to specialize and garbage-collect than an opaque block of bytecode.


one can support ifdef blocks in the IL; that is no real problem.

my own language does something like this with a form like:
$[ifdef(FOO)] {
...
}

but a language could be designed to allow it with a more traditional syntax, say:

#ifdef FOO
...
#endif

probably with the '...' code being folded off into its own code block. technically, this requires balanced nesting and disallows goto into or out of the block, but this seems like a good enough strategy.
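as a rough sketch of the idea (assuming a hypothetical IL where each ifdef body is stored as its own nested block of ops; the names `IfDef` and `specialize`, the op strings, and the symbols `TILT_SENSOR` and `DEBUG` are all made up for illustration and not taken from any real VM), a loader could resolve the blocks against whatever the host defines, dropping the unused ones entirely:

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class IfDef:
    """Hypothetical IL node: a conditional block folded off on its own."""
    symbol: str
    body: List["Op"] = field(default_factory=list)


# An op is either a plain instruction (a string here) or a nested IfDef block.
Op = Union[str, IfDef]


def specialize(ops: List[Op], defined: Set[str]) -> List[str]:
    """Flatten the IL for one host, keeping only blocks whose symbol is defined."""
    out: List[str] = []
    for op in ops:
        if isinstance(op, IfDef):
            if op.symbol in defined:
                # Nested blocks are resolved recursively.
                out.extend(specialize(op.body, defined))
            # Otherwise the whole block is discarded before it ever runs.
        else:
            out.append(op)
    return out


program: List[Op] = [
    "push 1",
    IfDef("TILT_SENSOR", ["call read_tilt", IfDef("DEBUG", ["call log_tilt"])]),
    "ret",
]

print(specialize(program, {"TILT_SENSOR"}))
# ['push 1', 'call read_tilt', 'ret']
```

since each block is a self-contained nested list, there is simply no way to express a goto into or out of it, which matches the nesting restriction mentioned above.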



potentially, I could upgrade ifdef to a first-class syntactic form, say:
ifdef(FOO) { ... }

but there are other issues along this road.

(2) Unnecessarily powerful languages, such as Forth or PostScript, are difficult to reason about and difficult to audit. We are forced to stick them in 'sandboxes', and the extra memory barriers and copying overheads slow them down and consume more resources. If we design our HLL for secure composition and provable (or semantically controllable) resource consumption, we can optimize it in-place, as though it were a native part of our application. One valid fear about code distribution - even of secure code - is that it will consume too many resources (CPU, memory, bandwidth). We can accommodate that concern in the sandbox with a low-level language, but not /gracefully/ - i.e. if we want graceful failures, we need semantics that support them, such as clear disruption semantics.


Forth and PostScript are not "unnecessarily powerful".

probably, if the HLL is somewhere along the lines of C or C++ (with pointers and OOP and so on), then the IL being powerful is not the issue. one will still need validation logic to make sure the pointers/... are not used in an unsafe manner (probably inserting runtime checks wherever operations can't be determined to be safe).
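a minimal sketch of that kind of validation pass, assuming a hypothetical IL whose only pointer-like operations are `load`/`store` on arrays of statically known size (`verify`, `VerifyError`, the `check_bounds` op, and the tuple op format are all invented for illustration): accesses proven safe pass through untouched, provably bad ones are rejected at load time, and everything else gets a runtime check inserted in front of it:

```python
from typing import Dict, List, Tuple, Union

# Hypothetical IL op: (opcode, array_name, index), where index is either a
# constant int or the name of a variable only known at run time.
Instr = Tuple[str, str, Union[int, str]]


class VerifyError(Exception):
    """Raised when an access can be proven unsafe before the code runs."""


def verify(code: List[Instr], sizes: Dict[str, int]) -> List[Instr]:
    """Check array accesses, inserting runtime checks where needed."""
    out: List[Instr] = []
    for instr in code:
        op, arr, idx = instr
        if op in ("load", "store"):
            if arr not in sizes:
                raise VerifyError(f"unknown array {arr!r}")
            if isinstance(idx, int):
                # Constant index: decided statically, no runtime cost if safe.
                if not 0 <= idx < sizes[arr]:
                    raise VerifyError(f"{op} {arr}[{idx}] is out of bounds")
            else:
                # Index unknown until run time: guard it with a check op.
                out.append(("check_bounds", arr, idx))
        out.append(instr)
    return out


code: List[Instr] = [("load", "buf", 2), ("load", "buf", "i"), ("store", "buf", "j")]
print(verify(code, {"buf": 4}))
```

a real verifier for a C-like IL would also have to track pointer arithmetic, casts, object lifetimes, and so on, which is where most of the actual difficulty lies.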

even rather gimped languages, like Java or C#, still have some of these issues.

a language much weaker than these is probably too weak to be really usable in any non-trivial context.

(3) Most interesting code is not 'algorithmic'. Dataflow models can be optimized considerably with some access to the relevant local structure; decisions on caching, propagation, and parallelism, for example, should wait until the code is in place. Thus, libraries and modules for dataflow languages must provide a higher-level structure to the compiler, and compilation should happen /after/ linking. In open systems, linking is dynamic, and so must be compilation.

dataflow languages currently lack mainstream acceptance as application-development languages.


Procedural + OOP is probably a much better bet.

(4) The web model today is /extremely/ constrained: code distribution is single-origin, server-to-client. We want to scale deeper, wider. These things will happen: Clients will compose code from multiple services. Services will receive agents and ambassadors from clients. Services will broker clients, who will then speak directly. Services, themselves, are clients to their dependencies and will thus repeat all these processes. HLL code offers two critical features: (a) secure composition, and (b) the ability to agglomerate resulting compositions, identify relationships between dependencies, then shatter and distribute shards closer to the appropriate resources. Static decisions about code can easily introduce orders of magnitude in bandwidth and latency inefficiency.


can't really make sense of the above.

(5) Higher level code is much more accessible to humans. For developers, it is easier to debug at the site where the problem occurred (no painful shipping 'stack dumps' around). It is easier to extend or transform the code, e.g. using GreaseMonkey scripts or Chrome extensions. It is easier to learn to code. Children can peek behind the curtain - begin to understand and manipulate the sea of computation in which they live.

this doesn't matter much for applications, as you don't generally want the user to know how it works. normally, the program is expected to be a sort of sealed black box, for the creators' eyes only.

granted, this is not to say that FOSS people/... can't distribute in source form, but not everyone should *have* to distribute in source form.

it is like, the children peek behind the curtain, start messing with things, ..., and find themselves in the world of copyright infringement and IP law, and/or find themselves or their parents being faced with a lawsuit as a result.

keeping proprietary code hidden away thus also serves the purpose of helping to prevent these "innocent little children" from unintentionally committing criminal acts, ...

From every viewpoint I subscribe to - performance, security, scalability, flexibility, user rights - *bytecode is a BAD idea* as a distribution language. Other low-level languages are similarly bad.
Any good distribution language /will be/ high level, though not /because/ it's high level. There are quite a few desiderata for a web language.

however, a lower-level language will be more abstracted from the high-level language, and more opaque-looking to prying eyes (likely more important for commercial developers, who want the code to be as difficult to get at as can reasonably be done).

hence, for example, if one gets an EXE file, most of what is going on in there is fairly well hidden. one can try to disassemble it, but even this will often fail to work correctly. this way, one can keep their various algorithms, ... hidden.

CIL and JBC (Java bytecode) are a bit less secure, but theoretically still work OK, and there are 3rd-party obfuscator tools which are commonly used in commercial situations.

the main merit of bytecode over native code in these cases is that bytecode tends to be more portable, but there is often the drawback that bytecode-based distribution formats may limit what one can do in a language, or which kinds of languages can be used.

for example, JVM / JBC isn't really suitable for use with C or C++ programs (the bytecode is just too limited).


.NET / CIL allows C and C++, but at the cost of only working on MS targets.

also, VM dependencies are an awkward issue (as one finds they need a pile of assorted VMs, none of which really plays well with the others).



granted, a lot depends on what one wants out of a VM.

an example would be if one wanted, say, a more free and open platform along the lines of Adobe Flash or Microsoft Silverlight, rather than seeing a VM as a way to promote/enforce certain development and distribution methodologies, or, for that matter, certain languages.

granted, if the VM also supports loading programs directly from source code, then this is good as well, as then one has options.



or such...

_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc
