On 10/9/10 3:45 PM, Dirk Pranke wrote:
> C++ is a significant security concern, and it is reasonable to want a
> browser written in a memory-safe language.
> Unfortunately, web browsers are large, extremely
> performance-sensitive, legacy applications. All of the major browsers
> are written in some combination of C, C++, and Objective-C (and
> undoubtedly assembly in isolated areas like the JITs), and it's
> unclear whether one can reasonably hope to see a web browser, written
> from scratch in a new language, ever render the majority of the
> current web correctly; the effort may simply be too large. I was not
> aware of Lobo; it looks interesting but currently idle, and is a fine
> example of this problem.
> I continue to hope, but I may be unreasonable :)
Yes, that seems like a good description of the problem.
How about this as a possibility toward a solution. Use OMeta (or Antlr or
whatever :-) to parse all the C++ code and output some kind of semantic
representation formatted in Lisp, RDF, Cola, OCaml, JavaScript, pure
prototype objects, or whatever. Then write AI-ish code that analyzes that
abstraction and can write out code again in C++ (or JavaScript, Smalltalk,
assembly, Lisp, OCaml, or whatever), but having done enough analysis that it
can be proved there are no possible buffer overruns, misused pointers, or
memory leaks (and if there are grey areas or halting-problem issues, then
stick in range checks, etc.). And maybe optimize some other stuff while it
is at it, too. So, treat C++ like assembly language and try to write the
browser at a higher level of abstraction. :-) Ideally, the end result of
this round trip would look *identical* to what was read in (though I'm happy
if the formatting is a little different or a while loop gets changed to a
for loop or whatever. :-)
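As a toy sketch of that read-analyze-rewrite loop -- using Python's own `ast` module on Python source (Python 3.9+) rather than a real C++ front end, and a hypothetical `checked_get` helper standing in for an inserted range check -- one could parse code into a tree, rewrite every read of `a[i]` into a checked call, and print source back out:

```python
import ast

class BoundsCheckInserter(ast.NodeTransformer):
    """Rewrite read-accesses like a[i] into checked_get(a, i).

    checked_get is a hypothetical runtime helper that would raise a
    clean error (or clamp) instead of allowing an out-of-range access.
    """
    def visit_Subscript(self, node):
        self.generic_visit(node)  # handle nested subscripts first
        if isinstance(node.ctx, ast.Load):  # only rewrite reads
            return ast.copy_location(
                ast.Call(func=ast.Name(id="checked_get", ctx=ast.Load()),
                         args=[node.value, node.slice], keywords=[]),
                node)
        return node

src = "total = xs[i] + ys[j]"
tree = ast.parse(src)
new_tree = ast.fix_missing_locations(BoundsCheckInserter().visit(tree))
print(ast.unparse(new_tree))  # total = checked_get(xs, i) + checked_get(ys, j)
```

The regenerated source is not byte-identical to the input (formatting is normalized by `ast.unparse`), which is exactly the "formatting is a little different" caveat above.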
I've written some stuff that works like this (just a tiny bit, but no real
AI) for reading our old Delphi code (using Antlr and Python) and creating an
internal representation and then outputting code either in Java or
Python/Jython. Now, libraries can be a mismatch problem (and are in that
case, though I do a bit of the heavy lifting and then the programmer has to
do a bunch of the rest), as can other semantic issues (and again, the
conversion I did had some issues, but was better than rewriting by hand).
But in theory, Delphi to Java/Jython/Swing should be doable as I outlined
above -- with a bit more sophistication in the analysis, and at worst, with
some additional tools to make whatever human input was required easier to do
at a semantic level. :-)
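For flavor, here is a toy, hypothetical fragment of what the purely mechanical layer of such a translation looks like -- nothing like a real Antlr front end with a grammar and an internal representation, just a line-by-line rewrite of a tiny Pascal-ish subset into Python:

```python
import re

def translate_line(line):
    """Translate a tiny Pascal-ish subset to Python (a toy sketch,
    far short of a real parser-based converter)."""
    line = line.strip().rstrip(';')            # drop statement terminator
    line = re.sub(r'\bwriteln\b', 'print', line)  # library-call mismatch, mapped by hand
    line = line.replace(':=', '=')             # assignment operator
    return line

pascal = ["x := 2 + 3;", "writeln(x);"]
for ln in pascal:
    print(translate_line(ln))
```

Even this trivial example shows where the pain lives: the operators translate mechanically, but every library call (`writeln` here) needs an explicit mapping, which is the mismatch problem mentioned above.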
Ideally, all the development and debugging would be done using the higher
level abstraction, but if someone still modified the C++ code, presumably
you could read the changes back into the higher level and integrate them
back into the abstraction.
So, I guess my objection to C++ isn't so much that it is used in compiling
browsers as that people are coding in it directly, not treating it as an
intermediate language, and that it is not undergoing some kind of
sophisticated automated analysis to check for problems every single time a
build is made. For example, Squeak has its VM written in Squeak Smalltalk,
but translates it to C. Now, that's not exactly what I am talking about,
but it is closer (coding in Smalltalk is awkward in some ways, and while I
have not generated a Squeak VM in a long time, I expect there could
potentially be bugs in the Smalltalk that might lead to buffer problems in
the C -- because there is no AI-ish programming nanny checking what is
being done). I'm proposing a more complex abstraction, one perhaps encoded
in semantic triples or some other fancy AI-ish representation, even if you
might interact with that abstraction in more than one way (editing textual
files or moving GUI items around or whatever).
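A minimal sketch of what one tiny piece of such a "nanny" check might look like -- again on Python source via the `ast` module, flagging only the trivial case of a constant index into a literal list; a real analysis would need dataflow, interprocedural reasoning, and much more:

```python
import ast

def find_constant_overruns(src):
    """Flag a[i] where a is assigned a literal list in the same module
    and i is a constant integer index out of range (a toy check)."""
    tree = ast.parse(src)
    sizes = {}
    # Pass 1: record sizes of names bound directly to literal lists.
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.List)):
            sizes[node.targets[0].id] = len(node.value.elts)
    # Pass 2: flag constant indices outside [-size, size).
    problems = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name)
                and isinstance(node.slice, ast.Constant)
                and isinstance(node.slice.value, int)):
            size = sizes.get(node.value.id)
            if size is not None and not -size <= node.slice.value < size:
                problems.append((node.value.id, node.slice.value))
    return problems

print(find_constant_overruns("buf = [0, 0, 0]\nx = buf[5]\ny = buf[1]"))
# [('buf', 5)]
```

The point is not this particular check but that the check runs over a semantic representation of the program, so it can be made part of every single build.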
So, this way all that complexity about how to parse and render quirky HTML
that is there can be preserved, but then it can be operated on by more
sophisticated analysis tools, more sophisticated collaboration tools, and
more sophisticated output tools.
Now, ideally, this is what the FONC stuff should be able to support, but I
don't know how close OMeta etc. are to being able to make this easy. I get
the feeling a whole other AI-ish layer of knowledge representation for
reasoning about programs would be needed above OMeta (perhaps like what
John was getting at with his points on formal analysis?).
Although, on the other hand, XHTML is coming along, and to render older
pages I guess I could just do all my web browsing in a Debian installation
in VirtualBox to sandbox potentially buggy C++ -- maybe even one VirtualBox
install per loaded page? :-) Though two gigabytes here, two gigabytes there,
and sooner or later we're talking real amounts of disk space for all those
VirtualBox installs :-) But there is probably some way to get all the
VirtualBoxes to share most of their virtual hard disks. So, anyway, that is
a potential way to deal with legacy security holes. :-) Still, since so
much of web browsing also might involve saving files and such, it would
help to have an intelligent assistant (microscope/telescope) that could go
into that VirtualBox (even just by OCRing the displayed GUI) and copy
downloaded files out of it, or track WebStart invocations and create new
VirtualBoxes (or even run that code outside of one), or do any of the other
desktop integration one might want. So, virtualization
(increasing the level of abstraction) is one possible answer to the legacy
issue, as is moving to better standards that are presumably easier to render
correctly.
Maybe VirtualBox should just be integrated into Firefox or Chrome? :-)
http://www.virtualbox.org/
Though that is only an x86 solution, I guess.
Anyway, I know all these suggestions either would take a lot of work and
cutting-edge research or are just wildly implausible, but in
the long term, new computing is probably going to address them in some way
(with some combination of virtualization, translation, AI-ish reasoning at a
higher semantic level of abstraction, tools for programming at a higher
level of abstraction, improved standards, etc.).
--Paul Fernhout
http://www.pdfernhout.net/
====
The biggest challenge of the 21st century is the irony of technologies of
abundance in the hands of those thinking in terms of scarcity.
_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc