On 10/9/10 3:45 PM, Dirk Pranke wrote:
C++ is a significant security concern; and it is reasonable to want a
browser written in a memory-safe language.

Unfortunately, web browsers are large, extremely
performance-sensitive, legacy applications. All of the major browsers
are written in some combination of C, C++, and Objective-C (and
undoubtedly assembly in isolated areas like the JITs), and it's
unclear whether a web browser written from scratch in a new language
could ever hope to render the majority of the current web correctly;
the effort may simply be too large. I was not
aware of Lobo; it looks interesting but currently idle, and is a fine
example of this problem.

I continue to hope, but I may be unreasonable :)

Yes, that seems like a good description of the problem.

How about this as a possibility towards a solution. Use OMeta (or Antlr, or whatever :-) to parse all the C++ code and output some kind of semantic representation formatted in Lisp, RDF, Cola, OCaml, JavaScript, pure prototype objects, or whatever. Then write AI-ish code that analyzes that abstraction and can write out code again in C++ (or JavaScript, Smalltalk, assembly, Lisp, OCaml, or whatever), but having done enough analysis that it can be proved there are no possible buffer overruns, misused pointers, or memory leaks (and where there are grey areas or halting-problem issues, stick in range checks, etc.). And maybe optimize some other stuff while it is at it, too. So: treat C++ like assembly language and try to write the browser at a higher level of abstraction. :-) Ideally, the end result of this round trip would look *identical* to what was read in (though I'm happy if the formatting is a little different, or a while loop gets changed to a for loop, or whatever :-).
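To make the shape of that round trip concrete, here is a tiny sketch in Python (standing in for a real OMeta/Antlr C++ front end, which is far beyond a mailing-list post): parse source into a tree, rewrite any indexing the analysis can't prove safe to go through a range-checked helper, and regenerate source text. The `checked_get` helper and the use of Python's `ast` module (3.9+, for `ast.unparse`) are my illustration, not anything from an existing tool:

```python
# Illustrative sketch: treat source as data -- parse, analyze, rewrite, regenerate.
# Python's own `ast` module stands in for an OMeta/Antlr C++ front end.
import ast

class BoundsCheckInserter(ast.NodeTransformer):
    """Where analysis can't prove an index is safe, stick in a range check."""
    def visit_Subscript(self, node):
        self.generic_visit(node)
        if isinstance(node.ctx, ast.Load):
            # Rewrite xs[i] into checked_get(xs, i).  (Requires Python 3.9+,
            # where node.slice is the index expression itself.)
            return ast.copy_location(
                ast.Call(func=ast.Name(id="checked_get", ctx=ast.Load()),
                         args=[node.value, node.slice], keywords=[]),
                node)
        return node

def checked_get(seq, i):
    """Runtime range check inserted where static proof falls short."""
    if not 0 <= i < len(seq):
        raise IndexError(f"index {i} out of range for length {len(seq)}")
    return seq[i]

source = "def first_byte(buf, i):\n    return buf[i]\n"
tree = ast.fix_missing_locations(BoundsCheckInserter().visit(ast.parse(source)))
regenerated = ast.unparse(tree)  # round-trip back to source text
namespace = {"checked_get": checked_get}
exec(compile(tree, "<generated>", "exec"), namespace)
```

After the rewrite, `regenerated` reads back almost identically to the input, except the unprovable access now goes through the range check.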

I've written some stuff that works like this (just a tiny bit, and with no real AI) for reading our old Delphi code (using Antlr and Python), creating an internal representation, and then outputting code in either Java or Python/Jython. Now, libraries can be a mismatch problem (and were in that case, though the tool does a bit of the heavy lifting and then the programmer has to do a bunch of the rest), as can other semantic issues (and again, the conversion I did had some issues, but was better than rewriting by hand). But in theory, Delphi to Java/Jython/Swing should be doable as I outlined above -- with a bit more sophistication in the analysis and, at worst, with some additional tools to make whatever human input is required easier to do at a semantic level. :-)
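The one-representation, many-outputs part of that Delphi conversion might be sketched like this (a made-up toy IR and two tiny emitters -- nothing like the real Antlr-based tool, but it shows the shape of targeting Java or Python from the same internal form):

```python
# Toy internal representation: a list of statement tuples.
# The IR shape and the two emitters are invented for illustration.
OPS = {"add": "+"}

IR = [("assign", "total", ("add", "a", "b")),
      ("print", "total")]

def emit_python(ir):
    """Walk the IR once and print it back out as Python."""
    lines = []
    for stmt in ir:
        if stmt[0] == "assign":
            _, name, (op, left, right) = stmt
            lines.append(f"{name} = {left} {OPS[op]} {right}")
        elif stmt[0] == "print":
            lines.append(f"print({stmt[1]})")
    return "\n".join(lines)

def emit_java(ir):
    """Same IR, different surface syntax."""
    lines = []
    for stmt in ir:
        if stmt[0] == "assign":
            _, name, (op, left, right) = stmt
            lines.append(f"int {name} = {left} {OPS[op]} {right};")
        elif stmt[0] == "print":
            lines.append(f"System.out.println({stmt[1]});")
    return "\n".join(lines)
```

The real work, of course, is in getting source code *into* such a representation and papering over library mismatches -- but once the IR exists, adding another output language is just another emitter.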

Ideally, all the development and debugging would be done using the higher level abstraction, but if someone still modified the C++ code, presumably you could read the changes back into the higher level and integrate them back into the abstraction.

So, I guess my objection to C++ isn't so much that it is used in compiling browsers as that people are coding in it, not treating it as an intermediate language, and that it does not undergo some kind of sophisticated automated analysis to check for problems every single time a build is made. For example, Squeak has its VM written in Squeak Smalltalk, but translates it to C. Now, that's not exactly what I am talking about, but it is closer (coding in Smalltalk is awkward in some ways, and while I have not generated a Squeak VM in a long time, I expect there could potentially be bugs in the Smalltalk that might lead to buffer problems in the C -- because there is no AI-ish programming nanny checking what is being done). I'm proposing a more complex abstraction, one perhaps encoded in semantic triples or some other fancy AI-ish representation, even if you might interact with that abstraction in more than one way (editing textual files, or moving GUI items around, or whatever).

So, this way, all the complexity about how to parse and render quirky HTML that is already there can be preserved, but it can then be operated on by more sophisticated analysis tools, collaboration tools, and output tools.

Now, ideally, this is what FONC stuff should be able to support, but I don't know how far along OMeta etc. are towards the point where they could make this easy. I get the feeling a whole other AI-ish layer of knowledge representation for reasoning about programs would be needed above OMeta (perhaps like what John was getting at with his points on formal analysis?).

Although, on the other hand, XHTML is coming along, and to render older pages I guess I could just do all my web browsing in a Debian installation in VirtualBox to sandbox potentially buggy C++ -- maybe even one VirtualBox install per loaded page? :-) Though, two gigabytes here, two gigabytes there, and sooner or later we're talking real amounts of disk space for all those VirtualBox installs. :-) But there is probably some way to get all the VirtualBoxes to share most of their virtual hard disks. So, anyway, that is a potential way to deal with legacy security holes. :-) Still, since so much of web browsing also interacts with saving files and such, it would help to have an intelligent assistant (microscope/telescope) that could go into that VirtualBox (even just by OCRing the displayed GUI) and copy downloaded files out of it, or track WebStart invocations and create new VirtualBoxes (or even run that code outside of one), or do any of the other desktop integration one might want. So, virtualization (increasing the level of abstraction) is one possible answer to the legacy issue, as is moving to better standards that are presumably easier to render correctly.
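For the shared-disk idea, VirtualBox's "linked clones" already do something like this: each clone stores only a differencing image against a snapshot of the base disk, so each extra sandbox costs megabytes, not gigabytes. Here is a dry-run sketch (the VM and snapshot names are made up, and while I believe the `VBoxManage snapshot`/`clonevm --options link` flags shown exist in recent VirtualBox, check the manual before relying on them):

```python
# Dry-run sketch: one throwaway VM per page, sharing the base disk via
# VirtualBox linked clones (differencing images).  Names are hypothetical;
# pass dry_run=False to actually invoke VBoxManage.
import subprocess

def run(cmd, dry_run=True):
    """Print the command in dry-run mode; otherwise really execute it."""
    if dry_run:
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd

base_vm, snapshot = "DebianBrowseBase", "pristine"  # made-up names

# One snapshot of the base image; every per-page clone links back to it.
commands = [run(["VBoxManage", "snapshot", base_vm, "take", snapshot])]
for page in (1, 2, 3):
    commands.append(run(["VBoxManage", "clonevm", base_vm,
                         "--snapshot", snapshot, "--options", "link",
                         "--name", f"page-{page}", "--register"]))
```

The intelligent-assistant part (OCRing the guest GUI, ferrying downloads out) is of course the much harder piece; this only covers the disk-space objection.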

Maybe VirtualBox should just be integrated into Firefox or Chrome? :-)
  http://www.virtualbox.org/
Though that is only an x86 solution, I guess.

Anyway, I know all these suggestions are things that either would take a lot of work and cutting edge research, or are just wildly implausible, but in the long term, new computing is probably going to address them in some way (with some combination of virtualization, translation, AI-ish reasoning at a higher semantic level of abstraction, tools for programming at a higher level of abstraction, improved standards, etc.).

--Paul Fernhout
http://www.pdfernhout.net/
====
The biggest challenge of the 21st century is the irony of technologies of abundance in the hands of those thinking in terms of scarcity.

_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc
