Hi folks (and hi Norris!) - I wanted to introduce a project we've been working on for a few months here at Metaweb using Rhino - basically a very simple server-side web framework called Acre. The name stands for "A Crash of Rhinos, Evaluating" - "crash" being the collective noun for a group of rhinos. We figure if you're hosting untrusted code, crashes are the least of your problems.
Our main work at Metaweb is building an open, web-accessible database, which you can visit at <http://www.freebase.com>. Our goal with Acre is to make it simpler for people to generate custom displays from that data. This means we're hosting user code, so we are taking security very seriously, including DoS attacks.

We intend to make the code open source, but it's still fluctuating quite a bit so we haven't packaged it up. But I'd like to start by describing how we're doing things right now and hopefully getting feedback. I'm not going to go into the client-side IDE here; this is about the rhino servlet aspects and security in particular. Hopefully you'll let me know if I've committed any howlers.

And of course, thanks for rhino!

Nick

== Acre request lifecycle ==

Control flow starts with a java servlet container. The servlet container (Jetty) dispatches HTTP requests to the Acre servlet.

- the servlet creates a rhino global scope from scratch using initStandardObjects(). It would be nice to use the shared sealed scope trick documented in <http://www.mozilla.org/rhino/scopes.html>, but for now that's premature optimization.
- the servlet initializes a "HostEnv" object for the request, which exposes the Scriptable interface and serves as the sole interface between Javascript and the acre container. Liveconnect is disabled after initialization.
- initial url parsing is done in the servlet because it defines a security context. We dispatch on subdomains in the url, similar to an apache vhost configuration but dynamically looked up in the database. The subdomain becomes an "origin" for security purposes on the server, just as it implies a separate browser security context for html pages delivered from the application with that domain.
- the HostEnv is inserted in the global scope, and the servlet loads a built-in fragment of javascript code (acreboot.js).
- acreboot.js wraps the HostEnv object in a private closure, turns off liveconnect, and removes the liveconnect namespaces from the global scope.
- more url parsing is done in acreboot.js to find a document and its metadata in the freebase data store. This document is fetched and executed according to its document type.
- plain "script" documents run using an interface inspired by cgi and by python's wsgi. Request data, including both parsed and unparsed portions of the url, is present in the "acre.environ" global. Scripts can generate output using acre.start_response() and acre.write().
- user "script" documents are compiled to java classes and cached in memory. We don't currently use a restricted classloader here, but that would provide an additional layer of security.
- user "template" documents are compiled using a variant of the mjt <http://mjtemplate.org/> javascript templating language. Mjt compiles the xml templates to javascript, which is then compiled by Rhino to a java class and cached in the acre server.
- the compiled user script or template runs in the environment set up by acreboot.js
- there is a simplistic package system, where one script or template can pull in another as a library. In this case, the library is executed in a fresh scope object with __proto__ set to the true global object and __parent__ set to null. This object provides the toplevel for the library, and it becomes the package object returned to the script which loads the package. Packages are only loaded once. This enables a somewhat cleaner package mechanism than is available in the browser. However, libraries written for the browser that add their functions to a global object can still be used unmodified or with a small setup before loading.
- logs and diagnostics associated with a request are saved for a reasonable length of time on the same acre server that serviced the request. The logs can be fetched in JSON format for formatting in the IDE.
- when the request finishes, the global object and the associated HostEnv object should be reclaimed by the gc. There is no provision for scripts that last longer than a single request, or scripts that stream their input or output.

== Related work ==

The main similar projects out there are Helma and AppJet.

We looked at Helma, and the original acre prototype involved rewriting parts of a Helma servlet in javascript, though that code has since been rewritten in java again. However, Helma has way more built-in than we need and too much of it was in java. We want to provide a much more minimal app framework and allow users to build on top of that in javascript.

AppJet has similar goals to Acre and seems to have made some similar design decisions.

We'd love to share ideas and/or code with any of these projects of course.

== Access security ==

One of the attractions of Rhino is that it provides several layers of software sandboxing. The sandboxing is critical for security of course. But doing it in software means we can run very small, occasionally used apps. Contrast this to cpython (as used by AppEngine) or PHP (as used by Ning) which are not natively sandboxable languages and must therefore be sandboxed using coarser-grain methods.

First level security comes from controlling the javascript namespace. Once the HostEnv object is hidden inside a closure, no java objects outside the core rhino runtime should be visible to user code.

The next level uses ClassShutter (and eventually PolicySecurityController) to catch any leaks that we didn't plug in step 1. The main work here is wrapping all possible java runtime exceptions with sensible javascript exceptions, and we aren't done yet.

== Network security ==

The acre runtime provides a single entry point for making network requests, acre.urlfetch(). Currently only HTTP requests are supported.
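
To make the closure-hiding and single-entry-point ideas concrete, here is a minimal sketch in plain javascript. It is not Acre's code: the fake host object, its httpFetch method, the boot() helper, and the one-argument urlfetch(url) signature are all invented for illustration, and the "http://" prefix check is only one guess at what "HTTP only" might mean. The point is just that user code given userGlobal can reach acre.urlfetch but holds no reference to the host object behind it.

```javascript
// Hypothetical stand-in for the container-side host object. In Acre this
// role is played by the java HostEnv; the method name here is made up.
function makeFakeHostEnv() {
  return {
    httpFetch: function (url) {
      // A real container would perform the request; this stub just echoes.
      return { status: 200, body: "fetched " + url };
    }
  };
}

// Boot step (hypothetical): capture the host object in a closure and give
// user code an "acre" object exposing only the functions we choose.
function boot(globalScope, hostEnv) {
  var acre = {};

  // Single entry point for network access. The scheme check is an
  // assumption about what "only HTTP requests" means.
  acre.urlfetch = function (url) {
    if (typeof url !== "string" || !/^http:\/\//i.test(url)) {
      throw new Error("urlfetch: only http URLs are supported");
    }
    return hostEnv.httpFetch(url);
  };

  globalScope.acre = acre;
  // hostEnv is deliberately never stored on globalScope, so code that only
  // sees globalScope cannot name it.
}

// Usage: build a scope for user code and try the entry point.
var userGlobal = {};
boot(userGlobal, makeFakeHostEnv());
var response = userGlobal.acre.urlfetch("http://www.freebase.com/");
```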
Url fetch requests are not currently sandboxed, but the approach we've used for other services is to force all outbound requests through a proxy and handle access control and throttling at that point.

== Denial of service attacks ==

Denial of service attacks are handled in a few different ways.

Scripts are killed if they exceed a certain time limit. We needed a small patch to Codegen.java to count cases like "while (1) {}" that don't increment the instruction count in the loop.

Each Acre request runs entirely on a single thread, so java stack overflow errors are easy to handle by terminating the request thread.

The java vm throws an OutOfMemoryError when too much memory is allocated, but it's delivered to the thread that tripped the limit rather than the thread that allocated the most memory. This is harder to handle than the stack overflow case: the jvm provides surprisingly bad support for limiting memory consumption at this point.

We've looked at three approaches to handling jvm memory exhaustion:

1. Assume the thread which triggered OutOfMemory is the offender and kill it. If we run out of memory again, repeat until the problem goes away. This is simple and works for development, but is not acceptable for a production service.
2. Track memory allocations by using AspectJ to instrument all the java classfiles. This is our current approach but it's hard to establish correctness without instrumenting everything and killing performance.
3. It turns out that the java hotspot vm already tracks memory allocated by each thread as a side-effect of its allocation strategy. Since acre only uses one thread per request, we can probably make a relatively small change to the OpenJDK code to get what we need with minimal performance impact.

== Distributed denial of service attacks ==

We don't currently have a mechanism for per-application resource quotas (as opposed to per-request quotas).
This will require a supervisor process that can collect usage information from a number of Acre application servers.

_______________________________________________
dev-tech-js-engine-rhino mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-js-engine-rhino
