Re: Secure fine-grained webapp hosting with Rhino

Norris Boyd Mon, 16 Jun 2008 15:30:29 -0700

On Jun 11, 9:08 pm, Nick Thompson <[EMAIL PROTECTED]> wrote:
> Hi folks (and hi Norris!) -  I wanted to introduce a project we've been
> working on for a few months here at Metaweb using Rhino - basically a
> very simple server-side web framework called Acre.  The name stands for
> "A Crash of Rhinos, Evaluating" - "crash" being the collective noun for
> a group of rhinos.  We figure if you're hosting untrusted code, crashes
> are the least of your problems.
>
> Our main work at Metaweb is building an open, web-accessible database,
> which you can visit at <http://www.freebase.com>.  Our goal with Acre
> is to make it simpler for people to generate custom displays from that
> data.  This means we're hosting user code, so we are taking security
> very seriously, including DoS attacks.  We intend to make the code
> open source but it's been fluctuating quite a bit still so we haven't
> packaged it up.  But I'd like to start by describing how we're doing
> things right now and hopefully getting feedback.
>
> I'm not going to go into the client-side IDE here, this is about the
> rhino servlet aspects and security in particular.  Hopefully you'll let
> me know if I've committed any howlers.  And of course, thanks for rhino!
>
>     Nick
>
> == Acre request lifecycle ==
>
> Control flow starts with a java servlet container.  The servlet
> container (Jetty) dispatches HTTP requests to the Acre servlet.
>
> - the servlet creates a rhino global scope from scratch using
> initStandardObjects().  It would be nice to use the shared sealed
> scope trick documented in <http://www.mozilla.org/rhino/scopes.html>,
> but for now that's premature optimization.
>
> - the servlet initializes a "HostEnv" object for the request,
> which exposes the Scriptable interface and serves as the sole
> interface between Javascript and the acre container.  Liveconnect is
> disabled after initialization.
>
> - initial url parsing is done in the servlet because it defines
> a security context.  We dispatch on subdomains in the url, similar
> to an apache vhost configuration but dynamically looked up
> in the database. The subdomain becomes an "origin" for security
> purposes on the server, just as it implies a separate browser security
> context for html pages delivered from the application with that
> domain.
>
> - the HostEnv is inserted in the global scope, and the servlet
> loads a built-in fragment of javascript code (acreboot.js).
>
> - acreboot.js wraps the HostEnv object in a private closure,
> turns off liveconnect, and removes the liveconnect namespaces
> from the global scope.
>
> - more url parsing is done in acreboot.js to find a document
> and its metadata in the freebase data store.  This document is
> fetched, and executed according to its document type.
>
> - plain "script" documents run using an interface inspired
> by cgi and by python's wsgi.  Request data, including both
> parsed and unparsed portions of the url, are present in
> the "acre.environ" global.  Scripts can generate output
> using acre.start_response() and acre.write().
>
> - user "script" documents are compiled to java classes and cached in
> memory.  We don't currently use a restricted classloader here but that
> would provide an additional layer of security.
>
> - user "template" documents are compiled using a variant of the mjt
> <http://mjtemplate.org/> javascript templating language.  Mjt compiles
> the xml templates to javascript, which is then compiled by Rhino to a
> java class and cached in the acre server.
>
> - the compiled user script or template runs in the environment
> set up by acreboot.js
>
> - there is a simplistic package system, where one script or template
> can pull in another as a library.  In this case, the library is
> executed in a fresh scope object with __proto__ set to the
> true global object and __parent__ set to null.  This object provides
> the toplevel for the library, and it becomes the package object
> returned to the script which loads the package.  Packages are only
> loaded once.  This enables a somewhat cleaner package mechanism
> than is available in the browser.  However, libraries written for the
> browser that add their functions to a global object can still be used
> unmodified or with a small setup before loading.
>
> - logs and diagnostics associated with a request are saved for a
> reasonable length of time on the same acre server that serviced the
> request.  The logs can be fetched in JSON format for formatting
> in the IDE.
>
> - when the request finishes, the global object and the associated
> HostEnv object should be reclaimed by the gc.  There is no provision
> for scripts that last longer than a single request, or scripts that
> stream their input or output.
>
> == Related work ==
>
> The main similar projects out there are Helma and AppJet.  We looked
> at Helma, and the original acre prototype involved rewriting parts of
> a Helma servlet in javascript, though that code has since been
> rewritten in java again.  However, Helma has way more built-in than we
> need and too much of it was in java.  We want to provide a much more
> minimal app framework and allow users to build on top of that in
> javascript.  AppJet has similar goals to Acre and seems to have made
> some similar design decisions.  We'd love to share ideas and/or code
> with any of these projects of course.
>
> == Access security ==
>
> One of the attractions of Rhino is that it provides several layers of
> software sandboxing.  The sandboxing is critical for security of
> course.  But doing it in software means we can run very small,
> occasionally used apps.  Contrast this to cpython (as used by
> AppEngine) or PHP (as used by Ning) which are not natively sandboxable
> languages and must therefore be sandboxed using coarser-grain methods.
>
> First level security comes from controlling the javascript namespace.
> Once the HostEnv object is hidden inside a closure, no java objects
> outside the core rhino runtime should be visible to user code.
>
> The next level uses ClassShutter (and eventually
> PolicySecurityController) to catch any leaks that we didn't plug in
> step 1.  The main work here is wrapping all possible java runtime
> exceptions with sensible javascript exceptions, and we aren't done
> yet.
>
> == Network security ==
>
> The acre runtime provides a single entry point for making network
> requests, acre.urlfetch().  Currently only HTTP requests are
> supported.
>
> Url fetch requests are not currently sandboxed, but the approach
> we've used for other services is to force all outbound requests
> through a proxy and handle access control and throttling at that
> point.
>
> == Denial of service attacks ==
>
> Denial of service attacks are handled in a few different ways.
>
> Scripts are killed if they exceed a certain time limit.  We
> needed a small patch to Codegen.java to count cases like
> "while (1) {}" that don't increment the instruction count
> in the loop.
>
> Acre requests run entirely on the same thread, so java stack
> overflow errors are easy to handle by terminating the request
> thread.
>
> The java vm throws an OutOfMemoryError when too much memory is
> allocated, but it's delivered to the thread that tripped the limit
> rather than the thread that allocated the most memory.  This is
> harder to handle than the stack overflow case: the jvm provides
> surprisingly bad support for limiting memory consumption at this
> point.
>
> We've looked at three approaches to handling jvm memory exhaustion:
>
> 1. Assume the thread which triggered OutOfMemory is the offender
> and kill it.  If we run out of memory again, repeat until the problem
> goes away.  This is simple and works for development, but is not
> acceptable for a production service.
>
> 2. Track memory allocations by using AspectJ to instrument all
> the java classfiles.  This is our current approach but it's hard to
> establish correctness without instrumenting everything and killing
> performance.
>
> 3. It turns out that the java hotspot vm already tracks memory
> allocated by each thread as a side-effect of its allocation strategy.
> Since acre only uses one thread per request, we can probably make a
> relatively small change to the OpenJDK code to get what we need with
> minimal performance impact.
>
> == Distributed denial of service attacks ==
>
> We don't currently have a mechanism for per-application resource
> quotas (as opposed to per-request quotas).  This will require a
> supervisor process that can collect usage information from a
> number of Acre application servers.


Hi, Nick! (Nick and I worked at Netscape together back in the day and
he helped with some of the early design of Rhino.)

This framework sounds great. I'm glad Rhino provided almost all the
support you needed. As for the two exceptions:
* I'd love to have the "while (1) { }" instruction counting patch to
incorporate back into Rhino.
* I don't know of any better way to enforce a memory allocation quota.
It's possible to imagine tracking allocations as they occur from
scripts, but given that the actual allocations can occur in Java code
called by the scripts it seems like this would be difficult at best to
build and maintain. Your idea of extending the JVM is an intriguing
possibility. I have a vague memory of some proposal for the JVM
enforcing quotas, but can't find anything with a brief search. Anyone
on this list know more?
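
As a rough illustration of the per-thread accounting in Nick's third
approach: later HotSpot releases expose a per-thread allocation counter
through com.sun.management.ThreadMXBean, so that much can be read
without patching the JVM. What follows is only a sketch under stated
assumptions. The AllocationQuota class and its method names are
hypothetical, it assumes a HotSpot JVM where thread allocation
measurement is supported and enabled, and it counts cumulative bytes
allocated by the thread rather than live heap, so it overestimates what
a request actually holds.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical helper: charges allocations to the current thread and fails past a limit. */
final class AllocationQuota {
    // HotSpot-specific extension of the standard ThreadMXBean (assumption: running on HotSpot).
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    private final long startBytes;
    private final long limitBytes;

    /** Starts counting from the calling thread's current allocation total. */
    AllocationQuota(long limitBytes) {
        if (!THREADS.isThreadAllocatedMemorySupported()) {
            throw new UnsupportedOperationException("per-thread allocation counters unavailable");
        }
        this.limitBytes = limitBytes;
        this.startBytes = allocatedByCurrentThread();
    }

    /** Cumulative bytes allocated by the calling thread since it started. */
    static long allocatedByCurrentThread() {
        return THREADS.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    /** Bytes allocated by this thread since the quota was created (not live bytes). */
    long used() {
        return allocatedByCurrentThread() - startBytes;
    }

    /** Throws if the thread has allocated more than the limit since construction. */
    void check() {
        if (used() > limitBytes) {
            throw new IllegalStateException("allocation quota exceeded: " + used() + " bytes");
        }
    }
}
```

With one thread per request, the request thread would create an
AllocationQuota when the request starts and call check() from whatever
periodic hook already enforces the time limit, terminating the request
if it throws.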

Finally, I'm curious about Rhino's performance if you've had a chance
to look at it. Other systems I've heard of (including RnR, see
http://steve-yegge.blogspot.com/2008/06/rhinos-and-tigers.html) have
had adequate performance, but I know it's always a concern of people
considering this sort of solution.

Thanks for posting and looking forward to your open-sourcing Acre!

--Norris
_______________________________________________
dev-tech-js-engine-rhino mailing list
https://lists.mozilla.org/listinfo/dev-tech-js-engine-rhino