The desktop Firefox team is building a new Toolkit module that captures 
thumbnails of off-screen web pages. Critically, we want to avoid capturing any 
data in these thumbnails that could identify the user. More generally, we're 
looking for a way to visit pages in a sandboxed manner that does not interact 
with the user's normal browsing session. Does anyone know of such a way or know 
how we might change Gecko to support something like that?

To provide context, I'll describe the original problem that motivated the new 
thumbnail module, the poor way in which the new module interacts with the 
user's browsing state, and then some things we think we need in order to 
sandbox pages. See bug 870179 for motivation for this post.

The Original Problem

Toolkit already has a thumbnail module, [PageThumbs], but it can only capture 
thumbnails of open content windows, exactly as they appear to the user. Windows 
may contain sensitive data that should not be recorded in an image, however, 
like bank account numbers and so on, so Firefox uses some [heuristics] to 
determine when it's safe to capture a given window. If it's not safe, then 
Firefox makes no further attempt to capture the window's page until the user 
happens to visit it again. If it's never safe to capture any visit to the page, 
then the page never has a thumbnail. This is why you might end up with lots of 
blank thumbnails in Firefox's about:newtab page. (One of the most notable 
heuristics is the presence of the header Cache-Control: [no-cache]. Firefox 
treats it as an indication that a page may contain sensitive data and therefore 
should not be captured.)
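
To make the Cache-Control heuristic concrete, here is a minimal sketch in plain 
JavaScript. It is not the actual code in browser-thumbnails.js. The function 
names and the shape of the `headers` object are assumptions made up for 
illustration.

```javascript
// Simplified sketch, NOT the real Firefox heuristic.
// `headers` is a hypothetical plain object mapping header names to values.
function hasCacheControlDirective(headers, directive) {
  for (const [name, value] of Object.entries(headers)) {
    // Header names are case-insensitive.
    if (name.toLowerCase() !== "cache-control") {
      continue;
    }
    // Directives are comma-separated and may carry "=value" arguments.
    const directives = String(value)
      .split(",")
      .map(d => d.trim().toLowerCase().split("=")[0]);
    if (directives.includes(directive)) {
      return true;
    }
  }
  return false;
}

// Hypothetical helper: treat no-cache as a hint that the page may be sensitive.
function isSafeToCapture(headers) {
  return !hasCacheControlDirective(headers, "no-cache");
}
```

With this sketch, a response carrying "Cache-Control: no-cache, max-age=0" is 
skipped, while a response with no Cache-Control header is considered 
capturable.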

Our Solution

We wrote a [new module] that's not limited to capturing thumbnails of open 
windows. It loads pages on demand in a xul:browser in the hidden window and 
then captures them. This browser is remote so that its page loads don't block 
the main thread (to the extent that page loads in remote browsers don't block 
the main thread). Further, the browser uses private browsing mode to sandbox 
its pages from the user's normal browsing session. If you're logged in to a 
site in a main window, your logged-in status is not reflected in thumbnails of 
that site.

The Problem with Our Solution

The thumbnail browser sandboxes its pages from your normal browsing session, 
but of course it doesn't sandbox them from your private browsing session. If 
you're logged in to a site in a [private browsing] window, then you'll also be 
logged in on the page we load in the thumbnail browser, which is exactly what 
we're trying to avoid by using a private browser. This is a consequence 
of private browsing's binary design: it's either on or off, and all the various 
Gecko bits that manage state are designed to use one state for private requests 
and another state for normal requests. There's no notion of multiple concurrent 
private browsing sessions.
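
The difference between the binary design and what we would like can be 
sketched with a toy model. Everything here is hypothetical: `BinaryStateStore`, 
`KeyedStateStore`, and the session names are invented for illustration and do 
not correspond to any Gecko API.

```javascript
// Toy model of the binary design: one cookie jar per privacy flag.
class BinaryStateStore {
  constructor() {
    this.normalJar = new Map();
    this.privateJar = new Map();
  }
  jarFor(isPrivate) {
    return isPrivate ? this.privateJar : this.normalJar;
  }
}

// Toy model of an alternative: one cookie jar per named session.
class KeyedStateStore {
  constructor() {
    this.jars = new Map();
  }
  jarFor(sessionId) {
    if (!this.jars.has(sessionId)) {
      this.jars.set(sessionId, new Map());
    }
    return this.jars.get(sessionId);
  }
}

// Binary design: the user's private window and the thumbnail browser
// are both "private", so they share one jar.
const binary = new BinaryStateStore();
binary.jarFor(true).set("example.com", "session=abc"); // private window logs in
const leaked = binary.jarFor(true).get("example.com"); // thumbnail browser sees it

// Keyed design: each session gets its own jar, so nothing is shared.
const keyed = new KeyedStateStore();
keyed.jarFor("private-window").set("example.com", "session=abc");
const isolated = keyed.jarFor("thumbnails").get("example.com");

console.log(leaked);   // "session=abc"
console.log(isolated); // undefined
```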

We avoid this problem in a crude way by ignoring capture requests while there 
are private windows open. We need a better solution that allows us to capture 
sandboxed pages no matter what the user is doing.
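
Roughly, the workaround behaves like the following sketch. The class and 
method names are made up for this post, and the real module tracks private 
windows differently. The point is only that requests made while a private 
window is open are dropped rather than deferred.

```javascript
// Hypothetical model of the "ignore captures while private windows are open"
// workaround. `captureFn` stands in for whatever actually loads and captures.
class CaptureGate {
  constructor(captureFn) {
    this._capture = captureFn;
    this._privateWindowCount = 0;
  }

  privateWindowOpened() {
    this._privateWindowCount++;
  }

  privateWindowClosed() {
    this._privateWindowCount = Math.max(0, this._privateWindowCount - 1);
  }

  // Returns true if the capture ran, false if the request was ignored.
  request(url) {
    if (this._privateWindowCount > 0) {
      return false;
    }
    this._capture(url);
    return true;
  }
}
```

A caller that gets `false` back has to retry later on its own, which is why 
this approach can leave pages without thumbnails for a long time.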

Requirements

We think we need the following to sandbox pages in our thumbnail browser:

(a) Requests must be stateless. Requests must not include any information 
derived from previous requests. For example, requests must not include cookies.

(b) Requests must leave no trace. Requests must not be stored in whole or in 
part, and no information about or derived from requests may be stored. For 
example, the request must not be recorded in the user's history.

(c) Responses must leave no trace. Responses must not be stored in whole or in 
part, and no information about or derived from responses may be stored. For 
example, cookies returned by the response must be ignored, and the response 
must not be recorded in the user's history.

A fourth requirement unrelated to sandboxing is:

(d) We need to load pages off-screen, and the user should never have to know 
about it. It shouldn't impact browser responsiveness, windows and dialogs 
triggered by pages shouldn't pop up, audio shouldn't be audible, etc. (We've 
done some of this work already by making the browser remote, by putting it in 
the hidden window, and through the work in bugs 875157 and 759964.)
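
To illustrate requirements (a) through (c), here is a toy in-memory model that 
contrasts a normal load with a sandboxed one. None of these names are Gecko 
APIs. `BrowsingState`, `normalLoad`, `sandboxedLoad`, and the `server` callback 
are all assumptions invented for this sketch.

```javascript
// Hypothetical model of the state a browsing session accumulates.
class BrowsingState {
  constructor() {
    this.cookies = new Map(); // host -> cookie string
    this.history = [];        // visited URLs
  }
}

// A normal load both reads and writes the shared state.
// `server` is a stand-in function: request -> { body, setCookie }.
function normalLoad(state, url, server) {
  const host = new URL(url).host;
  const cookie = state.cookies.has(host) ? state.cookies.get(host) : null;
  const response = server({ url, cookie });
  if (response.setCookie) {
    state.cookies.set(host, response.setCookie);
  }
  state.history.push(url);
  return response.body;
}

// A sandboxed load takes no state at all, so it cannot read or write any.
function sandboxedLoad(url, server) {
  // (a) Stateless: the request carries no cookie.
  const response = server({ url, cookie: null });
  // (b) No history entry is recorded for the request.
  // (c) response.setCookie is deliberately discarded.
  return response.body;
}
```

For example, with a fake server that answers "logged in" whenever a cookie is 
present, two calls to `normalLoad` return "logged out" and then "logged in", 
while `sandboxedLoad` returns "logged out" every time and leaves the 
`BrowsingState` unchanged.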

What do people think? If what I've described is not already possible, how much 
work would it be to support it in Gecko, if only for this narrow use case of 
thumbnailing?

Thanks,
Drew

[PageThumbs] http://mxr.mozilla.org/mozilla-central/source/toolkit/components/thumbnails/PageThumbs.jsm

[heuristics] http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser-thumbnails.js#127

[no-cache] https://bugzilla.mozilla.org/show_bug.cgi?id=754608

[new module] https://bugzilla.mozilla.org/show_bug.cgi?id=841495

[private browsing] https://bugzilla.mozilla.org/show_bug.cgi?id=870179
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
