[b2g] Proposal for Cloud Backup & Restore

Ian Bicking Wed, 24 Sep 2014 13:16:12 -0700

Hi!  I'm sorry I haven't engaged with this group previously, as there have
been discussions of backup and restore on several lists and with many
different perspectives.  I am doing engineering management on the backup
and restore project within Cloud Services.

I've been talking to many people individually to collect ideas, and am
working on a project plan for cloud backup and restore. We have not
started implementation, so this proposal is very provisional, and can be
changed with your feedback. The proposal is still unfortunately quite
rough, but I really wanted to get it out this week, and I'm going to be
offline for the rest of the week, so I thought it better to put it in front
of you all instead of waiting longer.

I have composed it in a Google Doc:
https://docs.google.com/document/d/1n4LkS5fJFocIZW7gbXDWQLkUAoKwIqcVrVdq0lgKhjk/edit?usp=sharing

The document is public and anyone can comment on it. (I expect to move it
to the wiki later.) I will also paste it below:

About This Document

At some point this will become a more formal proposal, but right now this
is based on the opinions and information collected by me, Ian Bicking. I
will use the first person at times in this document.

In Cloud Services we have not yet implemented anything in this proposal
(outside of some experimental work on requestSync). We expect to present
this for consideration as part of the upcoming Firefox OS roadmap planning.

Some material that influenced this:

“Save/Restore feature” b2g thread:
https://groups.google.com/forum/#!topicsearchin/mozilla.dev.b2g/Backup/mozilla.dev.b2g/VPkqmLWP4vk

“Backups for Firefox OS” b2g thread:
https://groups.google.com/forum/#!topicsearchin/mozilla.dev.b2g/Backup/mozilla.dev.b2g/1OUNmQ7ouPY

Third-party management app for Firefox OS:
https://bugzilla.mozilla.org/show_bug.cgi?id=919420

In this document (including after it is formalized) I want to indicate the
reasons for the decisions we have made, and include some discussion of
directions we chose not to take.

The Broadest Decision: where do we put backups?

Market research has indicated that in our current target market, non-cloud
backup is the most appealing option. Data plans are very limited, and in
many markets Wifi is also uncommon.

SD Card Backup

An SD card backup option can work for nearly anyone, so long as they can
afford the SD card, or potentially even buddy up with someone else (trading
SD cards for backup would be a reasonable option). Because devices only
support a single SD card, this option doesn’t provide backup for any data on
the SD card. Photos and video in particular would not be backed up.

Pros:

1. Requires no data or any other device

2. Fairly affordable

3. One SD card could support multiple people’s backups

4. By saving to different locations, it becomes easy to do snapshots,
   which perhaps could be used to retrieve lost data

5. Because writing to SD cards is fast, you can easily back up data that
   has no external structure (i.e., no structure known to the backup
   system itself)

6. Security is very obvious: physical possession implies access, no
   accounts involved (some of our target market doesn’t seem to
   understand password-based or account-based security schemes)

Cons:

1. Probably wouldn’t support backing up anything that is already stored
   on the SD card, like photos

2. Requires swapping cards around to make backups. That relatively
   intrusive process makes it hard to keep the backup up to date
   “on demand”

3. Only security is physical (unless we support some kind of crypto, which
   would be possible, but then the backup is contingent on the password,
   and further recovery options would be challenging, though again not
   impossible)

4. You either have a huge wealth of space on your destination SD card,
   essentially wasted, or you don’t have enough space on the SD card,
   requiring some kind of space management; there’s no pooling of storage

Tethered/USB Backup

Tethered backup would mean connecting your phone to your desktop/laptop
computer via USB. Some application on the computer would handle or at
least facilitate the backup. Work has been done on a Firefox Add-on that
can do this. Yuan Xulei referenced some work Mozilla China has done for
this
<https://groups.google.com/d/msg/mozilla.dev.b2g/VPkqmLWP4vk/y-BPO2ZgMGMJ>.
The slides
<https://onedrive.live.com/view.aspx?cid=8BE02BA87AAE54D9&resid=8BE02BA87AAE54D9!107&app=Word>
show a tool that can manage and edit data directly, upload music, and
presumably do other functions in addition to backup. There may be other
efforts around a tethered/USB experience that I’m not aware of.

Pros:

1. Many users have some desktop computer available to them, and
   familiarity with the tethering process from other phones

2. Given hard disks, the necessary storage for backing up a phone is
   pretty much a given

3. The physical security is usually excellent, unless you are backing up
   to a USB drive, in which case it’s easy to imagine cached data or
   other unexpected issues causing a problem.

4. You can do a lot more than just back up, like managing music files.

   1. Is there a situation where desktop computers have an internet
      connection, but wifi is not available? E.g., would people download
      things from their desktop computer in order to transfer them to
      their phone? (Like with podcasting.)

5. It’s more feasible to do time-snapshot backups than with SD (given the
   more plentiful storage, and the assumption that the entire process is
   more mediated)

6. Easy to back up photos and video, and the backups are useful on the
   desktop computer.

Cons:

1. You need some special local application to connect to the phone
   (except for simple backups like backing up the images, where
   phone-as-external-storage is sufficient)

2. Backups require connecting to the computer, though from there the
   experience can be automated fairly well

3. You need access to a desktop computer

4. Probably not reasonable for users to do on a kiosk computer (e.g., at
   an internet cafe)

P2P Backup

This is the dark horse of the options. In this model you can imagine two
phones connecting to each other (over wifi or Bluetooth), with one backing
up to the other. This is somewhat like the tethered use case, except you’d
be wirelessly tethering to another phone. Unlike SD Backup, you don’t run
into the problem of backing up SD-hosted images to another SD card, but
space is still limited as in the SD case, which makes the approach
questionable: would my SD card have enough space for all your data? A
separate SD card purchased specifically for your backup needs is more
likely to have the free space available.

Why Cloud Backup?

This proposal is not an argument against local backup options. Pursuing
multiple approaches to backup is very reasonable.

But still: why should we pursue cloud backup even though this does not seem
to match market demand?

Some strategic arguments:

1. Any backup option we build now will be delivered not to our current
   users, but to future users. Those future users may have better access
   to data plans or wifi.

   1. I think there are some confirmed partners that will be targeting
      these kinds of users.

   2. Just generally: if Firefox OS is our attempt to bring a fully web
      experience to phones, and the web that we know and love is (mostly)
      a subset of the internet, then Firefox OS without data (i.e.,
      without internet) is at most an attempt to enter a market, but
      can’t be the realization of our mission. (I’m sure there are people
      who will disagree with me; I bring this point up for discussion,
      but please don’t see the argument for Cloud Backup as hinging on
      this particular point.)

2. There’s a greater variety of approaches possible with online backup,
   so the platform additions we make for this feature will open up other
   third-party experiences. It’s hard to imagine other developers or
   companies pursuing USB or SD-card based approaches. (Though some of
   what I’ve heard about app kiosks in India may indicate I’m wrong here.)

3. By associating the data with a Firefox Account, we open possibilities
   that this work could be complementary to work on other platforms. For
   example, with Loop/Hello we are building a strong use case for sharing
   contacts across devices. (Note though that multi-device use cases are
   not part of the initial proposed work on Backup & Restore.)

   1. I’m not implying we will start acquiring and using people’s data
      freely, but this would start the process of us storing people’s
      data in the cloud. Given that, if a person wants to use our other
      services, then we will be better able to provide them a convenient
      process to do so.

And the more technical arguments:

Pros:

1. Backups can happen silently and up-to-the-minute

2. We can also provide options to only do backup in certain situations
   (e.g., on wifi), or with manual intervention

3. The restore process can happen anywhere, including at the store; you
   don’t need to connect to a computer, or remember to bring in your SD
   card

4. It’s unlikely you’ll lose your backup, because it isn’t stored on a
   physical object you can misplace. It can be protected by a Firefox
   Account, but still be available if you’ve forgotten your password
   (assuming you still have access to your email address to reset your
   account password).

5. It can be invoked remotely. If using Find My Device, we can remotely
   trigger a backup before wiping the device.

Cons:

1. Many of our users have very limited data and no access to wifi

2. Those users also want to audit their data usage very carefully; even
   if they might have sufficient data to do backups, they might be
   uncomfortable with a system that might go wrong and use up their data.
   All background services require trust.

3. Data also affects battery life (though the frequency of our access may
   not register as a substantial battery drain)

4. The restore process will require a bunch of data right away! A service
   that only requires a small amount of data each month to incrementally
   back things up may seem much more substantial at restore time. (Can
   people do the restore in the store on a provided wifi network?)

5. Backing up images and video online is too expensive for us to do it
   for free, which adds a whole question of paid services and clarity
   around cost (SD cards aren’t free, but you only pay once), longevity
   of data, added services (sharing, viewing), etc. For other data like
   contacts we can probably host the data cheaply enough that it wouldn’t
   even be worth setting up payments.

6. Cloud data transfer is never cheap or fast enough to use
   speculatively. This makes it much harder to save the data of
   individual applications, which may have large amounts of data that
   doesn’t actually need to be saved (like cached resources). Also it’s
   hard to do incremental backups of data without understanding the data
   model. By contrast, with an SD card you can imagine dumping the
   entirety of the data every time you back up, without having to be
   concerned about doing diffs since the last backup.

7. As a corollary to the previous point, it’s unlikely we can back up
   applications’ data without specific support from the applications.
   Dumping the entirety of IndexedDB and localStorage is probably too
   inefficient.

8. There’s a whole set of concepts many of our users may not understand:
   accounts, passwords, possible complexities if multiple devices share
   an account, etc.

9. Things get complicated if you migrate Firefox Accounts. With physical
   devices the patterns are more obvious.

Look at the size of that list of cons! I’m glad we’re not scoring them.
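To make the points above about incremental backups and application data concrete: an incremental backup has to know what a "record" is. Below is a minimal sketch, not a committed design. It assumes that each data type exposes records with stable IDs (as contacts do) and that a hash of each record's canonical JSON is good enough to detect changes. `record_hash` and `compute_delta` are hypothetical names.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Hash a record's canonical JSON so unchanged records compare equal."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compute_delta(current: dict, last_backed_up: dict) -> dict:
    """Compare current records (id -> record) with the hashes saved at the
    last successful backup (id -> hash) and return only what changed."""
    upserts = {
        rid: rec
        for rid, rec in current.items()
        if last_backed_up.get(rid) != record_hash(rec)  # new or modified
    }
    deletes = sorted(rid for rid in last_backed_up if rid not in current)
    return {"upserts": upserts, "deletes": deletes}


if __name__ == "__main__":
    previous = {"c1": record_hash({"name": "Ana"}), "c2": record_hash({"name": "Bo"})}
    now = {"c1": {"name": "Ana"}, "c3": {"name": "Cy"}}
    print(compute_delta(now, previous))
    # {'upserts': {'c3': {'name': 'Cy'}}, 'deletes': ['c2']}
```

An opaque IndexedDB or localStorage dump has no record boundaries to diff against, so the only safe delta there is the whole store.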

With that behind us, let’s look at the questions about how to design Cloud
Backup & Restore specifically:

Proposed Design of Cloud Backup

Goals

Some things we want to achieve:

1. We want to be as conservative with data usage as we can

2. People should be able to forget about the backup system, and still
   have it reliably restore data when they need it (and they probably
   need it regardless of whether they expected to)

3. We should be explicit about what kinds of data we have saved, and what
   we have not saved

4. We will access data through DOM APIs. We want other people to be able
   to write services that function in their own way, and if Mozilla’s
   backup system uses its special (social!) access to other parts of
   Firefox OS then we are excluding other developers.

   1. It is an open discussion what those DOM APIs should look like. We
      have examples like the Contacts API
      <https://developer.mozilla.org/en-US/docs/Web/API/Contacts_API>,
      which are very data-specific. The Data Store API
      <https://developer.mozilla.org/en-US/docs/Web/API/Data_Store_API>
      is more generic. The Apps Import/Export API
      <https://bugzilla.mozilla.org/show_bug.cgi?id=982874> is designed
      more specifically for these kinds of use cases. One can also imagine
      even more specific APIs directed at Backup/Restore use cases.

   2. The work to make data available to DOM APIs is also where I hope
      work for Cloud Backup can overlap well with SD or USB Backup.

5. Mozilla’s backup will be implemented primarily as an application. If
   we need platform support that is not currently available (and we do!),
   then we will pursue platform additions that are available to everyone.

6. Future versions of the application will make every attempt to be
   backward compatible with older versions of the OS.

7. Rather than create an abstraction layer over the network protocols, we
   feel it’s better to encourage forking of the application.

   1. Certified APIs limit the actual flexibility we can offer with these
      techniques. By definition backup & restore requires powerful access
      to personal data on the OS (including things like restoring data
      with original timestamps).

8. We want to be able to host this data as affordably as possible,
   optimized for the at-rest data storage cost. This isn’t a high-churn
   data storage problem like Firefox Sync has.

Approach

Given the goals, I propose:

Backend

1. The data itself will be stored on Amazon S3. S3 is particularly
   affordable for its reliability. We consider S3 reliability to be
   sufficient without any further redundancy. (This last statement should
   probably be triple-checked with ops and product.)

2. Backup and restore will be controlled by a Firefox Account.

3. Devices will never connect directly to S3; instead they will go
   through application servers.

4. There will be one S3 resource per user/content-type. E.g.,
   ibicking@mozilla.com/contacts will be one resource containing all that
   person’s contacts. Traffic to and from S3 from the application server
   is free, but each request does have a cost, so if we break up backups
   into many resources we are likely to have higher costs.

   1. Perhaps the resource should also be per device. Then if I set up my
      first Firefox OS device, then set up my second (restoring), then
      start backing up, but still keep my first device operating, the
      backups won’t get intermingled. OTOH, usually this is not necessary.

5. Devices will send incremental updates. The application server will
   handle changing the S3 resource: fetching the resource, appending the
   data (possibly deleting data), and re-uploading.

6. Data in S3 will be encrypted. In FxA terminology we will be using Key
   A. This is a stable key associated with the user account, and does not
   change when the user changes their password. This allows a user to do
   a password reset just before doing a restore.

7. At some later stage we might offer other encryption options. These
   could include using Key B, which changes when you change your FxA
   password (and which we cannot derive), or perhaps encryption at some
   other stage. This is not part of MVP.

8. Data from the phone will be transmitted to our application servers
   unencrypted (apart from HTTPS transport encryption). Because we
   ultimately want to be able to decrypt the data on restore, even if the
   user has lost their password, we have to be able to produce the data
   in an unencrypted form. It does not seem like it would add security to
   encrypt it before delivery.

9. We’ll do everything we can to keep the amount of data transferred
   low, including minimizing the number of requests (aka chattiness) of
   the protocol.

   1. Note however we are not doing protocol-first design. The protocol
      is designed to satisfy our goals, not for its own merit.

10. Given an FxA login, it will be possible to detect if there is backup
    data associated with the login.
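The resource layout, the incremental-update step and the "is there a backup?" check above can be sketched on the app-server side. This is a minimal sketch only. It assumes that a resource is a JSON object of records keyed by ID and that a device sends the upserts and deletes accumulated since its last sync. A plain dict stands in for S3, and its reads and writes mark where a real server would issue one S3 GET and one PUT per update. Encryption with Key A is omitted. `resource_key`, `apply_update`, `has_backup` and the key layout are hypothetical.

```python
import json
from typing import Dict, List, Optional

# A plain dict stands in for S3: key -> stored bytes. Hypothetical layout.
Store = Dict[str, bytes]


def resource_key(user: str, content_type: str, device: Optional[str] = None) -> str:
    """One resource per user/content-type, optionally split per device."""
    if device is None:
        return f"{user}/{content_type}"
    return f"{user}/{device}/{content_type}"


def apply_update(store: Store, key: str,
                 upserts: Dict[str, dict], deletes: List[str]) -> int:
    """Merge one incremental update into a resource; return its record count.

    Costs one fetch and one upload no matter how many records changed.
    """
    raw = store.get(key)  # stands in for an S3 GET
    records: Dict[str, dict] = json.loads(raw) if raw is not None else {}
    records.update(upserts)  # add new records, replace changed ones
    for record_id in deletes:
        records.pop(record_id, None)  # deleting an unknown ID is a no-op
    # Encryption with Key A would happen here, before upload (omitted).
    store[key] = json.dumps(records, sort_keys=True).encode("utf-8")  # S3 PUT
    return len(records)


def has_backup(store: Store, user: str) -> bool:
    """Given an FxA login (here just a user ID), is any backup stored?"""
    prefix = f"{user}/"
    return any(key.startswith(prefix) for key in store)


if __name__ == "__main__":
    s: Store = {}
    k = resource_key("user@example.com", "contacts")
    apply_update(s, k, {"c1": {"name": "Ana"}, "c2": {"name": "Bo"}}, [])
    print(apply_update(s, k, {"c3": {"name": "Cy"}}, ["c1"]))  # 2
    print(has_backup(s, "user@example.com"))  # True
```

Rewriting the whole resource on each update keeps the request count per update constant. That fits a low-churn, at-rest-optimized workload where requests, not transfer, are what S3 charges the app server for.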

Client

1. We expect the client to be developed alongside the backend; we want to
   treat the end-to-end working backup as our team project.

2. The client will be a separate app. There are two motivations:

   1. Keep the client code partitioned from other code, reduce the need
      for cross-team reviews, minimize impact on other people’s work

   2. Allow the backup app to be updated on a separate schedule from the
      OS

3. We will need some platform support to make backup-as-an-app feasible:

   1. There is an API called requestSync
      <https://github.com/slightlyoff/BackgroundSync/blob/master/explainer.md>
      that allows the backup app to ask to be woken up only when it is
      appropriate to do network requests (when networking is available,
      and per user preference perhaps only on wifi or only when manually
      invoked). We have done some initial work on this API, but will need
      help to get it in shape to be landed.

   2. From a UX perspective we may not want backup to show up as an app,
      but instead show up as a setting. (I don’t have much opinion on
      this, but will look for guidance from UX.)

      1. The actual “UI” of the backup app will consist of settings,
         error messages and notifications, and perhaps status. This is
         why it isn’t a very good app.

   3. We might want to expedite the opt-in flow in FTE or as part of FxA
      signup.

   4. When you first sign in to an FxA account, we would like to check
      for a backup and offer to restore at that time.

4. We expect to integrate via DOM APIs. This is a big topic and gets its
   own section.

5. We also expect to restore data via DOM APIs. This often requires more
   permission than merely fetching the data, especially to restore with
   timestamps or other original metadata.

6. We are not expecting to do a complete system backup, e.g., back up the
   entirety of the profile. Instead we expect to do data-specific backups
   (backing up all contacts, all appointments, etc).

7. We may want to allow the user to opt in to backing up particular data
   types. For instance, backing up only contacts requires very little
   data.

8. The backup app needs to keep track of progress, including what data
   has been backed up and what is still pending. We expect that we will
   simply keep this tracking data locally to the app.

9. We are not planning to do any kind of snapshotting or allowing users
   to revert data. We’re only allowing users to do a complete restore.
   Note that because we are not doing a complete profile backup, the
   restore is probably not as destructive as a complete system restore
   might be.

10. Find My Device would be able to set up and force refresh of a backup
    remotely. This way even if a person has never set up backup, if they
    lose their phone they can change their mind, save the data, and then
    wipe the device. We would have to be very clear about what data is
    backed up, and what data might be missing, when a user invokes this.
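When requestSync wakes the backup app, the app still has to decide whether to do anything. That decision combines network state, the user's wifi/manual preference and the per-data-type opt-in described above. Below is a minimal sketch of that decision. It assumes the app can learn the current connection type. The enums and `types_to_back_up` are hypothetical and are not part of the requestSync API. Whether an explicit manual request should override a wifi-only preference is an open UX question; this sketch assumes it does.

```python
from enum import Enum
from typing import Iterable, List


class Network(Enum):
    NONE = "none"
    CELLULAR = "cellular"
    WIFI = "wifi"


class Preference(Enum):
    ANY_NETWORK = "any"     # back up whenever a connection exists
    WIFI_ONLY = "wifi"      # automatic backups only on wifi
    MANUAL_ONLY = "manual"  # never back up unless the user asks


def types_to_back_up(network: Network, pref: Preference, manual: bool,
                     opted_in: Iterable[str], pending: Iterable[str]) -> List[str]:
    """Decide what to upload on one wake-up; an empty list means go back to sleep.

    `manual` is True when the user explicitly asked for a backup; here an
    explicit request overrides the wifi-only preference (an assumption).
    """
    if network is Network.NONE:
        return []
    if not manual:
        if pref is Preference.MANUAL_ONLY:
            return []
        if pref is Preference.WIFI_ONLY and network is not Network.WIFI:
            return []
    allowed = set(opted_in)
    return sorted(t for t in pending if t in allowed)


if __name__ == "__main__":
    print(types_to_back_up(Network.CELLULAR, Preference.WIFI_ONLY, False,
                           ["contacts"], ["contacts"]))  # []
    print(types_to_back_up(Network.WIFI, Preference.WIFI_ONLY, False,
                           ["contacts"], ["contacts", "calendar"]))  # ['contacts']
```

The `pending` argument would come from the progress-tracking data the app keeps locally.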

The DOM API

We want to fetch and restore data via DOM APIs, without any code in other
apps that is specific to only our backup app. The design of these APIs is
important, but we will be looking to guidance and implementation from
elsewhere in Firefox OS and Platform.

The easiest way to move forward would be to use some of the DOM APIs that
already exist, such as the Contacts API or some of the Data Store APIs.
We would continue to add more data types as DOM APIs become available.
This would happen in subsequent updates of the backup app (along with
feature detection to support older releases).
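Feature detection lets one version of the backup app run on several OS releases. The app backs up each data type whose DOM API is present and reports the rest as not saved, which also serves the goal of being explicit about what has and has not been backed up. Below is a minimal sketch. It assumes the app can probe for each API; in a Firefox OS app this would be a check such as whether `navigator.mozContacts` exists. `mozContacts` is the real Contacts API entry point. The other two API names, and `plan_backup` itself, are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple

# Data type -> DOM API the backup app needs for it. Only mozContacts is a real
# Firefox OS API; the other two names are hypothetical placeholders.
REQUIRED_API: Dict[str, str] = {
    "contacts": "mozContacts",
    "calendar": "hypotheticalCalendarAPI",
    "sms": "hypotheticalSmsExportAPI",
}


def plan_backup(has_api: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Split data types into (backed_up, not_saved) by probing for each API."""
    backed_up: List[str] = []
    not_saved: List[str] = []
    for data_type, api in sorted(REQUIRED_API.items()):
        (backed_up if has_api(api) else not_saved).append(data_type)
    return backed_up, not_saved


if __name__ == "__main__":
    # An older OS release where only the Contacts API exists.
    available = {"mozContacts"}
    print(plan_backup(lambda api: api in available))
    # (['contacts'], ['calendar', 'sms'])
```

A later update of the backup app can add entries to the mapping without any OS change. On releases that lack the new API, the new data type simply shows up in the "not saved" list.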

Another option would be some kind of API that delivers complete system
data. This would be a useful API for doing local backups. However I don’t
believe it would be suitable for incremental cloud backups – too much data
is opaque, and there’s too much data that may not be necessary to backup.
For instance, an app might lazily load resources and put them in IndexedDB,
but we would not want to back that data up.

At the same time, APIs that easily both export and import data are great,
and using more normalized APIs instead of ad hoc per-data-type APIs would
be helpful.

[I don’t think I’ve said everything I want to here, but I want to emphasize
that I think there are some very important design questions for the DOM
API, that affect and will be affected by the rest of the system design.]

Generic Storage

It has been suggested that we might support generic storage backends – SD
cards could be one, Dropbox another, our own service a third option.

I don’t believe this is a feasible option. We want to optimize for network
data, and it’s hard to make generic storage interfaces as efficient as a
protocol designed specifically for this. Also many of the user interface
questions get more complicated, such as selecting a service instead of
simply opting in to backup.

I am hoping instead that alternate backends can, if they want, fork our
work and set it up to use their own service and protocols, avoiding
unnecessary configuration, and moving the opt-in process to application
installation itself.

Third Party Providers

We have experimented with integrating with third party providers, but have
decided that it’s not the right approach to move the project forward at
this phase. The security around storing user data is harder to navigate
with third parties, and we would like to provide a feature that Mozilla
can clearly stand behind. Other backup systems should be possible, and we
want to design for that option, but we will be implicitly (and probably
explicitly) endorsing this backup system.

Question: what if we make it easy for telcos to self-host a solution for
their customers? This would perhaps make it possible for backup/restore to
be offered as an added service, and if the telcos control the app server
and/or storage service, then they could exclude backups from the users’
metered data plans. (This assumes data access is good enough for backup to
be worthwhile.)

Other possible directions

Backing up some items is very security-sensitive. For instance the history
of SMS messages. We could imagine an API where a backup system receives
only an encrypted form of this data, protected by a password (e.g., FxA
password) or a PIN. The history would then be lost if the person could not
remember their password. And changing passwords would be somewhat
complicated to implement. Maybe we should list the possible trust models
here. I can think of a few, and we’ve already gone over this ad nauseam in
discussions around New Sync, so I bet there’s a good warner-penned summary
on a wiki or mailing list somewhere:

1. Trust nobody (if you lose your password, it’s gone)

2. Trust your telco (which implies you trust them and they have your keys
   / can access your stuff)

3. Trust Mozilla (email backup loop to reset password, we have your keys;
   might be some problems in some countries due to NSA fallout)

Overall I think this doc looks great; all it is really lacking is a
high-level summary at the top of the doc. Nice work!

_______________________________________________
dev-b2g mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-b2g
