Hi! I'm sorry I haven't engaged with this group previously, as there have been discussions of backup and restore on several lists and from many different perspectives. I am doing engineering management on the backup and restore project within Cloud Services.
I've been talking to many people individually to collect ideas, and am working on a project plan for cloud backup and restore. We have not started implementation, so this proposal is very provisional, and can be changed with your feedback. The proposal is still unfortunately quite rough, but I really wanted to get it out this week, and I'm going to be offline for the rest of the week, so I thought it better to put it in front of you all instead of waiting longer.

I have composed it in a Google Doc:
https://docs.google.com/document/d/1n4LkS5fJFocIZW7gbXDWQLkUAoKwIqcVrVdq0lgKhjk/edit?usp=sharing

The document is public and anyone can comment on it. (I expect to move it to the wiki later.) I will also paste it below:

About This Document

At some point this will become a more formal proposal, but right now this is based on the opinions and information collected by me, Ian Bicking. I will use the first person at times in this document.

In Cloud Services we have not yet implemented anything in this proposal (outside of some experimental work on requestSync). We expect to present this for consideration as part of the upcoming Firefox OS roadmap planning.

Some material that influenced this:

“Save/Restore feature” b2g thread: https://groups.google.com/forum/#!topicsearchin/mozilla.dev.b2g/Backup/mozilla.dev.b2g/VPkqmLWP4vk
“Backups for Firefox OS” b2g thread: https://groups.google.com/forum/#!topicsearchin/mozilla.dev.b2g/Backup/mozilla.dev.b2g/1OUNmQ7ouPY
Third-party management app for Firefox OS: https://bugzilla.mozilla.org/show_bug.cgi?id=919420

In this document (including after it is formalized) I want to indicate the reasons for the decisions we have made, and include some discussion of directions we chose not to take.

The Broadest Decision: where do we put backups?

Market research has indicated that in our current target market, non-cloud backup is the most appealing option.
Data plans are very limited, and in many markets Wifi is also uncommon.

SD Card Backup

An SD card backup option can work for nearly anyone, so long as they can afford the SD card, or potentially even buddy up with someone else (trading SD cards for backup would be a reasonable option). Because devices only support a single SD card, this option doesn’t provide backup for any data on the SD card. Photos and video in particular would not be backed up.

Pros:

1. Requires no data or any other device
2. Fairly affordable
3. One SD card could support multiple people’s backups
4. By saving to different locations, it becomes easy to do snapshots, which perhaps could be used to retrieve lost data
5. Because writing to SD cards is fast, you can easily back up data that has no external structure (i.e., no structure known to the backup system itself)
6. Security is very obvious: physical possession implies access, no accounts involved (some of our target market doesn’t seem to understand password-based or account-based security schemes)

Cons:

1. Probably wouldn’t support backing up anything that is already stored on the SD card, like photos
2. Requires swapping cards around to make backups. That relatively intrusive process makes it hard to keep the backup up to date “on demand”
3. Only security is physical (unless we support some kind of crypto, which would be possible – but then the backup is contingent on the password, and further recovery options would be challenging, though again not impossible)
4. You either have a huge wealth of space on your destination SD card, essentially wasted, or you have not enough space on the SD card, requiring some kind of space management – there’s no pooling of storage

Tethered/USB Backup

Tethered backup would mean connecting your phone to your desktop/laptop computer via USB. Some application on the computer would handle or at least facilitate the backup. Work has been done on a Firefox Add-on that can do this.
Yuan Xulei referenced some work Mozilla China has done for this <https://groups.google.com/d/msg/mozilla.dev.b2g/VPkqmLWP4vk/y-BPO2ZgMGMJ>. The slides <https://onedrive.live.com/view.aspx?cid=8BE02BA87AAE54D9&resid=8BE02BA87AAE54D9!107&app=Word> show a tool that can manage and edit data directly, upload music, and presumably do other functions in addition to backup. There may be other efforts around a tethered/USB experience that I’m not aware of.

Pros:

1. Many users have some desktop computer available to them, and familiarity with the tethering process from other phones
2. Given hard disks, the necessary storage for backing up a phone is pretty much a given
3. The physical security is usually excellent, unless you are backing up to a USB drive, in which case it’s easy to imagine cached data or other unexpected issues causing a problem.
4. You can do a lot more than just back up, like managing music files.
   1. Is there a situation where desktop computers have an internet connection, but wifi is not available? E.g., would people download things from their desktop computer in order to transfer them to their phone? (Like with podcasting.)
5. It’s more feasible to do time-snapshot backups than with SD (given the more plentiful storage, and the assumption that the entire process is more mediated)
6. Easy to back up photos and video, and the backups are useful on the desktop computer.

Cons:

1. You need some special local application to connect to the phone (except for simple backups like backing up the images, where phone-as-external-storage is sufficient)
2. Backups require connecting to the computer, though from there the experience can be automated fairly well
3. You need access to a desktop computer
4. Probably not reasonable for users to do on a kiosk computer (e.g., at an internet cafe)

P2P Backup

This is the dark horse of the options. In this model you can imagine two phones connecting to each other (over wifi or Bluetooth), with one backing up to the other.
This is somewhat like the tethered use case, except you’d be wirelessly tethering to another phone. The space is still limited as in the SD case, and while unlike SD Backup you don’t get the problem with backing up SD-hosted images to another SD card, you get the space limitations that make it questionable (would my SD card have enough space for all your data? – with a separate SD card purchased specifically for your backup needs the free space is more likely to be available).

Why Cloud Backup?

This proposal is not an argument against local backup options. Pursuing multiple approaches to backup is very reasonable. But still: why should we pursue cloud backup even though this does not seem to match market demand?

Some strategic arguments:

1. Any backup option we build now will be delivered not to our current users, but to future users. Those future users may have better access to data plans or wifi.
   1. I think there are some confirmed partners that will be targeting these kinds of users.
   2. Just generally: if Firefox OS is our attempt to bring a full web experience to phones, and the web that we know and love is (mostly) a subset of the internet, then Firefox OS without data (i.e., without internet) is at most an attempt to enter a market, but can’t be the realization of our mission. (I’m sure there are people who will disagree with me – I bring this point up for discussion, but please don’t see the argument for Cloud Backup as hinging on this particular point.)
2. There’s a greater variety of approaches possible with online backup, so the platform additions we add for this feature will open up other third-party experiences. It’s hard to imagine other developers or companies pursuing USB or SD-card based approaches. (Though some of what I’ve heard about app kiosks in India may indicate I’m wrong here.)
3. By associating the data with a Firefox Account, we open possibilities that this work could be complementary to work on other platforms.
   For example, with Loop/Hello we are building a strong use case for sharing contacts across devices. (Note though that multi-device use cases are not part of the initial proposed work on Backup & Restore.)
   1. I’m not implying we will start acquiring and using people’s data freely, but this would start the process of us storing people’s data in the cloud. Given that, if a person wants to use our other services, then we will be better able to provide them a convenient process to do so.

And the more technical arguments:

Pros:

1. Backups can happen silently and up-to-the-minute
2. We can also provide options to only do backup in certain situations (e.g., on wifi), or with manual intervention
3. The restore process can happen anywhere, including at the store – you don’t need to connect to a computer, or remember to bring in your SD card
4. It’s unlikely you’ll lose your backup, because it is not a physical item that can be lost. It can be protected by a Firefox Account, but still available if you’ve forgotten your password (assuming you still have access to your email address to reset your account password).
5. It can be invoked remotely. If using Find My Device, we can remotely trigger a backup before wiping the device.

Cons:

1. Many of our users have very limited data and no access to wifi
2. Those users also want to audit their data usage very carefully – even if they might have sufficient data to do backups, they might be uncomfortable with a system that might go wrong and use up their data. All background services require trust
3. Data also affects battery life (though the frequency of our access may not register as a substantial battery drain)
4. The restore process will require a bunch of data right away! A service that only requires a small amount of data each month to incrementally back things up may seem much more substantial at restore time. (Can people do restore in the store on a provided wifi network?)
5. Backing up images and video online is too expensive for us to do it for free, which adds a whole question of paid services and clarity around cost (SD cards aren’t free, but you only pay once), longevity of data, added services (sharing, viewing), etc. For other data like contacts we can probably host the data cheaply enough that it wouldn’t even be worth setting up payments.
6. Cloud data transfer is never cheap or fast enough to use speculatively. This makes it much harder to save the data of individual applications, which may have large amounts of data that doesn’t actually need to be saved (like cached resources). Also it’s hard to do incremental backups of data without understanding the data model. Especially on an SD card you can imagine dumping the entirety of the data every time you back up, without having to be concerned about doing diffs since the last backup.
7. As a corollary to the previous point, it’s unlikely we can back up applications’ data without specific support from the applications. Dumping the entirety of IndexedDB and localStorage is probably too inefficient.
8. There’s a whole set of concepts many of our users may not understand: accounts, passwords, possible complexities if multiple devices share an account, etc.
9. Things get complicated if you migrate Firefox Accounts. With physical devices the patterns are more obvious.

Look at the size of that list of cons! I’m glad we’re not scoring them. With that behind us, let’s look at the questions about how to design Cloud Backup & Restore specifically:

Proposed Design Of Cloud Backup

Goals

Some things we want to achieve:

1. We want to be as conservative with data usage as we can
2. People should be able to forget about the backup system, and still have it reliably restore data when they need it (and they probably need it regardless of whether they expected to)
3. We should be explicit about what kinds of data we have saved, and what we have not saved
4. We will access data through DOM APIs.
   We want other people to be able to write services that function in their own way, and if Mozilla’s backup system uses its special (social!) access to other parts of Firefox OS then we are excluding other developers.
   1. It is an open discussion what those DOM APIs should look like. We have examples like the Contacts API <https://developer.mozilla.org/en-US/docs/Web/API/Contacts_API>, which are very data-specific. The Data Store API <https://developer.mozilla.org/en-US/docs/Web/API/Data_Store_API> is more generic. The Apps Import/Export API <https://bugzilla.mozilla.org/show_bug.cgi?id=982874> is designed more specifically with these kinds of use cases in mind. One can also imagine even more specific APIs directed at Backup/Restore use cases.
   2. The work to make data available to DOM APIs is also where I hope work for Cloud Backup can overlap well with SD or USB Backup.
5. Mozilla’s backup will be primarily implemented as an application. If we need platform support that is not currently available (and we do!), then we will pursue platform additions that are available to everyone.
6. Future versions of the application will make every attempt to be backward compatible with older versions of the OS.
7. Rather than create an abstraction layer over the network protocols, we feel it’s better to encourage forking of the application.
   1. Certified APIs limit the actual flexibility we can offer with these techniques. By definition backup & restore requires powerful access to personal data on the OS (including things like restoring data with original timestamps).
8. We want to be able to host this data as affordably as possible, optimized for the at-rest data storage cost. This isn’t a high-churn data storage problem like Firefox Sync has.

Approach

Given the goals, I propose:

Backend

1. The data itself will be stored on Amazon S3. S3 is particularly affordable for its reliability. We consider S3 reliability to be sufficient without any further redundancy.
   (This last statement should probably be triple-checked with ops and product.)
2. Backup and restore will be controlled by a Firefox Account.
3. Devices will never connect directly to S3; instead they will go through application servers.
4. There will be one S3 resource per user/content-type. E.g., ibicking@mozilla.com/contacts will be one resource containing all that person’s contacts. Traffic to and from S3 from the application server is free, but each request does have a cost, so if we break up backups into many resources we are likely to have higher costs.
   1. Perhaps the resource should also be per device. Then if I set up my first Firefox OS device, then set up my second (restoring), then start backing up, but still keep my first device operating, the backups won’t get intermingled. OTOH usually this is not necessary.
5. Devices will send incremental updates. The application server will handle changing the S3 resource: fetching the resource, appending the data (possibly deleting data) and re-uploading.
6. Data in S3 will be encrypted. In FxA terminology we will be using Key A. This is a stable key associated with the user account, and does not change when the user changes their password. This allows a user to do a password reset just before doing a restore.
7. At some later stage we might offer other encryption options. These could include using Key B, which changes when you change your FxA password (and which we cannot derive), or perhaps encryption at some other stage. This is not part of MVP.
8. Data from the phone will be transmitted to our application servers without any encryption beyond HTTPS. Because we ultimately want to be able to decrypt the data on restore, even if the user has lost their password, we have to be able to produce the data in an unencrypted form. It does not seem like it would add security to encrypt it before delivery.
9. We’ll do everything we can to keep the amount of data transferred low, including minimizing the number of requests (aka chattiness) of the protocol.
   1. Note however we are not doing protocol-first design. The protocol is designed to satisfy our goals, not for its own merit.
10. Given an FxA login, it will be possible to detect if there is backup data associated with the login.

Client

1. We expect the client to be developed alongside the backend; we want to treat the end-to-end working backup as our team project.
2. The client will be a separate app. There are two motivations:
   1. Keep the client code partitioned from other code, reduce the need for cross-team reviews, minimize impact on other people’s work
   2. Allow the backup app to be updated on a separate schedule from the OS
3. We will need some platform support to make backup-as-an-app feasible:
   1. There is an API called requestSync <https://github.com/slightlyoff/BackgroundSync/blob/master/explainer.md> that allows the backup app to ask to be woken up only when it is appropriate to do network requests (when networking is available, and per user preference perhaps only on wifi or only when manually invoked). We have done some initial work on this API, but will need help to get it in shape to be landed.
   2. From a UX perspective we may not want backup to show up as an app, but instead show up as a setting. (I don’t have much opinion on this, but will look for guidance from UX.)
      1. The actual “UI” of the backup app will consist of settings, error messages and notifications, and perhaps status. This is why it isn’t a very good app.
   3. We might want to expedite the opt-in flow in FTE or as part of FxA signup.
   4. When you first sign in to an FxA account, we would like to check for a backup and offer to restore at that time.
4. We expect to integrate via DOM APIs. This is a big topic and gets its own section.
5. We also expect to restore data via DOM APIs.
   This often requires more permission than merely fetching the data, especially to restore with timestamps or other original metadata.
6. We are not expecting to do a complete system backup, e.g., back up the entirety of the profile. Instead we expect to do data-specific backups (backing up all contacts, all appointments, etc).
7. We may want to allow the user to opt in to backing up particular data types. For instance, backing up only contacts requires very little data.
8. The backup app needs to keep track of progress, including what data has been backed up and what is still pending. We expect that we will simply keep this tracking data locally to the app.
9. We are not planning to do any kind of snapshotting or allowing users to revert data. We’re only allowing users to do a complete restore. Note that because we are not doing a complete profile backup, the restore is probably not as destructive as a complete system restore might be.
10. Find My Device would be able to set up and force a refresh of a backup remotely. This way even if a person has never set up backup, if they lose their phone they can change their mind, save the data, and then wipe the device. We would have to be very clear about what data is backed up, and what data might be missing, when a user invokes this.

The DOM API

We want to fetch and restore data via DOM APIs, without any code in other apps that is specific to only our backup app. The design of these APIs is important, but we will be looking for guidance and implementation from elsewhere in Firefox OS and Platform.

The easiest way to move forward would be to use some of the DOM APIs that already exist, such as the Contacts API or some of the Data Store APIs. We would continue to add more data types as DOM APIs were made available. This would happen in subsequent updates of the backup app (along with feature detection to support older releases).

Another option would be some kind of API that delivers complete system data.
This would be a useful API for doing local backups. However I don’t believe it would be suitable for incremental cloud backups – too much data is opaque, and there’s too much data that may not be necessary to back up. For instance, an app might lazily load resources and put them in IndexedDB, but we would not want to back that data up. At the same time, APIs that easily both export and import data are great, and using more normalized APIs instead of ad hoc per-data-type APIs would be helpful.

[I don’t think I’ve said everything I want to here, but I want to emphasize that I think there are some very important design questions for the DOM API, that affect and will be affected by the rest of the system design.]

Generic Storage

It has been suggested that we might support generic storage backends – SD cards could be one, Dropbox another, our own service a third option.

I don’t believe this is a feasible option. We want to optimize for network data, and it’s hard to make generic storage interfaces as efficient as a protocol designed specifically for this. Also many of the user interface questions get more complicated, such as selecting a service instead of simply opting in to backup.

I am hoping instead that alternate backends can, if they want, fork our work and set it up to use their own service and protocols, avoiding unnecessary configuration, and moving the opt-in process to application installation itself.

Third Party Providers

We have experimented with integrating with third party providers, but have decided at this time that it’s not the right approach to move the project forward at this phase. The security around storing user data is harder to navigate with third parties, and we would like to provide a feature that Mozilla can clearly stand behind. Other backup systems should be possible, and we want to design for that option, but we will be implicitly (and probably explicitly) endorsing this backup system.
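
To make the incremental-update idea from the Backend and Client sections a little more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions that are not part of this proposal: records are plain JSON-like dictionaries keyed by a hypothetical id, the client tracks a SHA-256 digest per record locally, and the names record_digest, compute_delta and apply_delta are made up for this example. A real client would be a Firefox OS app reading data through DOM APIs, and a real server would also have to fetch, decrypt, re-encrypt and re-upload the S3 resource, none of which is shown.

```python
import hashlib
import json


def record_digest(record):
    """Return a stable digest for one record (e.g. one contact).

    Keys are sorted so that the same data always hashes the same way.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compute_delta(current, tracked):
    """Client side (hypothetical): work out what changed since the last backup.

    current: dict mapping record id -> record as it is on the device now
    tracked: dict mapping record id -> digest stored locally after the
             last successful backup
    Returns (delta, new_tracked); only the delta needs to be transmitted.
    """
    upserts = {}
    new_tracked = {}
    for record_id, record in current.items():
        digest = record_digest(record)
        new_tracked[record_id] = digest
        if tracked.get(record_id) != digest:
            upserts[record_id] = record  # new or modified record
    deletes = sorted(rid for rid in tracked if rid not in current)
    return {"upserts": upserts, "deletes": deletes}, new_tracked


def apply_delta(resource, delta):
    """Server side (hypothetical): merge a delta into the stored resource.

    resource stands in for the single per-user/content-type object; the
    merged copy is what would be re-uploaded.
    """
    merged = dict(resource)
    merged.update(delta["upserts"])
    for record_id in delta["deletes"]:
        merged.pop(record_id, None)
    return merged
```

On a first run with empty tracking data every record is sent; after that, only changed or deleted records go over the network, which is the property that matters for users with very limited data. The tracking dictionary corresponds to the progress data the backup app would keep locally.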
Question: what if we make it easy for telcos to self-host a solution for their customers? This would perhaps make it possible for backup/restore to be offered as an added service, and if the telcos control the app server and/or storage service, then they could exclude backups from the users’ metered data plans. (This assumes data is good enough for backup to be worthwhile.)

Other possible directions

Backing up some items is very security-sensitive, for instance the history of SMS messages. We could imagine an API where a backup system receives only an encrypted form of this data, protected by a password (e.g., FxA password) or a PIN. The history would then be lost if the person could not remember their password. And changing passwords would be somewhat complicated to implement.

Maybe we should list the possible trust models here. I can think of a few, and we’ve already gone over this ad nauseam in discussions around New Sync, so I bet there’s a good warner-penned summary on a wiki or mailing list somewhere:

1. trust nobody (you lose your password, it’s gone)
2. trust your telco (which implies you trust them and they have your keys / can access your stuff)
3. trust Mozilla (email backup loop to reset password, we have your keys -- might be some problems in some countries due to NSA fallout)

Overall I think this doc looks great; all it is really lacking is a high-level summary at the top of the doc. Nice work!
_______________________________________________ dev-b2g mailing list [email protected] https://lists.mozilla.org/listinfo/dev-b2g
