NAND Full Issue

Attending: Greg, Michael, Joe, Erik, Charlie, Chris, Kim

We discussed the problem recently uncovered in Uruguay and the solutions and suggestions that have been posted, and came up with a proposal for moving forward.

Problem statement (bug 7587): With build 656, when the file system is completely full, the laptop will not boot. Currently Uruguay has to ship the laptops back to repair centers, incurring both shipping costs and downtime.

Please see below for the 5 types of avoidance, recovery, and build image solutions, and the bug fix solution. (Most of these have been discussed in some detail on other threads.)

This proposal addresses the problem for Uruguay a little differently than for other build 656 customers, as Uruguay has already diverged from our code base. They may want a more elaborate solution that they can test and deploy at their own pace.

OLPC's response is "Failsafe" for 656, per703, and 8.1.2, and a formal bug fix for 8.2 going forward:

"Failsafe" OS - includes the "Automatic Free Space" recovery in a build image. This works for laptops that are already refusing to boot as well as for preventing the non-boot problem. On boot, it checks for free NAND space and, if there isn't enough to boot, displays a message that it is deleting files, removes the largest file(s) until 50M is available, and then finishes booting. This can be delivered on a USB stick. Each country's technical liaison can decide whether to update all laptops or wait until laptops see the problem (which could be many months). It should also be incorporated in Peru's build (703), which we need to deliver early next week, so we can avoid the problem for 100k laptops.

The formal bug fix, with better notifications and the ability for the user to choose what to delete, will be described in 7587 and will be delivered in 8.2.

Uruguay: Erik is working with Uruguay on the solution described as "Union Mount" below.
It is important that Uruguay own this bug fix themselves so that they can maintain it as needed, test it to their satisfaction, and decide how to distribute it. This can be delivered as a USB or wireless download. Uruguay also has the choice to use the options supported by OLPC above.

Thoughts?

Kim

-------

AVOIDANCE:

If the students/teachers had a regime of deleting files, that might avoid the problem. In Uruguay they are capable of displaying a dialog box at 85% full; use that to avoid the problem.

RECOVERY SOLUTIONS -

Reflash the build via local USB stick - today this is not possible because of their activation system.

Automatic Free Space: Provide a USB-bootable build that would free space in some way. Can we identify a class of things that we know can be deleted (like the cracklib dictionary of unsafe passwords, or large activities)? Add a note that a delete is going to happen during boot.

BUILD SOLUTIONS -

Union Mount: Erik's 'union mounting' (UFS) - check at boot if you are above threshold. If so, mount the root as read-only and redirect write requests. Nothing would write permanently. You can mark things for delete, which will get deleted at the next shutdown. This can be deployed ahead of time.

Failsafe: Can be inserted in the build and includes 'automatic free space'. It opens the datastore and sorts by size, wants to find 50M, and pops off the stack, deleting stuff from largest to smallest. Can it explain afterwards what it has done, or explain ahead of time what it might do? Provide options for what to delete.

Big File: At reboot, a big file is written that saves space for the case when you can't boot. Seems like it isn't a great idea. Two-step boot process - every boot we check that there is a file of a good size; still should have a GUI for deciding what to delete.

The Fix: (fix to 7587) When the NAND is full, Sugar will boot but not be allowed to write. A notification about space and the inability to write needs to be displayed.
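
For anyone who wants to see the "Failsafe" / "Automatic Free Space" idea in concrete terms, here is a rough Python sketch of the largest-first delete loop described above. It is only an illustration, not the code that will ship: the names free_space, files_by_size and TARGET_FREE are made up for this example, a plain directory stands in for the Sugar datastore, "50M" is read as 50 MiB, and the free_bytes and notify hooks are assumptions added so the loop can be tried without filling a real NAND.

```python
import os
import shutil

# The "50M" goal from the notes, read here as 50 MiB (an assumption).
TARGET_FREE = 50 * 1024 * 1024


def files_by_size(root):
    """Return (size, path) pairs for regular files under root, largest first."""
    entries = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                entries.append((os.path.getsize(path), path))
    entries.sort(reverse=True)
    return entries


def free_space(root, target=TARGET_FREE, free_bytes=None, notify=print):
    """Delete the largest files under root until `target` bytes are free.

    Returns the list of deleted paths so the caller can report them later.
    """
    if free_bytes is None:
        # Default: ask the file system that holds `root` how much is free.
        free_bytes = lambda: shutil.disk_usage(root).free
    deleted = []
    if free_bytes() >= target:
        return deleted  # enough room already, nothing to do
    notify("Storage is full; deleting the largest files to make room.")
    for size, path in files_by_size(root):
        if free_bytes() >= target:
            break
        os.remove(path)
        deleted.append(path)
        notify("Deleted %s (%d bytes)" % (path, size))
    return deleted
```

Returning the list of deleted paths is one way to cover the "can it explain afterwards what it has done" question: the boot script could save that list and show it to the user once Sugar is up. Letting the user choose what to delete would still need the GUI discussed under 7587.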
_______________________________________________ Devel mailing list [email protected] http://lists.laptop.org/listinfo/devel
