Tanya,

On 11.11.2014 at 21:36, Tanya Brokhman wrote:
> Hi Artem,
>
> Hope I didn't drop any ccs this time... Sorry about that. Not on purpose.
>
> On 11/7/2014 10:58 AM, Artem Bityutskiy wrote:
>> On Thu, 2014-11-06 at 14:16 +0200, Tanya Brokhman wrote:
>>> What I'm trying to say - it
>>> may be too late and you may lose data here. "preferred to prevent rather
>>> than cure".
>>
>> First of all, just to clarify, I do not have a goal of turning down your
>> patches. I just want to understand why this is the best design, and if
>> it is helpful to all Linux MTD users.
>>
>> Modern flashes have strong ECC codes protecting against many bit-flips.
>> MTD was even modified to stop reporting a single or a few bit-flips,
>> because those happen too often and they are "harmless", and do not
>> require scrubbing. We have the threshold value in MTD for this, which is
>> configurable, of course.
>>
>> Bit-flips develop slowly over time. If you get one more bit-flip, it is
>> not too late yet. You can mitigate the "too late" part by reading more
>> often of course.
>>
>> You also may lower the bit-flip threshold when reading for scrubbing.
>>
>> Could you try to "sell" your design in a way that it becomes clear why
>> it is better than just reading the entire flash periodically.
>
> Please see my "selling" below :)
>
>> Some hard experimental data would be preferable.
>
> Unfortunately none. This is done for a new device that we received just now.
> The development was done on a virtual machine with nandsim. Testing was more
> of stability and regression.
>
>>
>> The advantages of the "read all periodically" approach were:
>>
>> 1. Simple, no modifications needed
>> 2. No need to write if the media is read-only, except when scrubbing
>> happens.
>> 3. Should cover all the NAND effects, including the "radiation" one.
>
> Disadvantages (as I see it):
> 1. performance hit: when do you trigger the "read-all"? will affect
> performance

Only a stupid implementation will re-read/scrub all PEBs at once.
We can use a low priority thread. We can do this even in userspace.

> 2. finds bitflips only when they are present instead of preventing them from
> happening

We can scrub unconditionally. Even if we scrub every PEB once a week the
erase counters won't go up very much.

> Perhaps our design is an overkill for this and not covering 100% of the
> use cases. But it was requested by our customers to handle read-disturb and
> data retention specifically (as in "prevent" and not just "fix").
> This is due to a new NAND device that should operate in high temperature
> and last for ~15-20 years.
>
> But we did rethink this and we're dropping the "last erase timestamp" that
> was used to handle "data retention". We will force-scrub all PEBs once in a
> while (triggered by user) as Richard suggested.
> We're keeping the read counters though. I know that not all "read-disturb"
> scenarios are covered by this but it's more coverage than we have at the
> moment. So not a 100% perfect solution but better than none.
>
> I will update the implementation and change the fastmap layout (as suggested
> by Richard earlier) or try using internal UBI volume. Still have some study
> to do on that...

Please don't (ab)use fastmap. If you really need persistent read-counters
use an internal UBI volume.
But I think that time-based unconditional scrubbing will also do it.
As long as we don't have sane threshold values, keeping counters is useless.

Thanks,
//richard
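
A back-of-the-envelope sketch of the "once a week" estimate above, and of how a
low priority worker could spread one full pass over the week instead of
scrubbing everything at once. Only the weekly interval and the 15-20 year
lifetime come from this thread. The PEB count, the endurance rating, and the
function names are hypothetical, and the sketch assumes that scrubbing one PEB
costs one erase cycle and that wear-levelling spreads those erases evenly.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60
WEEKS_PER_YEAR = 52  # rounded; good enough for an estimate


def scrub_erases_per_peb(years, scrub_interval_weeks=1):
    """Erase cycles added to each PEB by unconditional scrubbing.

    Assumption: one scrub of a PEB costs one erase cycle, and
    wear-levelling spreads those erases evenly across the device.
    """
    return (years * WEEKS_PER_YEAR) // scrub_interval_weeks


def seconds_between_scrubs(peb_count, scrub_interval_weeks=1):
    """Spacing between single-PEB scrubs when a background worker
    spreads one full pass evenly over the scrub interval."""
    return scrub_interval_weeks * SECONDS_PER_WEEK / peb_count


if __name__ == "__main__":
    PEB_COUNT = 4096     # hypothetical device size
    ENDURANCE = 100_000  # hypothetical rated erase cycles per PEB

    for years in (15, 20):
        extra = scrub_erases_per_peb(years)
        share = 100 * extra / ENDURANCE
        print(f"{years} years: {extra} extra erases per PEB "
              f"({share:.2f}% of the assumed endurance)")

    gap = seconds_between_scrubs(PEB_COUNT)
    print(f"one PEB roughly every {gap:.0f} s")
```

Under these assumptions weekly scrubbing adds 780 to 1040 erases per PEB over
the device lifetime. With a lower endurance rating the share grows
accordingly, so the interval would have to be chosen per device.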
