On 12 Jun 2016 05:23:13 -0400 "George Spelvin" <li...@sciencehorizons.net> wrote:
> >> It also applies an offset of +1, to avoid negative numbers and the
> >> problems of signed divides.
> >
> > It seems to cover all cases.
>
> I wasn't sure why you used a signed int for the interface.

No real reason other than consistency with other prototypes where page
is always expressed as an integer.

>
> (Another thing I thought of, but am less sure of, is packing the group
> and pair numbers into a register-passable int rather than a structure.
> Even 2 bits for the group is probably the most that will ever be needed,
> but it's easy to say the low 4 bits are the group and the high 28 are
> the pair. Just create a few access macros to pull them apart.

We could indeed do that, but again, do we really need to optimize
things like that?

>
> This was inspired by Linus's hash_len abstraction, recently moved to
> <linux/stringhash.h>)
>
> >> (or you could add an mtd->write_per_erase field).
> >
> > Okay. Actually I'd like to avoid adding new 'conversion' fields to the
> > mtd_info struct. Not sure we are really improving perfs when doing that,
> > since what takes long is the I/O ops between the flash and the
> > controller not the conversion operations.
>
> Well, yes, but you may need to do conversion ops for in-memory cache
> lookups or searching for free blocks, or wear-levelling computations,
> all of which may involve a great many conversions per actual I/O.

That's true, even if I don't think it makes such a big difference (you
don't have that many paired-page manipulations that are not followed by
read/write accesses, and the I/O is where the contention is).

>
> (In hindsight, I'd wish for writesize and write_per_erase, and not
> store erasesize explicitly. Not only is the multiply more efficient,
> but it abolishes the error of an erase size which is not a multiple of
> the write size by making it impossible.)

That's also true.
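A minimal sketch of the representation suggested in the quoted
paragraph, to show why the invalid case disappears. All names here
(flash_geom, geom_erasesize, geom_is_lastpage) are hypothetical and not
part of mtd_info; only writesize and write_per_erase are stored, and the
erase size is always derived from them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry description: the erase size is never stored. */
struct flash_geom {
	uint32_t writesize;       /* bytes per page */
	uint32_t write_per_erase; /* pages per eraseblock */
};

/* The erase size is a multiple of the write size by construction. */
static inline uint32_t geom_erasesize(const struct flash_geom *g)
{
	return g->writesize * g->write_per_erase;
}

/* Last-page test without any multiply or divide on byte sizes. */
static inline bool geom_is_lastpage(const struct flash_geom *g, uint32_t page)
{
	assert(page < g->write_per_erase);
	return page + 1 == g->write_per_erase;
}
```

With 4096-byte pages and 128 pages per eraseblock, geom_erasesize()
yields 524288 and only page 127 is reported as the last page.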
Actually I was thinking about adding inline functions to retrieve the
eraseblock and page size instead of letting people manipulate the
->writesize/erasesize fields directly. This way we would be able to
rework the internal representation.

>
> > Can we have a boolean to make it clearer?
> >
> >     bool lastpage = ((page + 1) * mtd->writesize) == mtd->erasesize;
>
> An improvement IMHO. You can use the same name in all four functions
> to make the equivalence clear.
>
> > Also, the page update is quite obscure for people that did not have the
> > explanation you gave above. Can we make it
> >
> >     /*
> >      * The first and last pages are not surrounded by other pages,
> >      * and are thus less sensitive to read/write disturbance.
> >      * That's why NAND vendors decided to use a different distance
> >      * for these 2 specific case, which complicates a bit the
> >      * pairing scheme logic.
>
> Um... this is, as far as I can tell, complete nonsense.

Actually this was pure guessing, because I never had a real explanation
for these weird pairing schemes.

>
> I realize you know this about a thousand times better than I do, so
> I'm hesitant to make such a strong statement, but one thing that I do
> know is that paired pages are stored in the exact same transistors.
> The pairing is purely a logical addressing distance. The physical
> distance is exactly zero.
>
> The question is why they chose this particular *logical* addressing
> scheme, and I believe the reason is write bandwidth for the common case
> of streaming writes to consecutive pages.
>
> The obvious thing to do is pair consecutive even and odd pages (pages 0 and 1,
> then 2 and 3, then...), but that makes it hard to pipeline programming of the
> two pages. You can't start programming page 1 until page 0 is finished.
>
> The next obvious thing is stride-2: 0<->2, 1<->3, 4<->6, 5<->7, etc.

Yes I understand that one.

>
> This is done in some MLC chips. See p. 98 of this Micron data sheet:
> http://pdf.datasheet.directory/datasheets-0/micron_technology/MT29F32G08CBACAWP_C.pdf
> which has a stride-4 pairing. 0..3 pair with 4..7, then 8..11 with
> 12..15, and so on.
>
> However, it's desirable to alternate group-0 and group-1 pages, since
> the write operations are rather different and even take different amounts
> of time. Alternating them makes it possible to:
> 1) Possibly overlap parts of the writes that use different on-chip resources,
> 2) Average the non-overlapping times for minimum jitter.

Okay, that's actually a good reason, and probably the part I was missing
to explain these non-log2 distance schemes leading to heterogeneous
distances (the first and last sets of pages don't have the same stride).

>
> This leads naturally to the stride-3 solution. You want to minimize the
> stride because you can read both pages in a pair with one read disturbance,
> and the shorter the distance, the more likely you'll want both pages
> (and the less buffering you'll need to make both available).
>
> Stride-3 does have those two awkward edge cases, and changing the
> stride is simply the simplest way to special-case them.

Yep. Still, I've seen weird things while working on modern MLC NANDs
which make me think the pairing scheme is also there to help mitigate
the write-disturb effect, but I might be wrong.

The behavior I'm describing here has been observed on Hynix
(H27QCG8T2E5R-BCF) and Toshiba (TC58TEG5DCLTA00) NANDs so far.
When I write the 2 pages in a pair, but not the following page, I see a
high number of bitflips in the last programmed page until the next page
is programmed.

Let's take a real example. My NAND exposes a stride-3 pairing scheme:
when I only program pages 0, 1 and 2, page 2 shows a high number of
bitflips until page 3 is programmed.
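To make the example concrete, here is a minimal sketch of a stride-3
mapping. The pair_info type and the dist3_* names are hypothetical and
not the code from the patch. The layout is an assumption: page 0 pairs
with page 2, the last page pairs with the page two before it, and every
other pair is 3 pages apart, with an even number of pages per
eraseblock.

```c
#include <assert.h>

/* Hypothetical type for illustration only. */
struct pair_info {
	int pair;  /* index of the pair inside the eraseblock */
	int group; /* 0 = programmed first, 1 = programmed second */
};

/*
 * Assumed layout for npages pages (npages even, >= 4):
 *   pair 0:    pages 0 and 2               (distance 2)
 *   last pair: pages npages-3 and npages-1 (distance 2)
 *   others:    odd page p and page p + 3   (distance 3)
 */
static struct pair_info dist3_get_info(int page, int npages)
{
	struct pair_info info;
	int lastpage = (page == npages - 1);
	int dist = lastpage ? 2 : 3;

	assert(npages >= 4 && npages % 2 == 0);
	assert(page >= 0 && page < npages);

	if (page == 0 || ((page & 1) && !lastpage)) {
		info.group = 0;
		info.pair = (page + 1) / 2;
	} else {
		/* The +1 offset keeps the numerator non-negative. */
		info.group = 1;
		info.pair = (page + 1 - dist) / 2;
	}
	return info;
}

/* Reverse mapping: from (pair, group) back to the page number. */
static int dist3_get_page(struct pair_info info, int npages)
{
	int lastpair = npages / 2 - 1;

	if (info.group == 0)
		return info.pair ? 2 * info.pair - 1 : 0;
	return info.pair == lastpair ? npages - 1 : 2 * info.pair + 2;
}
```

Under these assumptions page 2 is the group-1 partner of page 0, so in
the example above it is the first group-1 page written in the block.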
Actually, I don't remember if the number decreases after programming
page 3 or 4, but my guess is that the NAND is accounting for future
write-disturb when programming a page in group 1, which makes this page
unreliable until the subsequent page(s) have been programmed.

What's your opinion on that?

>
> > Thanks for your valuable review/suggestions.
> >
> > Just out of curiosity, why are you interested in the pairing scheme
> > concept? Are you working with NANDs?
>
> Not at present, but I do embedded hardware and might some day.

Okay. You seem pretty well aware of MLC/TLC NAND constraints, and you
already have a good idea of how things work. Good to have someone like
you reviewing this stuff.

>
> Also, the data sheets are a real PITA to find. I have yet to
> see an actual data sheet that documents the stride-3 pairing scheme.

Yes, that's a real problem. Here is a Samsung NAND data sheet describing
stride-3 [1], and a Hynix one describing stride-6 [2].

[1] http://dl.btc.pl/kamami_wa/k9gbg08u0a_ds.pdf
[2] http://www.szyuda88.com/uploadfile/cfile/201061714220663.pdf

--
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com