>> Does this sound reasonable?
>
>Does to me. Great example!
Thanks for the flowers :)
However, I am sure the RAID developers have thought all this through
over and over, and still have a few aces up their sleeves.
I'd like to hear from them about the event count in the superblock
that Peter mentioned, and about the algorithm that decides which blocks
still need to be synced.
As Luca wrote:
> there isn't one [non-volatile storage about blocks needing sync] for
> lack of a non-volatile storage for dirty cache
but probably Neil knows a bit more about that?
Probably, to be on the safe side, one would have to perform
real HD internal write cache flushes after each
- write of start-of-transaction-info
- write of data
- write of end-of-transaction-info
I think this is necessary because otherwise the HD write cache
flush might start with a write that came in later, so it might
first write the end-of-transaction-info, then the data, and then
the start-of-transaction-info. A crash in between would
smash everything.
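
To make the ordering concrete, here is a minimal sketch in Python of
"flush after each step". The names write_and_flush and write_transaction
and the BEGIN/END markers are made up for illustration; they are not
taken from md or from any journaling fs. Note that os.fsync() only asks
the kernel to push the file's data to the device. Whether that also
empties the HD-internal write cache depends on the kernel, the
filesystem and the drive, so treat this as a model of the ordering,
not as a guarantee.

```python
import os


def write_and_flush(fd, payload):
    """Write the whole payload, then ask the OS to flush it to the device."""
    view = memoryview(payload)
    while view:
        written = os.write(fd, view)  # os.write may write fewer bytes than asked
        view = view[written:]
    os.fsync(fd)  # barrier: return only after the kernel reports the flush done


def write_transaction(path, data):
    """Hypothetical helper: three writes, each followed by its own flush."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        write_and_flush(fd, b"BEGIN\n")  # start-of-transaction-info
        write_and_flush(fd, data)        # the data itself
        write_and_flush(fd, b"END\n")    # end-of-transaction-info
    finally:
        os.close(fd)
```

With only one flush at the very end, the three writes could be
reordered on their way to the platter, which is exactly the case
described above.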
Actually, this should be a problem for journaling fs writers in the
first place, but as RAID subsystems in between do some caching of
their own in a very special way, it becomes a topic for RAID designers
too. What do I mean by "a very special way"? I mean that they write,
and then report that the write completed o.k. And if you read back the
written data (after a crash in between), you may by chance (= because the
faster HD was chosen for the read) find everything fine, even if the data
was actually written to only one of the HDs.
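
A toy model of that situation, in Python. This is purely illustrative:
the class TwoDiskMirror and its crash_after_first flag are invented
here and say nothing about how md really schedules its writes or reads.

```python
class TwoDiskMirror:
    """Hypothetical two-disk mirror; each disk is a dict of block -> value."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, block, value, crash_after_first=False):
        """Write to disk 0, then disk 1, unless a crash is simulated in between."""
        self.disks[0][block] = value
        if crash_after_first:
            return  # simulated crash: disk 1 never receives the write
        self.disks[1][block] = value

    def read(self, block, disk):
        """Read from one chosen disk only, as a mirror read would."""
        return self.disks[disk].get(block)


mirror = TwoDiskMirror()
mirror.write(7, "old")
mirror.write(7, "new", crash_after_first=True)

# A read served by disk 0 looks fine, a read served by disk 1 does not,
# and the reader has no way to tell which of the two it got.
from_disk0 = mirror.read(7, disk=0)
from_disk1 = mirror.read(7, disk=1)
```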
I still believe that things would be better if reads went to both HDs
and compared the results. Even if a difference could not be resolved for data
(and so would not improve that situation), it would improve the situation for
reading transaction-info:
  difference in start-of-transaction-info
    -> the data write has not started yet, so just
       delete the start-of-transaction-info
  difference in end-of-transaction-info
    -> the data write has finished already, so just
       update the end-of-transaction-info
  difference in data
    -> cannot happen, because the journaling fs would have
       rolled back at boot after the crash
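
The three cases above can be sketched as a small decision function.
This is only my proposal written down in Python, under the assumption
that each HD's copy can be read separately. MirrorCopy and reconcile
are hypothetical names, not anything that exists in the md code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MirrorCopy:
    """What one HD holds for a transaction (hypothetical layout)."""
    start_info: Optional[bytes]
    data: bytes
    end_info: Optional[bytes]


def reconcile(a: MirrorCopy, b: MirrorCopy) -> str:
    """Compare both HDs' copies and name the action the rules above suggest."""
    if a.start_info != b.start_info:
        # Crash hit while the start record was being written.
        return "delete start-of-transaction-info"
    if a.end_info != b.end_info:
        # Crash hit while the end record was being written.
        return "update end-of-transaction-info"
    if a.data != b.data:
        # Expected not to occur if the journaling fs rolled back at boot.
        return "unexpected data mismatch"
    return "consistent"
```

The checks run in the order the records are written, so the earliest
record that differs between the two HDs decides the action.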
Thomas
PS:
>Do you see any problem in this [more complex 4 HD] scenario?
It looks like the easier example is still not clarified, so we stay with
that one for now :-)