On Mon, Jan 9, 2017 at 3:33 PM, Willem Jan Withagen <[email protected]> wrote:
> On 9-1-2017 23:58, Brian Andrus wrote:
> > Sorry for spam... I meant D_SYNC.
>
> That term does not turn up anything in Google...
> So I would expect it has to be O_DSYNC.
> (https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/)
>
> Now you tell me there are SSDs that take the correct action with O_SYNC
> but not with O_DSYNC... That makes no sense to me. That is a typical
> trade-off made in the OS: speed versus a slightly less consistent FS.
>
> Either a device actually writes its data persistently (either into the
> silicon cells, or into RAM backed by a supercapacitor), or it does not.
> I cannot think of anything else. Maybe my EE background is getting in the
> way here. And I know it is rather hard to write correct SSD firmware; I
> have seen plenty of firmware upgrades that fix serious corner cases.
>
> The second thing is how badly a drive lies when it is told that the
> requested write must be synchronised, and that OK may only be returned
> once the data is in stable storage and cannot be lost.
>
> If there is a possibility that a sync write to a drive is not persistent,
> then that is a serious breach of the sync-write contract. There will
> always be situations in which such drives lose data. And if the data is
> no longer in the journal, because the writing process thinks it is on
> stable storage and has already deleted it from the journal, then that
> data is permanently lost.
>
> Now you have a second chance (even a third) with Ceph, because data is
> stored multiple times, and you can go to another OSD and try to get it
> back.
>
> --WjW

I'm not disagreeing per se. The main point I'm trying to address is that
as long as the backing OSD isn't handling egregiously large amounts of
writes, and it has a good journal in front of it (one that properly
handles O_DSYNC [not D_SYNC, as Sebastien's article calls it]), it is
unlikely that inconsistencies will occur after a crash and subsequent
restart. So while it is not ideal to rely on journals to maintain
consistency, that is what they are there for.

There are situations where "consumer-grade" SSDs can be used as OSDs.
While not ideal, it can be and has been done before, and it may be
preferable to tossing out $500k worth of SSDs (I've seen it firsthand!).
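For what it's worth, the test in Sebastien's article is done with fio; the
same idea fits in a few lines of C: open the target with O_DSYNC and time
small writes. This is only a rough sketch under assumptions of my own (the
scratch-file path "dsync-test.bin", the 4 KiB block size and the 1000-write
count are arbitrary placeholders), not the article's exact test:

/*
 * dsync-test.c -- minimal O_DSYNC write-latency sketch (illustration only;
 * the article referenced above uses fio for the real test).
 *
 * Every write() through an O_DSYNC descriptor must reach stable storage
 * before the call returns.  Drives with genuine power-loss protection stay
 * fast here; honest consumer drives drop to a few hundred writes/sec; a
 * drive that stays implausibly fast without power-loss protection is the
 * kind that "lies" about synchronous writes.
 *
 * WARNING: this overwrites data -- point it at a scratch file or an unused
 * test device, never at anything in use.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "dsync-test.bin"; /* scratch target */
    static char buf[4096];          /* 4 KiB, roughly a small journal write */
    const int count = 1000;         /* number of synchronous writes to time */

    memset(buf, 0xab, sizeof buf);

    int fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        /* rewrite the same block, as a journal header update would */
        if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf) {
            perror("pwrite");
            return 1;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d O_DSYNC writes in %.2f s -> %.0f writes/sec\n",
           count, secs, count / secs);

    close(fd);
    return 0;
}

Compile it with something like "cc -O2 dsync-test.c -o dsync-test" and run
it against a scratch file or an empty test device. Typically a drive with
real power-loss protection sustains thousands of these per second, an
honest consumer drive drops to a few hundred, and a drive that stays
implausibly fast on O_DSYNC is exactly the kind of lying being discussed
in this thread.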
> > On Mon, Jan 9, 2017 at 2:56 PM, Brian Andrus <[email protected]> wrote:
> >
> >     Hi Willem, the SSDs are probably fine for backing OSDs; it's the
> >     O_DSYNC writes they tend to lie about.
> >
> >     They may have a higher failure rate than enterprise-grade SSDs, but
> >     they are otherwise suitable for use as OSDs if journals are placed
> >     elsewhere.
> >
> >     On Mon, Jan 9, 2017 at 2:39 PM, Willem Jan Withagen <[email protected]> wrote:
> >
> >         On 9-1-2017 18:46, Oliver Humpage wrote:
> >
> >         >> Why would you still be using journals when running OSDs
> >         >> fully on SSDs?
> >         >
> >         > In our case, we use cheaper large SSDs for the data (Samsung
> >         > 850 Pro 2TB), whose performance is excellent in the cluster,
> >         > but which, as has been pointed out in this thread, can lose
> >         > data if power is suddenly removed.
> >         >
> >         > We therefore put journals onto SM863 SSDs (1 journal SSD per
> >         > 3 OSD SSDs), which are enterprise quality and have power-loss
> >         > protection. This seems to balance speed, capacity,
> >         > reliability and budget fairly well.
> >
> >         This would make me feel very uncomfortable.....
> >
> >         So you have a reliable journal, and up to that point things do
> >         work: once the data is in the journal, it is safe.
> >
> >         But then you asynchronously transfer the data to disk, and that
> >         is an SSD that lies to you? It will tell you that the data has
> >         been written, but if you pull the power it turns out the data
> >         is not really stored.
> >
> >         And then the only way to get the data consistent again is to
> >         (deep-)scrub.
> >
> >         Not a very appealing outlook??
> >
> >         --WjW

--
Brian Andrus
Cloud Systems Engineer
DreamHost, LLC
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
