On Jan 5, 2009, at 3:04 PM, Damien Katz wrote:


On Jan 5, 2009, at 2:51 PM, Geir Magnusson Jr. wrote:


On Jan 5, 2009, at 2:32 PM, Damien Katz wrote:


If necessary and possible, we'll patch the Erlang VM.

That seems like a bad idea to me - I'd think you'd want to stay out of the VM business.

No, I mean send patches to the maintainers of Erlang to fix any problems on their supported platforms. Just like the F_FULLFSYNC patch.

Ah.  Whew :)



But if a platform doesn't support proper flushing, then it's not a platform that can support an ACID database.

We're not communicating well here.

"Proper flushing" depends on what you want to do - if you need your data to be in confirmed permanent storage so that it can survive a crash or power cut, then without special configuration (battery-backed RAID, for example), I don't think you're going to get that assurance on Linux.

Do you see what I'm saying?


Yes I see what you are saying. Can you show that Linux doesn't actually safely push the bits to disk in popular distros? If that's the case, then we need to find the APIs that actually work and call them, and if they don't work, we don't support Linux.

It pushes the bits to the disk drive, but that's where its sphere of effect ends - what the drive does after that is drive-specific. Drives cache writes to aggregate them, or write things out of order based on head location, etc.

This isn't something that only affects Couch.

So I would say that .... it's time to relax.

Take the approach that you have a few modes:

a) fsync() mode - for people who care about true durability; it's up to them to get or configure drives that behave correctly
b) a delayed-write mode, so that you can do things like aggregate writes into clocked fsyncs (I'd use this - I'll trade some durability for performance)
c) for platforms that offer special modes that really do guarantee the write all the way to the physical media, like OS X's fcntl(F_FULLFSYNC), make that an option too.
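The three modes above could be wired up roughly as follows. This is a minimal C sketch, not CouchDB code (CouchDB is written in Erlang): the names `durability_mode`, `commit_log`, `commit()` and `sync_pending()` are hypothetical, the delayed mode uses a simple wall-clock interval check, and `F_FULLFSYNC` is only used where the platform headers define it (OS X), falling back to plain `fsync()` otherwise.

```c
#define _XOPEN_SOURCE 700  /* expose fsync() on glibc */
#define _DARWIN_C_SOURCE   /* keep F_FULLFSYNC visible on OS X */
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical modes mirroring a), b) and c) above. */
typedef enum { MODE_FSYNC, MODE_DELAYED, MODE_FULLFSYNC } durability_mode;

typedef struct {
    int fd;
    durability_mode mode;
    double interval_sec;  /* MODE_DELAYED only: max seconds between fsyncs */
    time_t last_sync;     /* MODE_DELAYED only: when we last fsync'd */
    int pending;          /* commits written since the last fsync */
} commit_log;

/* c) Ask the drive to flush its own write cache where the OS supports it. */
static int full_flush(int fd) {
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) == 0)
        return 0;
    /* Some filesystems reject F_FULLFSYNC; fall through to fsync(). */
#endif
    return fsync(fd);
}

/* Force out everything written so far and reset the batch. */
int sync_pending(commit_log *log) {
    if (fsync(log->fd) != 0)
        return -1;
    log->pending = 0;
    log->last_sync = time(NULL);
    return 0;
}

/* Called after each commit's data has been written; 0 on success, -1 on error. */
int commit(commit_log *log) {
    switch (log->mode) {
    case MODE_FSYNC:      /* a) one fsync() per commit */
        return fsync(log->fd);
    case MODE_FULLFSYNC:  /* c) strongest guarantee the platform offers */
        return full_flush(log->fd);
    case MODE_DELAYED:    /* b) at most one fsync() per interval */
        log->pending++;
        if (difftime(time(NULL), log->last_sync) < log->interval_sec)
            return 0;  /* not durable yet: data may still be only in the OS cache */
        return sync_pending(log);
    }
    return -1;
}
```

In the delayed mode, a commit that lands inside the interval returns before anything is flushed, so a real implementation would also need a timer (or shutdown hook) that calls `sync_pending()` once writes stop arriving; otherwise the last batch could sit in the OS cache indefinitely.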


Why not make it a config option, so that the db admin can choose the durability level in general, and let clients that know they are talking to Couch override it w/ a header?
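A per-database default with a per-request override could be resolved like this. It is a minimal sketch assuming the header value arrives as a plain string; the level names and the strings "delayed", "fsync" and "full" are hypothetical, not an existing CouchDB header format.

```c
#include <string.h>

/* Hypothetical durability levels, weakest to strongest. */
typedef enum { DUR_DELAYED, DUR_FSYNC, DUR_FULL } durability_level;

/* Map a header value to a level; returns 1 on success, 0 if unrecognized. */
static int parse_level(const char *s, durability_level *out) {
    if (strcmp(s, "delayed") == 0) { *out = DUR_DELAYED; return 1; }
    if (strcmp(s, "fsync") == 0)   { *out = DUR_FSYNC;   return 1; }
    if (strcmp(s, "full") == 0)    { *out = DUR_FULL;    return 1; }
    return 0;
}

/* The admin's per-database default applies unless the request carries
 * a recognized override (header_value is NULL when the header is absent). */
durability_level effective_level(durability_level db_default,
                                 const char *header_value) {
    durability_level requested;
    if (header_value != NULL && parse_level(header_value, &requested))
        return requested;
    return db_default;
}
```

As written, any client can lower durability as well as raise it; an admin who wants the database setting to act as a floor could instead return the stricter of `requested` and `db_default`.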


Definitely, I think commit options should be settable per-database. But for now I just wanted to address the slowdown, especially for replication and the tests, to keep everyone productive. More commit features and options are lower-priority work for now; I was just addressing the most serious slowdown.

That makes sense, but IMO you papered over the root problem.
It's good to keep people working, but I think the issue deserves a look. I don't know Erlang, or I would look myself.

What issue? Why do you think this is Erlang-specific?

Oh - this is a SWAG based on one data point :) [it was a rough day - I didn't get to try to duplicate the results found yesterday...]

http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%[email protected]%3e

It was reported that, with the same up-to-date version of Erlang, there was a big performance difference between 0.8 and current trunk. If that's true, then it seems to me that something changed in the filesystem handling in the CouchDB code itself - it could be that there are multiple flush modes, and the 0.8 code used whatever corresponds to fsync(), while trunk uses whatever corresponds to fcntl(F_FULLFSYNC). I don't know; it's a guess. But yesterday's results are unexplained, and I hate mysteries.
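If the slowdown really comes from a different flush call, a gap should show up outside Erlang too. Below is a minimal C sketch for timing repeated write-plus-flush rounds with plain `fsync()` versus `fcntl(F_FULLFSYNC)` on the same disk; `time_flushes()` and its parameters are hypothetical, and the sketch says nothing about which call the Erlang file driver actually makes.

```c
#define _XOPEN_SOURCE 700  /* fsync(), clock_gettime() on glibc */
#define _DARWIN_C_SOURCE   /* keep F_FULLFSYNC visible on OS X */
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Run `n` rounds of a 16-byte write followed by a flush; return the
 * average milliseconds per round, or -1.0 on any error. When use_full is
 * nonzero and the platform defines F_FULLFSYNC, flush with it instead of fsync(). */
double time_flushes(const char *path, int n, int use_full) {
    if (n <= 0)
        return -1.0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1.0;
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < n; i++) {
        if (write(fd, "0123456789abcdef", 16) != 16) { close(fd); return -1.0; }
        int rc;
#ifdef F_FULLFSYNC
        rc = use_full ? fcntl(fd, F_FULLFSYNC) : fsync(fd);
#else
        (void)use_full;  /* no F_FULLFSYNC here: both runs measure fsync() */
        rc = fsync(fd);
#endif
        if (rc != 0) { close(fd); return -1.0; }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);
    double ms = (end.tv_sec - start.tv_sec) * 1e3
              + (end.tv_nsec - start.tv_nsec) / 1e6;
    return ms / n;
}
```

Comparing `time_flushes("/tmp/bench.dat", 200, 0)` with `time_flushes("/tmp/bench.dat", 200, 1)` on an OS X machine would give a rough idea of whether F_FULLFSYNC alone could account for a gap of the size reported; results depend heavily on the drive and its cache settings.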

I can't help with the Erlang (I don't know it...), but I can at least try to reproduce the results...

geir



