On Jan 5, 2009, at 3:04 PM, Damien Katz wrote:
On Jan 5, 2009, at 2:51 PM, Geir Magnusson Jr. wrote:
On Jan 5, 2009, at 2:32 PM, Damien Katz wrote:
If necessary and possible, we'll patch the Erlang VM.
That seems like a bad idea to me - I'd think you'd want to stay out
of the VM business.
No, I mean send patches to the maintainers of Erlang to fix any
problems on their supported platforms. Just like the F_FULLFSYNC
patch.
Ah. Whew :)
But if a platform doesn't support proper flushing, then it's not a
platform that can support an ACID database.
We're not communicating well here.
"proper flushing" depends on what you want to do - if you need your
data to be in confirmed permanent storage so that it can survive a
crash or power cut, then w/o special configuration (e.g. battery-backed
RAID), I don't think that you're going to get that assurance on Linux.
Do you see what I'm saying?
Yes I see what you are saying. Can you show that Linux doesn't
actually safely push the bits to disk in popular distros? If that's
the case, then we need to find the APIs that actually work and call
them, and if they don't work, we don't support Linux.
It pushes the bits to the disk drive, but that's where its sphere of
effect ends - what the drive does after that is drive specific.
Drives cache writes to aggregate, or write things out of order based
on head location, etc.
This isn't something that only affects Couch.
So I would say that .... it's time to relax.
Take the approach that you have a few modes
a) fsync() mode - for people that care about true durability, it's
up to them to get or configure drives to behave right, or whatever
b) the delayed write mode so that you can do things like aggregate
writes into clocked fsyncs or something (I'd use this - I'll trade
some durability for the performance)
c) and for platforms that offer special modes that really do
guarantee the write all the way to the physical media, like OS X's
fcntl(F_FULLFSYNC), make that an option too.
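
The three modes above can be sketched roughly as follows. This is a
minimal illustration in Python, not CouchDB's actual implementation:
the class name DurableWriter, the mode names "fsync", "delayed" and
"full", and the interval parameter are all made up for this example.
It assumes a Unix-like system; where fcntl.F_FULLFSYNC is not
available (anything other than OS X), "full" falls back to a plain
fsync().

```python
import fcntl
import os
import time


class DurableWriter:
    """Append-only writer with a selectable flush mode (hypothetical names)."""

    def __init__(self, path, mode="fsync", interval=1.0):
        if mode not in ("fsync", "delayed", "full"):
            raise ValueError("unknown mode: %r" % (mode,))
        self.mode = mode
        self.interval = interval  # seconds between syncs in "delayed" mode
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_sync = time.monotonic()
        self.dirty = False
        self.sync_count = 0  # kept only so the behaviour is easy to observe

    def _sync(self):
        if self.mode == "full" and hasattr(fcntl, "F_FULLFSYNC"):
            # (c) ask the drive itself to flush its cache (OS X only)
            fcntl.fcntl(self.fd, fcntl.F_FULLFSYNC)
        else:
            # (a) hand the data to the drive; what the drive does with its
            # own write cache after that is outside the OS's control
            os.fsync(self.fd)
        self.last_sync = time.monotonic()
        self.dirty = False
        self.sync_count += 1

    def write(self, data):
        view = memoryview(data)
        while len(view) > 0:  # os.write may write fewer bytes than asked
            written = os.write(self.fd, view)
            view = view[written:]
        self.dirty = True
        if self.mode != "delayed":
            self._sync()
        elif time.monotonic() - self.last_sync >= self.interval:
            # (b) aggregate writes and sync at most once per interval
            self._sync()

    def close(self):
        if self.dirty:
            self._sync()
        os.close(self.fd)
```

In this sketch the "delayed" mode only checks the clock when a write
arrives, so data written just before a quiet period stays unsynced
until the next write or close(); a real implementation would more
likely use a timer.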
why not make it a config option, so that the db admin can choose
the durability level in general, and let clients that know they
are talking to couch override w/ a header?
Definitely, I think commit options should be settable per-database.
But for now I just wanted to address the slowdown, especially for
replication and the tests, to keep everyone productive. More commit
features and options are lower priority work for now; I was just
addressing the most serious slowdown.
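
The config-plus-override idea discussed above could look something
like the sketch below. Everything named here is hypothetical: the
"commit_mode" config key, the "X-Commit-Mode" header, and the level
names are invented for illustration and are not an existing CouchDB
API. The order of precedence is client header, then per-database
setting, then server default.

```python
VALID_MODES = ("delayed", "fsync", "full")


def resolve_commit_mode(db_config, headers, server_default="delayed"):
    """Pick the commit mode for one request (all names are hypothetical)."""
    # Per-database setting wins over the server-wide default.
    mode = db_config.get("commit_mode", server_default)

    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    override = lowered.get("x-commit-mode")
    if override is not None:
        # A client that knows it is talking to couch may override.
        mode = override.strip().lower()

    if mode not in VALID_MODES:
        raise ValueError("unknown commit mode: %r" % (mode,))
    return mode
```

For example, resolve_commit_mode({"commit_mode": "fsync"}, {}) gives
"fsync", while passing {"X-Commit-Mode": "full"} as the headers gives
"full" for that one request.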
That makes sense, but IMO you papered over the root problem.
It's good to keep people working, but I think the issue deserves a
look. I don't know Erlang, or I would look myself.
What issue? Why do you think this is Erlang specific?
Oh - this is a SWAG based on one data point :) [it was a rough day -
I didn't get to try to duplicate the results found yesterday...]
http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%[email protected]%3e
It was reported that w/ the same up-to-date version of Erlang, they
found a big performance difference between 0.8 and current trunk. If
that's true, then it seems to me that something changed in the
filesystem handling in the CouchDB code itself - it could be that
there are multiple flush modes, and the 0.8 code used whatever
corresponds to fsync(), and trunk uses whatever corresponds to
fcntl(F_FULLFSYNC). I don't know. It's a guess. But yesterday's
results are unexplained, and I hate mysteries.
I can't help with the Erlang (I don't know it...), but I can at least
try to reproduce the results...
geir