On Tue, Apr 6, 2010 at 7:26 PM, Earwin Burrfoot <ear...@gmail.com> wrote:
>> Running out of disk space with fsync disabled won't lead to corruption.
>> Even kill -9 the JRE process with fsync disabled won't corrupt.
>> In these cases index just falls back to last successful commit.
>>
>> It's "only" power loss / OS / machine crash where you need fsync to
>> avoid possible corruption (corruption may not even occur w/o fsync if
>> you "get lucky").
>
> Sorry to disappoint you, but running out of disk space is worse than kill -9.
> You can write down the file (to cache in fact), close it, all without
> getting any exceptions. And then it won't get flushed to disk because
> the disk is full.
> This can happen to segments file (and old one is deleted with default deletion
> policy). This can happen to fat freq/prox files mentioned in segments file
> (and yeah, the old segments file is deleted, so no falling back).

No, this doesn't make sense. The OS detects a disk full on accepting
the write into the write cache, not [later] on flushing the write cache
to disk. If the OS accepts the write, then the disk is not full (ie
flushing the cache will succeed, unless some other not-disk-full
problem happens).

Hmmm, at least, normally. What OS/IO system were you on when you saw
corruption due to disk full when fsync is disabled?

>> What if your background thread simply committed every couple of minutes?
>> What's the difference between taking the snapshot (which means you had
>> to call commit previously) and commit it, to call iw.commit by a background
>> merge?
> --
>> But: why do you need to commit so often?
> To see stuff on reopen? Yes, I know about NRT.
>
>> You've reinvented autocommit=true!
> ?? I'm doing regular commits, syncing down every Nth of it.
>
>> Doesn't this just BG the syncing? Ie you could make a dedicated
>> thread to do this.
>
> Yes, exactly, this BGs the syncing to a dedicated thread. Threads
> doing indexation/merging can continue unhampered.

OK. Or you can index with N+1 threads, and each indexer thread does
the commit if it's time...

>> One possible win with this approach is.... the cost of fsync should go
>> way down the longer you wait after writing bytes to the file and
>> before calling fsync. This is because typically OS write caches
>> expire by time (eg 30 seconds) so if you wait long enough the bytes
>> will already at least be delivered to the IO system (but the IO system
>> can do further caching which could still take time). On windows at
>> least I definitely noticed this effect -- wait some before fsync'ing
>> and it's net/net much less costly.
>
> Yup. In fact you can just hold on to the latest commit for N seconds,
> then switch to the new latest commit.
> OS will fsync everything for you.

You're mixing up terminology a bit here -- you can't "hold on to the
latest commit then switch to it".
A commit (as sent to the deletion policy) means a *real* commit (ie,
IW.commit or IW.close was called). So I think your BG thread would
simply be calling IW.commit every N seconds?

> I'm just playing around with stupid idea. I'd like to have NRT
> look-alike without binding readers and writers. :)

I see... well binding durability & visibility will always be costly.
This is why Lucene decouples them (by making NRT readers available).

> Right now it's probably best for me to save my time and cut over to current
> NRT.
> But. An important lesson was learnt - no fsyncing blows up your index
> on out-of-disk-space.

I'm still skeptical that disk full even with fsync disabled can lead
to corruption.... I'd like to see some concrete proof :)

BTW, if you know your OS/IO system always persists cached writes w/in
N seconds, a safe way to avoid fsync is to use a by-time expiring
deletion policy. Ie, a commit stays alive as long as its age is less
than X... DP's unit test has such a policy. But you better really
know for sure that the OS/IO system guarantees that :)

Mike
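
PS: to make the by-time expiring idea above concrete, here is a rough
sketch of just the selection rule ("a commit stays alive as long as its
age is less than X"). It is plain Java, not Lucene code: CommitPoint and
ExpiringCommitPolicy are made-up names for illustration, and it is not
the policy from DP's unit test. A real version would apply the same rule
inside a deletion policy's callbacks. I've assumed the newest commit is
always kept, so there is always something to open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a commit point; NOT Lucene's IndexCommit.
final class CommitPoint {
    final String name;        // eg the segments file name
    final long timestampMs;   // when the commit was written

    CommitPoint(String name, long timestampMs) {
        this.name = name;
        this.timestampMs = timestampMs;
    }
}

// Hypothetical by-time expiring rule: a commit stays alive while its
// age is less than maxAgeMs. The newest commit is never selected
// (an assumption of this sketch, so the index always has a commit).
final class ExpiringCommitPolicy {
    private final long maxAgeMs;

    ExpiringCommitPolicy(long maxAgeMs) {
        this.maxAgeMs = maxAgeMs;
    }

    // commits must be ordered oldest to newest.
    // Returns the commits that are old enough to delete.
    List<CommitPoint> selectForDeletion(List<CommitPoint> commits, long nowMs) {
        List<CommitPoint> toDelete = new ArrayList<>();
        // Stop before the last element: the newest commit is always kept.
        for (int i = 0; i < commits.size() - 1; i++) {
            CommitPoint c = commits.get(i);
            if (nowMs - c.timestampMs >= maxAgeMs) {
                toDelete.add(c);
            }
        }
        return toDelete;
    }
}
```

You'd pick maxAgeMs comfortably larger than the N seconds the OS/IO
system is known to need, and call selectForDeletion with
System.currentTimeMillis() each time a new commit shows up.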