On Tue, Apr 6, 2010 at 10:11 AM, Earwin Burrfoot <ear...@gmail.com> wrote:
> So, I want to pump my IndexWriter hard and fast with documents. Nice.
> Removing fsync from FSDirectory helps. But for that I pay with possibility of
> index corruption, not only if my node suddenly loses
> power/kernelpanics, but also if it
> runs out of disk space (which happens more frequently).

Running out of disk space with fsync disabled won't lead to corruption.
Even a kill -9 of the JRE process with fsync disabled won't corrupt the
index. In these cases the index just falls back to the last successful
commit.

It's "only" power loss or an OS / machine crash where you need fsync to
avoid possible corruption (and corruption may not even occur w/o fsync
if you "get lucky").

But: why do you need to commit so often?

> I invented the following solution:
> We write a special deletion policy that resembles SnapshotDeletionPolicy.
> At all times it takes hold of "current synced commit" and preserves
> it. Once every N minutes
> a special thread takes latest commit, syncs it and nominates as
> "current synced commit". The
> previous one gets deleted.
>
> Now we are disastery-proof, and do fsync asynchronously from indexing
> threads. We pay for this with
> somewhat bigger transient disc usage, and probably losing a few
> minutes worth of updates in
> case of a crash, but that's acceptable.
>
> How does this sound?

You've reinvented autocommit=true!

Doesn't this just move the syncing to the background? Ie you could make
a dedicated thread to do this.

One possible win with this approach is that the cost of fsync should go
way down the longer you wait between writing bytes to the file and
calling fsync. This is because OS write caches typically expire by time
(eg 30 seconds), so if you wait long enough the bytes will already at
least have been delivered to the IO system (but the IO system can do
further caching, which could still take time). On Windows at least I
definitely noticed this effect -- wait a while before fsync'ing and it's
net/net much less costly.
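To make the "dedicated thread" idea concrete, here is a rough sketch in
plain Java. It is not Lucene code and does not use any Lucene API:
DeferredSyncer, enqueue, syncPending and syncedCount are names made up
for illustration. It assumes that forcing a file through a freshly
opened channel is enough to push its already-written bytes to stable
storage on your OS, which is worth verifying for your platform.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not part of Lucene): fsyncs files from a background thread. */
class DeferredSyncer implements AutoCloseable {
    // Files that have been written but not yet forced to disk.
    private final Queue<Path> pending = new ConcurrentLinkedQueue<>();
    private final AtomicInteger synced = new AtomicInteger();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    /** Starts a single background thread that syncs once per period. */
    DeferredSyncer(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::syncPending, period, period, unit);
    }

    /** Called by writer threads after writing a file; returns immediately. */
    void enqueue(Path file) {
        pending.add(file);
    }

    /** Drains the queue and forces each file's data and metadata to storage. */
    void syncPending() {
        Path file;
        while ((file = pending.poll()) != null) {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ch.force(true);
                synced.incrementAndGet();
            } catch (IOException e) {
                pending.add(file); // keep it queued and retry on the next pass
                return;
            }
        }
    }

    /** Number of files successfully synced so far. */
    int syncedCount() {
        return synced.get();
    }

    /** Stops the background thread, then syncs whatever is still queued. */
    @Override
    public void close() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(1, TimeUnit.MINUTES);
        syncPending();
    }
}
```

Writer threads would call enqueue(path) after finishing a file and keep
going; anything written since the last completed pass is what you stand
to lose on a power failure. A longer period also gives the OS more time
to flush on its own, which is where the cheaper fsync would come from.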
Mike