On Wed, Dec 18, 2013 at 11:34 PM, Ravikumar Govindarajan
<ravikumar.govindara...@gmail.com> wrote:

>> You could make a custom Dir wrapper that always caches in RAM, but
>> that sounds a bit terrifying :)
>
> This was exactly what I implemented :)

I see :)

> A commit-thread runs periodically every 30 seconds, while a RAM-monitor
> thread runs every 5 seconds to commit data in case
> sizeInBytes >= 70%-of-maxCachedBytes. This is quite dangerous as you have
> said, especially when sync() can take an arbitrary amount of time.

Well, Lucene is able to produce bytes at a high rate during merging (if
it can read them at a high rate), so if you're not careful you can use
too much RAM.  Be sure to stall the byte-producing threads when that
happens, until HDFS catches up.
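Something along these lines could do the stalling -- just a rough,
untested sketch (RamBudget and its methods are made-up names, not real
Lucene or HDFS APIs): the byte-producing threads check the budget before
buffering more in RAM, and your commit/RAM-monitor thread credits it
back after sync() completes:

// Untested sketch: a shared RAM budget that stalls producer threads
// once too many un-flushed bytes are cached, until the flush to HDFS
// catches up.  All class and method names here are made up.
class RamBudget {
  private final long maxCachedBytes;
  private long cachedBytes;

  RamBudget(long maxCachedBytes) {
    this.maxCachedBytes = maxCachedBytes;
  }

  // Called by byte-producing threads (indexing/merging) before they
  // buffer more bytes in RAM; blocks while the budget is exhausted.
  synchronized void addBytes(long numBytes) throws InterruptedException {
    while (cachedBytes >= maxCachedBytes) {
      wait();
    }
    cachedBytes += numBytes;
  }

  // Called by the commit/RAM-monitor thread once bytes have been
  // sync()'d to HDFS, waking any stalled producers.
  synchronized void bytesFlushed(long numBytes) {
    cachedBytes -= numBytes;
    notifyAll();
  }
}

The important part is only that a producer which gets ahead of HDFS ends
up waiting, instead of growing the RAM cache without bound.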
>> Alternatively, maybe on an HDFS error you could block that one thread
>> while you retry for some amount of time, until the write/read
>> succeeds?  (Like an NFS hard mount.)
>
> Well, after your idea I started digging into HDFS for this problem. I
> believe HDFS handles this internally and silently, as per this link:
> https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-3/data-flow
>
> I believe that in the case of a node failure while writing, not even an
> IOException is thrown to the client; all of it is handled internally. I
> think I can rest easy on this.
> Maybe I will write a test case to verify this behavior.

Oh, that's good.

> Sorry for the trouble. Should have done some digging beforehand.

No problem!

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org