Sorry for bombarding the mailing-list... I've also just now found out about this JIRA https://issues.apache.org/jira/browse/BLUR-433
Think it's really useful for our case too. Many thanks for this neat patch

On Mon, Jul 18, 2016 at 2:39 PM, Ravikumar Govindarajan <[email protected]> wrote:

> We have also made a patch for having a high-water-mark level (15% of
> excess block-cache capacity) after which cache-writes are stopped.
>
> Once capacity is reclaimed via clean-up thread, we resume adding to cache
>
> On Mon, Jul 18, 2016 at 1:58 PM, Ravikumar Govindarajan <[email protected]> wrote:
>
>> We had an issue with block-cache growing beyond configured size &
>> reducing very rarely. Describing the sequence of events
>>
>> 1. Shard receives incoming mutations, adds it to Index & triggers
>>    background merge.
>> 2. Merge produces new-set of files. We have write-thru cache enabled
>>    & adds new files to block-cache..
>> 3. Shard goes silent & doesn't receive any mutation for many minutes
>>    all together
>> 4. Since we perform commit only upon receiving mutations, the
>>    older-files are not evicted from block-cache..
>> 5. Problem is exacerbated with KeepNLastCommit policy, where even
>>    after commit, unused files are not evicted from block-cache..
>>
>> We are planning to patch up SharedMergeScheduler by refreshing
>> IndexReader when a merge completes & then delete merged files from
>> block-cache. This way, I believe block-cache can be reigned in whenever it
>> exceeds capacity, irrespective of Commit-Policy used
>>
>> Do let know if this is fine...
>>
>> On Thu, Jun 16, 2016 at 4:33 PM, Ravikumar Govindarajan <[email protected]> wrote:
>>
>>>> I didn't fully understand the underlying Lucene reader, writer,
>>>> open, close semantics
>>>
>>> I too don't know the correct behavior. Lucene code is incredibly hairy
>>> to follow... :)
>>>
>>> Have pinged lucene mailing list. Hope someone replies...
>>>
>>> On Tue, Jun 7, 2016 at 4:46 PM, Aaron McCurry <[email protected]> wrote:
>>>
>>>> On Wed, Jun 1, 2016 at 7:34 AM, Ravikumar Govindarajan <[email protected]> wrote:
>>>>
>>>> > Just one more observation here...
>>>> >
>>>> > Even if readerPooling is set to true, lucene has 2 readers (One for search
>>>> > & one updates/deletes)
>>>> >
>>>> > But the reader for updates/deletes is not opened/closed for every commit
>>>> > call which is the default behavior as of today. It is opened only once
>>>> > (During first update/delete call)
>>>> >
>>>>
>>>> I will take a closer look at the code for this one. Likely when I wrote
>>>> this code I didn't fully understand the underlying Lucene reader, writer,
>>>> open, close semantics. Thank you for pointing this out!
>>>>
>>>> Aaron
>>>>
>>>> >
>>>> > On Wed, Jun 1, 2016 at 3:10 PM, Ravikumar Govindarajan <[email protected]> wrote:
>>>> >
>>>> > >> In newer versions of the code there are multiple streams involved. One for
>>>> > >> each open file handle plus if a sequential read is detected a new stream is
>>>> > >> created for the instance for better performance
>>>> > >
>>>> > > Great. We just patched up our Blur version with this code.
>>>> > >
>>>> > > While I was digging at the reader-closed issue, was quite surprised to
>>>> > > observe the following behavior
>>>> > >
>>>> > > - Issue a commit
>>>> > > - Lucene opens a new reader via IndexWriter. (Doesn't re-use our
>>>> > >   already opened DirectoryReader)
>>>> > > - Processes all updates/deletes/merges
>>>> > > - Closes the new reader
>>>> > > - Complete commit
>>>> > >
>>>> > > For a big index & lots of commits, opening a new-reader for every commit
>>>> > > is prohibitively expensive.
>>>> > >
>>>> > > Here is the JIRA for it...
>>>> > > https://issues.apache.org/jira/browse/LUCENE-2297
>>>> > >
>>>> > > All we need to do is just set "readerPooling=true" in IndexWriterConfig
>>>> > > class
>>>> > >
>>>> > > Please do explore this option when you find time.
>>>> > >
>>>> > > --
>>>> > > Ravi
>>>> > >
>>>> > > On Tue, May 24, 2016 at 7:48 PM, Aaron McCurry <[email protected]> wrote:
>>>> > >
>>>> > >> On Tue, May 24, 2016 at 6:06 AM, Ravikumar Govindarajan <[email protected]> wrote:
>>>> > >>
>>>> > >> > We have solved it temporarily by using a KeepLastTwoCommits del policy. We
>>>> > >> > don't get these exceptions now!!!
>>>> > >> >
>>>> > >>
>>>> > >> Great!
>>>> > >>
>>>> > >> >
>>>> > >> > Btw, I see that pread calls in FSDataInputStream.java are synchronized. Is
>>>> > >> > it possible that merge DFS read calls could potentially block search DFS
>>>> > >> > read calls?
>>>> > >> >
>>>> > >>
>>>> > >> Yes.
>>>> > >>
>>>> > >> >
>>>> > >> > Would it be a good idea to have 2 DFSInputStreams for every file, one for
>>>> > >> > merge & another for search?
>>>> > >> >
>>>> > >>
>>>> > >> In newer versions of the code there are multiple streams involved. One for
>>>> > >> each open file handle plus if a sequential read is detected a new stream is
>>>> > >> created for the instance for better performance. Checkout the
>>>> > >> HdfsDirectory class.
>>>> > >>
>>>> > >> Aaron
>>>> > >>
>>>> > >> >
>>>> > >> > On Tue, May 10, 2016 at 7:43 PM, Ravikumar Govindarajan <[email protected]> wrote:
>>>> > >> >
>>>> > >> > > Sorry, I mis-understood the code.
>>>> > >> > > I see that it has 2 locks IndexRefreshWriteLock & IndexRefreshReadLock.
>>>> > >> > > They look to be separate
>>>> > >> > >
>>>> > >> > > On Tue, May 10, 2016 at 7:16 PM, Ravikumar Govindarajan <[email protected]> wrote:
>>>> > >> > >
>>>> > >> > >> Thanks a lot Aaron.
>>>> > >> > >>
>>>> > >> > >> I guess we took a commit of 0.2.2 that doesn't have the
>>>> > >> > >> IndexRefreshWriteLock (IRWL). It looks like it co-ordinates between
>>>> > >> > >> searches & incoming mutation commits. If so, then it will likely solve the
>>>> > >> > >> first issue for us (AlreadyClosedException)
>>>> > >> > >>
>>>> > >> > >> Can you recollect if that was the reason IRWL was introduced?
>>>> > >> > >>
>>>> > >> > >> On Tue, May 10, 2016 at 6:40 PM, Aaron McCurry <[email protected]> wrote:
>>>> > >> > >>
>>>> > >> > >>> On Tue, May 10, 2016 at 2:30 AM, Ravikumar Govindarajan <[email protected]> wrote:
>>>> > >> > >>>
>>>> > >> > >>> > Actually there are 2 issues...
>>>> > >> > >>> >
>>>> > >> > >>> > 1. IndexReaderClosedException
>>>> > >> > >>> > 2. HDFS Stream Closed
>>>> > >> > >>> >
>>>> > >> > >>>
>>>> > >> > >>> Likely when the index is closed it closes the underlying indexinputs as
>>>> > >> > >>> well causing the HDFS Stream closed exception.
>>>> > >> > >>>
>>>> > >> > >>> >
>>>> > >> > >>> > Merge completion results in File Deletion & ultimately HDFS Stream Closed
>>>> > >> > >>> > during Search....
>>>> > >> > >>> >
>>>> > >> > >>> > I use IndexFileDeleter with KeepOnlyLastCommitDeletionPolicy. This blindly
>>>> > >> > >>> > deletes the file, without bothering to cross-check IndexReader.RefCount > 0.
>>>> > >> > >>> >
>>>> > >> > >>>
>>>> > >> > >>> Hmm. You can see here:
>>>> > >> > >>>
>>>> > >> > >>> https://github.com/apache/incubator-blur/blob/release-0.2.2-incubating/blur-core/src/main/java/org/apache/blur/manager/writer/BlurIndexSimpleWriter.java#L303
>>>> > >> > >>>
>>>> > >> > >>> That once the new index is available it is swapped into the index ref
>>>> > >> > >>> object and the old one is sent to the index closer. Once the ref to the
>>>> > >> > >>> index are low enough it closes the index. Or at least it should.
>>>> > >> > >>>
>>>> > >> > >>> I will continue looking into the problem but I don't have a solution for
>>>> > >> > >>> you yet.
>>>> > >> > >>>
>>>> > >> > >>> Aaron
>>>> > >> > >>>
>>>> > >> > >>> >
>>>> > >> > >>> > *Exception(message:Unknown error during rewrite,
>>>> > >> > >>> > stackTraceStr:java.io.IOException: Stream closed*
>>>> > >> > >>> > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1385)
>>>> > >> > >>> > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1374)
>>>> > >> > >>> > at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:89)
>>>> > >> > >>> > at org.apache.blur.store.hdfs.HdfsIndexInput.readInternal(HdfsIndexInput.java:62)
>>>> > >> > >>> > at org.apache.blur.store.buffer.ReusedBufferedIndexInput.readBytes(ReusedBufferedIndexInput.java:167)
>>>> > >> > >>> > at org.apache.blur.store.buffer.ReusedBufferedIndexInput.readBytes(ReusedBufferedIndexInput.java:122)
>>>> > >> > >>> > at org.apache.blur.store.hdfs.MmapCacheIndexInput.readAndcache(MmapCacheIndexInput.java:24)
>>>> > >> > >>> > at org.apache.blur.store.blockcache_v2.CacheIndexInput.fillNormally(CacheIndexInput.java:354)
>>>> > >> > >>> > at org.apache.blur.store.blockcache_v2.CacheIndexInput.fill(CacheIndexInput.java:379)
>>>> > >> > >>> > at org.apache.blur.store.blockcache_v2.CacheIndexInput.tryToFill(CacheIndexInput.java:297)
>>>> > >> > >>> > at org.apache.blur.store.blockcache_v2.CacheIndexInput.readByte(CacheIndexInput.java:151)
>>>> > >> > >>> > at org.apache.blur.lucene.warmup.TraceableIndexInput.readByte(TraceableIndexInput.java:62)
>>>> > >> > >>> > at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
>>>> > >> > >>> > at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(BlockTreeTermsReader.java:2366)
>>>> > >> > >>> > at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekCeil(BlockTreeTermsReader.java:1949)
>>>> > >> > >>> > at org.apache.blur.index.ExitableReader$ExitableTermsEnum.seekCeil(ExitableReader.java:250)
>>>> > >> > >>> > at org.apache.lucene.index.FilteredTermsEnum.next(FilteredTermsEnum.java:225)
>>>> > >> > >>> > at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:78)
>>>> > >> > >>> > at org.apache.lucene.search.ConstantScoreAutoRewrite.rewrite(ConstantScoreAutoRewrite.java:95)
>>>> > >> > >>> > at org.apache.lucene.search.MultiTermQuery$ConstantScoreAutoRewrite.rewrite(MultiTermQuery.java:220)
>>>> > >> > >>> > at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:288)
>>>> > >> > >>> > at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:412)
>>>> > >> > >>> > at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:412)
>>>> > >> > >>> > at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:412)
>>>> > >> > >>> > at
>>>> > >> > >>> >
>>>> > >> > >>> > On Mon, May 9, 2016 at 4:42 PM, Ravikumar Govindarajan <[email protected]> wrote:
>>>> > >> > >>> >
>>>> > >> > >>> > > One extra info we gleaned from the logs...
>>>> > >> > >>> > >
>>>> > >> > >>> > > 1. Merge Starts & is about to complete
>>>> > >> > >>> > > 2. Searcher is opened
>>>> > >> > >>> > > 3. Merge Completes
>>>> > >> > >>> > > 4. Ref-count drops to 0 in IndexReader
>>>> > >> > >>> > > 5. IndexReader closed while Searcher is still open
>>>> > >> > >>> > >
>>>> > >> > >>> > > This seems to be the main pattern for causing the Exception
>>>> > >> > >>> > >
>>>> > >> > >>> > > --
>>>> > >> > >>> > > Ravi
>>>> > >> > >>> > >
>>>> > >> > >>> > > On Mon, May 9, 2016 at 3:08 PM, Ravikumar Govindarajan <[email protected]> wrote:
>>>> > >> > >>> > >
>>>> > >> > >>> > >> Thanks Aaron...
>>>> > >> > >>> > >>
>>>> > >> > >>> > >> Just a quick question. Lucene itself has ref-counting to close
>>>> > >> > >>> > >> it's readers no? Or Blur has it's own logic to handle it?
>>>> > >> > >>> > >>
>>>> > >> > >>> > >> --
>>>> > >> > >>> > >> Ravi
>>>> > >> > >>> > >>
>>>> > >> > >>> > >> On Fri, May 6, 2016 at 7:56 PM, Aaron McCurry <[email protected]> wrote:
>>>> > >> > >>> > >>
>>>> > >> > >>> > >>> Likely yes. If have a few minutes this weekend I can look through that
>>>> > >> > >>> > >>> version and see if I can point you in the right direction.
>>>> > >> > >>> > >>>
>>>> > >> > >>> > >>> On Fri, May 6, 2016 at 8:46 AM, Ravikumar Govindarajan <[email protected]> wrote:
>>>> > >> > >>> > >>>
>>>> > >> > >>> > >>> > Sometimes during an ongoing search we receive an
>>>> > >> > >>> > >>> > IndexReaderClosedException...
>>>> > >> > >>> > >>> >
>>>> > >> > >>> > >>> > We are on an older version of Blur (0.2.2). Has this been fixed in newer
>>>> > >> > >>> > >>> > versions or we have been using it wrongly?
>>>> > >> > >>> > >>> >
>>>> > >> > >>> > >>> > *stackTraceStr:org.apache.lucene.store.AlreadyClosedException: this
>>>> > >> > >>> > >>> > IndexReader cannot be used anymore as one of its child readers was closed*
>>>> > >> > >>> > >>> > at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:257)
>>>> > >> > >>> > >>> > at org.apache.lucene.index.FilterAtomicReader.fields(FilterAtomicReader.java:380)
>>>> > >> > >>> > >>> > at org.apache.blur.index.ExitableReader$ExitableFilterAtomicReader.fields(ExitableReader.java:81)
>>>> > >> > >>> > >>> > at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:52)
>>>> > >> > >>> > >>> > at org.apache.lucene.search.ConstantScoreAutoRewrite.rewrite(ConstantScoreAutoRewrite.java:95)
>>>> > >> > >>> > >>> > at org.apache.lucene.search.MultiTermQuery$ConstantScoreAutoRewrite.rewrite(MultiTermQuery.java:220)
>>>> > >> > >>> > >>> > at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:288)
>>>> > >> > >>> > >>> > at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:412)
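
For anyone following the thread later: the high-water-mark behaviour described in the Jul 18 message quoted above could be sketched roughly as below. This is only an illustration, not the actual patch and not Blur's block-cache code. The class name `HighWaterMarkGate` and its methods are hypothetical, and I am assuming that "15% of excess capacity" means writes stop once usage would exceed 115% of the configured size, and that writes resume as soon as the clean-up thread has freed enough room.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of a high-water-mark gate for cache writes.
 * Not Blur's real implementation; names and semantics are assumptions.
 */
final class HighWaterMarkGate {
    // Absolute byte limit above which new cache writes are refused.
    private final long highWaterMarkBytes;
    // Bytes currently accounted for in the cache.
    private final AtomicLong usedBytes = new AtomicLong();

    /**
     * @param capacityBytes configured cache size
     * @param excessPercent allowed overshoot, e.g. 15 for 15% (assumed meaning)
     */
    HighWaterMarkGate(long capacityBytes, int excessPercent) {
        this.highWaterMarkBytes = capacityBytes + capacityBytes * excessPercent / 100;
    }

    /** Accounts for a new block; returns false when cache writes are paused. */
    boolean tryAdd(long blockBytes) {
        while (true) {
            long current = usedBytes.get();
            if (current + blockBytes > highWaterMarkBytes) {
                return false; // over the high-water mark: skip caching this block
            }
            if (usedBytes.compareAndSet(current, current + blockBytes)) {
                return true;
            }
        }
    }

    /** Called by the clean-up thread after it evicts blocks. */
    void release(long blockBytes) {
        usedBytes.addAndGet(-blockBytes);
    }

    long usedBytes() {
        return usedBytes.get();
    }
}
```

A writer would call `tryAdd` before copying a block into the cache and simply skip caching (still writing through to HDFS) when it returns false. The clean-up thread calls `release` for each evicted block, which lets later `tryAdd` calls succeed again.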
