Yeah, we were experiencing the fragmentation problem whenever we added information to the system. Before this patch we had to leave 30%-40% of the memory on the server free just in case the fragmentation problem occurred, and even that didn't totally fix the problem. However, after moving to the code that's in the patch, the problem we had with adding data went away completely.
Aaron

On Mon, Jul 18, 2016 at 5:58 AM, Ravikumar Govindarajan <
[email protected]> wrote:

> Sorry for bombarding the mailing-list...
>
> I've also just now found out about this JIRA
> https://issues.apache.org/jira/browse/BLUR-433
>
> Think it's really useful for our case too. Many thanks for this neat patch
>
> On Mon, Jul 18, 2016 at 2:39 PM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > We have also made a patch for having a high-water-mark level (15% of
> > excess block-cache capacity) after which cache-writes are stopped.
> >
> > Once capacity is reclaimed via clean-up thread, we resume adding to cache
> >
> > On Mon, Jul 18, 2016 at 1:58 PM, Ravikumar Govindarajan <
> > [email protected]> wrote:
> >
> >> We had an issue with block-cache growing beyond configured size &
> >> reducing very rarely. Describing the sequence of events
> >>
> >> 1. Shard receives incoming mutations, adds it to Index & triggers
> >>    background merge.
> >> 2. Merge produces new-set of files. We have write-thru cache enabled
> >>    & adds new files to block-cache..
> >> 3. Shard goes silent & doesn't receive any mutation for many minutes
> >>    all together
> >> 4. Since we perform commit only upon receiving mutations, the
> >>    older-files are not evicted from block-cache..
> >> 5. Problem is exacerbated with KeepNLastCommit policy, where even
> >>    after commit, unused files are not evicted from block-cache..
> >>
> >> We are planning to patch up SharedMergeScheduler by refreshing
> >> IndexReader when a merge completes & then delete merged files from
> >> block-cache. This way, I believe block-cache can be reigned in whenever it
> >> exceeds capacity, irrespective of Commit-Policy used
> >>
> >> Do let know if this is fine...
> >>
> >> On Thu, Jun 16, 2016 at 4:33 PM, Ravikumar Govindarajan <
> >> [email protected]> wrote:
> >>
> >>>> I didn't fully understand the underlying Lucene reader, writer,
> >>>> open, close semantics
> >>>
> >>> I too don't know the correct behavior. Lucene code is incredibly hairy
> >>> to follow... :)
> >>>
> >>> Have pinged lucene mailing list. Hope someone replies...
> >>>
> >>> On Tue, Jun 7, 2016 at 4:46 PM, Aaron McCurry <[email protected]>
> >>> wrote:
> >>>
> >>>> On Wed, Jun 1, 2016 at 7:34 AM, Ravikumar Govindarajan <
> >>>> [email protected]> wrote:
> >>>>
> >>>> > Just one more observation here...
> >>>> >
> >>>> > Even if readerPooling is set to true, lucene has 2 readers (One for search
> >>>> > & one updates/deletes)
> >>>> >
> >>>> > But the reader for updates/deletes is not opened/closed for every commit
> >>>> > call which is the default behavior as of today. It is opened only once
> >>>> > (During first update/delete call)
> >>>> >
> >>>>
> >>>> I will take a closer look at the code for this one. Likely when I wrote
> >>>> this code I didn't fully understand the underlying Lucene reader, writer,
> >>>> open, close semantics. Thank you for pointing this out!
> >>>>
> >>>> Aaron
> >>>>
> >>>> >
> >>>> > On Wed, Jun 1, 2016 at 3:10 PM, Ravikumar Govindarajan <
> >>>> > [email protected]> wrote:
> >>>> >
> >>>> > >> In newer versions of the code there are multiple streams involved. One for
> >>>> > >> each open file handle plus if a sequential read is detected a new stream is
> >>>> > >> created for the instance for better performance
> >>>> > >
> >>>> > > Great. We just patched up our Blur version with this code.
> >>>> > >
> >>>> > > While I was digging at the reader-closed issue, was quite surprised to
> >>>> > > observe the following behavior
> >>>> > >
> >>>> > > - Issue a commit
> >>>> > > - Lucene opens a new reader via IndexWriter. (Doesn't re-use our
> >>>> > >   already opened DirectoryReader)
> >>>> > > - Processes all updates/deletes/merges
> >>>> > > - Closes the new reader
> >>>> > > - Complete commit
> >>>> > >
> >>>> > > For a big index & lots of commits, opening a new-reader for every commit
> >>>> > > is prohibitively expensive.
> >>>> > >
> >>>> > > Here is the JIRA for it...
> >>>> > > https://issues.apache.org/jira/browse/LUCENE-2297
> >>>> > >
> >>>> > > All we need to do is just set "readerPooling=true" in IndexWriterConfig
> >>>> > > class
> >>>> > >
> >>>> > > Please do explore this option when you find time.
> >>>> > >
> >>>> > > --
> >>>> > > Ravi
> >>>> > >
> >>>> > > On Tue, May 24, 2016 at 7:48 PM, Aaron McCurry <[email protected]>
> >>>> > > wrote:
> >>>> > >
> >>>> > >> On Tue, May 24, 2016 at 6:06 AM, Ravikumar Govindarajan <
> >>>> > >> [email protected]> wrote:
> >>>> > >>
> >>>> > >> > We have solved it temporarily by using a KeepLastTwoCommits del policy. We
> >>>> > >> > don't get these exceptions now!!!
> >>>> > >> >
> >>>> > >>
> >>>> > >> Great!
> >>>> > >>
> >>>> > >> >
> >>>> > >> > Btw, I see that pread calls in FSDataInputStream.java are synchronized. Is
> >>>> > >> > it possible that merge DFS read calls could potentially block search DFS
> >>>> > >> > read calls?
> >>>> > >> >
> >>>> > >>
> >>>> > >> Yes.
> >>>> > >>
> >>>> > >> >
> >>>> > >> > Would it be a good idea to have 2 DFSInputStreams for every file, one for
> >>>> > >> > merge & another for search?
> >>>> > >> >
> >>>> > >>
> >>>> > >> In newer versions of the code there are multiple streams involved. One for
> >>>> > >> each open file handle plus if a sequential read is detected a new stream is
> >>>> > >> created for the instance for better performance. Checkout the
> >>>> > >> HdfsDirectory class.
> >>>> > >>
> >>>> > >> Aaron
> >>>> > >>
> >>>> > >> >
> >>>> > >> > On Tue, May 10, 2016 at 7:43 PM, Ravikumar Govindarajan <
> >>>> > >> > [email protected]> wrote:
> >>>> > >> >
> >>>> > >> > > Sorry, I mis-understood the code.
> >>>> > >> > > I see that it has 2 locks IndexRefreshWriteLock & IndexRefreshReadLock.
> >>>> > >> > > They look to be separate
> >>>> > >> > >
> >>>> > >> > > On Tue, May 10, 2016 at 7:16 PM, Ravikumar Govindarajan <
> >>>> > >> > > [email protected]> wrote:
> >>>> > >> > >
> >>>> > >> > >> Thanks a lot Aaron.
> >>>> > >> > >>
> >>>> > >> > >> I guess we took a commit of 0.2.2 that doesn't have the
> >>>> > >> > >> IndexRefreshWriteLock (IRWL). It looks like it co-ordinates between
> >>>> > >> > >> searches & incoming mutation commits. If so, then it will likely solve the
> >>>> > >> > >> first issue for us (AlreadyClosedException)
> >>>> > >> > >>
> >>>> > >> > >> Can you recollect if that was the reason IRWL was introduced?
> >>>> > >> > >>
> >>>> > >> > >> On Tue, May 10, 2016 at 6:40 PM, Aaron McCurry <[email protected]>
> >>>> > >> > >> wrote:
> >>>> > >> > >>
> >>>> > >> > >>> On Tue, May 10, 2016 at 2:30 AM, Ravikumar Govindarajan <
> >>>> > >> > >>> [email protected]> wrote:
> >>>> > >> > >>>
> >>>> > >> > >>> > Actually there are 2 issues...
> >>>> > >> > >>> >
> >>>> > >> > >>> > 1. IndexReaderClosedException
> >>>> > >> > >>> > 2. HDFS Stream Closed
> >>>> > >> > >>> >
> >>>> > >> > >>>
> >>>> > >> > >>> Likely when the index is closed it closes the underlying indexinputs as
> >>>> > >> > >>> well causing the HDFS Stream closed exception.
> >>>> > >> > >>>
> >>>> > >> > >>> >
> >>>> > >> > >>> > Merge completion results in File Deletion & ultimately HDFS Stream Closed
> >>>> > >> > >>> > during Search....
> >>>> > >> > >>> >
> >>>> > >> > >>> > I use IndexFileDeleter with KeepOnlyLastCommitDeletionPolicy. This blindly
> >>>> > >> > >>> > deletes the file, without bothering to cross-check IndexReader.RefCount > 0.
> >>>> > >> > >>> >
> >>>> > >> > >>>
> >>>> > >> > >>> Hmm. You can see here:
> >>>> > >> > >>>
> >>>> > >> > >>> https://github.com/apache/incubator-blur/blob/release-0.2.2-incubating/blur-core/src/main/java/org/apache/blur/manager/writer/BlurIndexSimpleWriter.java#L303
> >>>> > >> > >>>
> >>>> > >> > >>> That once the new index is available it is swapped into the index ref
> >>>> > >> > >>> object and the old one is sent to the index closer. Once the ref to the
> >>>> > >> > >>> index are low enough it closes the index. Or at least it should.
> >>>> > >> > >>>
> >>>> > >> > >>> I will continue looking into the problem but I don't have a solution for
> >>>> > >> > >>> you yet.
> >>>> > >> > >>>
> >>>> > >> > >>> Aaron
> >>>> > >> > >>>
> >>>> > >> > >>> >
> >>>> > >> > >>> > *Exception(message:Unknown error during rewrite,
> >>>> > >> > >>> > stackTraceStr:java.io.IOException: Stream closed*
> >>>> > >> > >>> > at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1385)
> >>>> > >> > >>> > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1374)
> >>>> > >> > >>> > at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:89)
> >>>> > >> > >>> > at org.apache.blur.store.hdfs.HdfsIndexInput.readInternal(HdfsIndexInput.java:62)
> >>>> > >> > >>> > at org.apache.blur.store.buffer.ReusedBufferedIndexInput.readBytes(ReusedBufferedIndexInput.java:167)
> >>>> > >> > >>> > at org.apache.blur.store.buffer.ReusedBufferedIndexInput.readBytes(ReusedBufferedIndexInput.java:122)
> >>>> > >> > >>> > at org.apache.blur.store.hdfs.MmapCacheIndexInput.readAndcache(MmapCacheIndexInput.java:24)
> >>>> > >> > >>> > at org.apache.blur.store.blockcache_v2.CacheIndexInput.fillNormally(CacheIndexInput.java:354)
> >>>> > >> > >>> > at org.apache.blur.store.blockcache_v2.CacheIndexInput.fill(CacheIndexInput.java:379)
> >>>> > >> > >>> > at org.apache.blur.store.blockcache_v2.CacheIndexInput.tryToFill(CacheIndexInput.java:297)
> >>>> > >> > >>> > at org.apache.blur.store.blockcache_v2.CacheIndexInput.readByte(CacheIndexInput.java:151)
> >>>> > >> > >>> > at org.apache.blur.lucene.warmup.TraceableIndexInput.readByte(TraceableIndexInput.java:62)
> >>>> > >> > >>> > at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
> >>>> > >> > >>> > at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(BlockTreeTermsReader.java:2366)
> >>>> > >> > >>> > at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekCeil(BlockTreeTermsReader.java:1949)
> >>>> > >> > >>> > at org.apache.blur.index.ExitableReader$ExitableTermsEnum.seekCeil(ExitableReader.java:250)
> >>>> > >> > >>> > at org.apache.lucene.index.FilteredTermsEnum.next(FilteredTermsEnum.java:225)
> >>>> > >> > >>> > at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:78)
> >>>> > >> > >>> > at org.apache.lucene.search.ConstantScoreAutoRewrite.rewrite(ConstantScoreAutoRewrite.java:95)
> >>>> > >> > >>> > at org.apache.lucene.search.MultiTermQuery$ConstantScoreAutoRewrite.rewrite(MultiTermQuery.java:220)
> >>>> > >> > >>> > at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:288)
> >>>> > >> > >>> > at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:412)
> >>>> > >> > >>> > at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:412)
> >>>> > >> > >>> > at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:412)
> >>>> > >> > >>> > at
> >>>> > >> > >>> >
> >>>> > >> > >>> > On Mon, May 9, 2016 at 4:42 PM, Ravikumar Govindarajan <
> >>>> > >> > >>> > [email protected]> wrote:
> >>>> > >> > >>> >
> >>>> > >> > >>> > > One extra info we gleaned from the logs...
> >>>> > >> > >>> > >
> >>>> > >> > >>> > > 1. Merge Starts & is about to complete
> >>>> > >> > >>> > > 2. Searcher is opened
> >>>> > >> > >>> > > 3. Merge Completes
> >>>> > >> > >>> > > 4. Ref-count drops to 0 in IndexReader
> >>>> > >> > >>> > > 5. IndexReader closed while Searcher is still open
> >>>> > >> > >>> > >
> >>>> > >> > >>> > > This seems to be the main pattern for causing the Exception
> >>>> > >> > >>> > >
> >>>> > >> > >>> > > --
> >>>> > >> > >>> > > Ravi
> >>>> > >> > >>> > >
> >>>> > >> > >>> > > On Mon, May 9, 2016 at 3:08 PM, Ravikumar Govindarajan <
> >>>> > >> > >>> > > [email protected]> wrote:
> >>>> > >> > >>> > >
> >>>> > >> > >>> > >> Thanks Aaron...
> >>>> > >> > >>> > >>
> >>>> > >> > >>> > >> Just a quick question. Lucene itself has ref-counting to close it's
> >>>> > >> > >>> > >> readers no? Or Blur has it's own logic to handle it?
> >>>> > >> > >>> > >>
> >>>> > >> > >>> > >> --
> >>>> > >> > >>> > >> Ravi
> >>>> > >> > >>> > >>
> >>>> > >> > >>> > >> On Fri, May 6, 2016 at 7:56 PM, Aaron McCurry <[email protected]>
> >>>> > >> > >>> > >> wrote:
> >>>> > >> > >>> > >>
> >>>> > >> > >>> > >>> Likely yes. If have a few minutes this weekend I can look through that
> >>>> > >> > >>> > >>> version and see if I can point you in the right direction.
> >>>> > >> > >>> > >>>
> >>>> > >> > >>> > >>> On Fri, May 6, 2016 at 8:46 AM, Ravikumar Govindarajan <
> >>>> > >> > >>> > >>> [email protected]> wrote:
> >>>> > >> > >>> > >>>
> >>>> > >> > >>> > >>> > Sometimes during an ongoing search we receive an
> >>>> > >> > >>> > >>> > IndexReaderClosedException...
> >>>> > >> > >>> > >>> >
> >>>> > >> > >>> > >>> > We are on an older version of Blur (0.2.2). Has this been fixed in newer
> >>>> > >> > >>> > >>> > versions or we have been using it wrongly?
> >>>> > >> > >>> > >>> >
> >>>> > >> > >>> > >>> > *stackTraceStr:org.apache.lucene.store.AlreadyClosedException: this
> >>>> > >> > >>> > >>> > IndexReader cannot be used anymore as one of its child readers was closed*
> >>>> > >> > >>> > >>> > at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:257)
> >>>> > >> > >>> > >>> > at org.apache.lucene.index.FilterAtomicReader.fields(FilterAtomicReader.java:380)
> >>>> > >> > >>> > >>> > at org.apache.blur.index.ExitableReader$ExitableFilterAtomicReader.fields(ExitableReader.java:81)
> >>>> > >> > >>> > >>> > at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:52)
> >>>> > >> > >>> > >>> > at org.apache.lucene.search.ConstantScoreAutoRewrite.rewrite(ConstantScoreAutoRewrite.java:95)
> >>>> > >> > >>> > >>> > at org.apache.lucene.search.MultiTermQuery$ConstantScoreAutoRewrite.rewrite(MultiTermQuery.java:220)
> >>>> > >> > >>> > >>> > at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:288)
> >>>> > >> > >>> > >>> > at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:412)
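
For anyone following the thread: the failure pattern in the May 9 message (reader closed while a searcher is still open) is the kind of race that reference counting is meant to prevent. The searcher takes its own reference before it touches the reader, so the owner's release after a merge only closes the reader once the last user is done. Lucene's IndexReader exposes incRef(), tryIncRef() and decRef() for this purpose. Below is a minimal, library-free sketch of the pattern. The class RefCountedReader and its methods are hypothetical stand-ins for illustration, not Blur's or Lucene's actual code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a ref-counted index reader (illustration only). */
class RefCountedReader {
    // Starts at 1: the reference held by whoever opened the reader.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    /** A searcher calls this before use; fails if the reader is already closed. */
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    /** Each holder calls this exactly once when done; the last call closes. */
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            closed = true; // a real reader would release its file streams here
        } else if (remaining < 0) {
            throw new IllegalStateException("decRef called more times than incRef");
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

With this shape, the sequence "searcher opened, merge completes, owner releases" leaves the count at 1, and the reader is closed only when the searcher itself calls decRef().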

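
The high-water-mark idea from the Jul 18 message (stop cache writes past a threshold, resume once the clean-up thread has reclaimed capacity) could look roughly like the sketch below. It assumes "15% of excess block-cache capacity" means usage may grow to 115% of the configured size before writes pause, and that writes resume once usage is back at or below the configured size. The class HighWaterMarkGate and its methods are hypothetical names, not the actual patch.

```java
/** Hypothetical admission gate for a block cache (illustration only). */
class HighWaterMarkGate {
    private final long capacityBytes;
    private final long highWaterMarkBytes;
    private long usedBytes = 0;
    private boolean paused = false;

    HighWaterMarkGate(long capacityBytes, int excessPercent) {
        this.capacityBytes = capacityBytes;
        // Assumption: the mark sits excessPercent above the configured capacity.
        this.highWaterMarkBytes = capacityBytes + (capacityBytes * excessPercent) / 100;
    }

    /** Returns true if the block was admitted, false if cache writes are paused. */
    synchronized boolean tryAdd(long blockBytes) {
        if (paused) {
            return false;
        }
        if (usedBytes + blockBytes > highWaterMarkBytes) {
            paused = true; // stop cache writes until clean-up reclaims space
            return false;
        }
        usedBytes += blockBytes;
        return true;
    }

    /** Called by the clean-up thread after it evicts blocks. */
    synchronized void release(long blockBytes) {
        usedBytes = Math.max(0, usedBytes - blockBytes);
        if (usedBytes <= capacityBytes) {
            paused = false; // capacity reclaimed, resume adding to cache
        }
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

A caller that gets false from tryAdd() would simply skip caching that block and read through to the underlying store.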