Hi Wang, would it be possible to open a JIRA issue so we can track this? In any case, as a workaround I would recommend disabling compound files if you use NRTCachingDirectory.
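For example (an untested sketch against the Lucene 4.x API; the index path, cache sizes, and Version constant are placeholders you should adapt):

  import java.io.File;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.TieredMergePolicy;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.store.NRTCachingDirectory;
  import org.apache.lucene.util.Version;

  public class DisableCFSExample {
    public static void main(String[] args) throws Exception {
      // Cache segments smaller than 5 MB in RAM, up to 60 MB total (example sizes).
      NRTCachingDirectory dir = new NRTCachingDirectory(
          FSDirectory.open(new File("/path/to/index")), 5.0, 60.0);

      IndexWriterConfig iwc = new IndexWriterConfig(
          Version.LUCENE_48, new StandardAnalyzer(Version.LUCENE_48));
      iwc.setUseCompoundFile(false);  // no compound files for flushed segments
      TieredMergePolicy mp = new TieredMergePolicy();
      mp.setNoCFSRatio(0.0);          // no compound files for merged segments either
      iwc.setMergePolicy(mp);

      IndexWriter writer = new IndexWriter(dir, iwc);
      // ... index documents as usual ...
      writer.close();
    }
  }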
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: wangzhijiang999 [mailto:wangzhijiang...@aliyun.com]
> Sent: Tuesday, July 01, 2014 9:17 AM
> To: java-user
> Subject: Reply: RE: RE: About lucene memory consumption
>
> My application also ran into this problem last year, and I researched the
> code and found the reason. The whole process is as follows:
>
> 1. NRTCachingDirectory uses a RAMDirectory as cache and, for example, an
> MMapDirectory as delegate. New segments are created during flush or merge,
> and NRTCachingDirectory uses its maxMergeSizeBytes and maxCachedBytes
> parameters to decide whether to create a new segment in the cache (in
> memory) or in the delegate (on disk).
> 2. When a flush creates a new segment, the segment's
> context.flushInfo.estimatedSegmentSize is compared against these
> parameters. If the new segment is small, it is created in the
> RAMDirectory, otherwise in the MMapDirectory.
> 3. When a merge creates a new segment, the segment's
> context.mergeInfo.estimatedMergeBytes is compared against these
> parameters. If the new segment is small, it is created in the cache,
> otherwise in the delegate.
> 4. But when the new segment is a compound file (.cfs), whether written
> during flush or merge, IOContext.DEFAULT is used for it, and
> IOContext.DEFAULT carries neither mergeInfo nor flushInfo (both are
> null). As a result the new compound segment file is always created in
> the cache, no matter how big it really is. This is the core issue (see
> the sketch just before the patch below).
>
> Next, the mechanism that releases segments from the cache:
>
> 1. Normally, during a commit, the sync operation flushes the newly
> created segment files to disk and deletes them from the cache. But if a
> merge is running during the sync, the segment created by that merge is
> not synced to disk in this commit, and the newly merged compound segment
> file is created in the cache as described above.
> 2. With the NRT feature, the IndexSearcher obtains SegmentReaders from
> the IndexWriter through the getReader method, and there is a ReaderPool
> inside the IndexWriter. A new segment is first looked up in the
> NRTCachingDirectory's cache; if it is not there (because it was created
> directly on disk, or a commit moved it to disk and released it from the
> cache), it is fetched from the delegate. The fetched segment is then held
> in the IndexWriter's ReaderPool. As described above, the segment created
> by the merge now sits in the cache, and once fetched it is referenced by
> the ReaderPool. During the next commit the segment is synced to disk and
> released from the cache, but it is still referenced by the ReaderPool. So
> you will see the IndexSearcher referencing many RAMFiles whose contents
> are already on disk. When can these RAMFiles be dropped? Only when those
> segments take part in a new merge are they released completely from the
> IndexWriter's ReaderPool.
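>
> For reference, the usual NRT pattern that triggers this looks like the
> following (a sketch only; error handling omitted, imports from
> org.apache.lucene.index and org.apache.lucene.search assumed):
>
>   // Open an NRT reader directly from the IndexWriter (the getReader path).
>   DirectoryReader reader = DirectoryReader.open(writer, true);
>   IndexSearcher searcher = new IndexSearcher(reader);
>   // ... search ...
>   // Reopen against the writer; the RAMFiles backing the old segments stay
>   // referenced via the IndexWriter's ReaderPool until those segments are
>   // merged away.
>   DirectoryReader newReader = DirectoryReader.openIfChanged(reader, writer, true);
>   if (newReader != null) {
>     reader.close();
>     reader = newReader;
>   }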
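>
> To make the core issue concrete: the cache-or-delegate decision in
> NRTCachingDirectory looks roughly like this (condensed from the 4.x
> source; field names may differ slightly between releases):
>
>   protected boolean doCacheWrite(String name, IOContext context) {
>     long bytes = 0;
>     if (context.mergeInfo != null) {
>       bytes = context.mergeInfo.estimatedMergeBytes;
>     } else if (context.flushInfo != null) {
>       bytes = context.flushInfo.estimatedSegmentSize;
>     }
>     // With IOContext.DEFAULT both infos are null, so bytes stays 0 and the
>     // size checks always pass: the file is cached no matter how big it is.
>     return !name.equals("segments.gen")
>         && bytes <= maxMergeSizeBytes
>         && bytes + cache.sizeInBytes() <= maxCachedBytes;
>   }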
> I modified the Lucene source code to solve this problem in the
> CompoundFileWriter class. The change is to pass the real IOContext
> through instead of using the default one:
>
>   IndexOutput createOutput(String name, IOContext context) throws IOException {
>     ensureOpen();
>     boolean success = false;
>     boolean outputLocked = false;
>     try {
>       assert name != null : "name must not be null";
>       if (entries.containsKey(name)) {
>         throw new IllegalArgumentException("File " + name + " already exists");
>       }
>       final FileEntry entry = new FileEntry();
>       entry.file = name;
>       entries.put(name, entry);
>       final String id = IndexFileNames.stripSegmentName(name);
>       assert !seenIDs.contains(id) : "file=\"" + name + "\" maps to id=\""
>           + id + "\", which was already written";
>       seenIDs.add(id);
>       final DirectCFSIndexOutput out;
>       if ((outputLocked = outputTaken.compareAndSet(false, true))) {
>         // original: out = new DirectCFSIndexOutput(getOutput(), entry, false);
>         out = new DirectCFSIndexOutput(getOutput(context), entry, false); // modified
>       } else {
>         entry.dir = this.directory;
>         if (directory.fileExists(name)) {
>           throw new IllegalArgumentException("File " + name + " already exists");
>         }
>         out = new DirectCFSIndexOutput(directory.createOutput(name, context), entry, true);
>       }
>       success = true;
>       return out;
>     } finally {
>       if (!success) {
>         entries.remove(name);
>         if (outputLocked) {
>           // release the output lock if not successful
>           assert outputTaken.get();
>           releaseOutputLock();
>         }
>       }
>     }
>   }
>
>   // getOutput now takes the IOContext instead of hard-coding IOContext.DEFAULT:
>   private synchronized IndexOutput getOutput(IOContext context) throws IOException {
>     if (dataOut == null) {
>       boolean success = false;
>       try {
>         dataOut = directory.createOutput(dataFileName, context);
>         CodecUtil.writeHeader(dataOut, DATA_CODEC, VERSION_CURRENT);
>         success = true;
>       } finally {
>         if (!success) {
>           IOUtils.closeWhileHandlingException(dataOut);
>         }
>       }
>     }
>     return dataOut;
>   }
>
> With the context passed through, the compound file inherits the real
> FlushInfo/MergeInfo size estimate, so NRTCachingDirectory can route a
> large .cfs file to disk instead of keeping it in the cache.