RE: MappedByteBuffer duplicates

Kameron Cole Mon, 27 Feb 2017 08:39:43 -0800

Uwe,

It's clear to me now - I guess that puts garbage collection out of the 
picture.


But that makes it more confusing - especially if, as you say, Apache Lucene 
forcefully unmaps all mapped byte buffers when it closes the IndexInputs. 
It must mean that for some reason the IndexInputs are not getting 
closed.  Is there a way to see that?  You very clearly outlined 
these possible causes, which will require code checking:

1) If you do not close IndexWriter and DirectoryReaders when required, the 
index files stay open. 
2) If indexing goes on and you reopen the DirectoryReader (e.g. with the 
near realtime functions of IndexWriter to see the actual state), be sure 
to close the "old" reader. Otherwise it will open more and more files. 

In our case, indexing (actually, re-indexing) happens a lot!  The people 
managing this installation have a need to keep the large index updated. Is 
there just a kind of fundamental "race condition" that comes from indexing 
back-to-back?  Clearly, fewer rebuilds in a day lessen the danger of a 
machine crash.  We can be fairly certain of that.  Still, I don't see why 
it should be necessary to worry about too many index builds.  The OS 
should be able to handle this.
 
I keep coming back to this, though - can this have anything to do with 
Windows virtual memory management?  I kind of specialized way back in 
college in OS level functions, and Windows has a completely different 
paradigm than Unix for memory management.  Throughout a pretty long career 
in IT development, I have seen time and time again - including in this 
case - that when you reboot Windows, the memory problems are gone.  I have 
almost never seen or heard of rebooting Linux or AIX in this regard. That 
said, I guess that any discussion of ulimits is moot, right? 





From:   "Uwe Schindler" <u...@thetaphi.de>
To:     <java-user@lucene.apache.org>
Date:   02/24/2017 06:22 PM
Subject:        RE: MappedByteBuffer duplicates



Hi,

You did not give us all information. So I can only give some hints, 
because there could be multiple causes for your problems. There is for 
sure no bug in Apache Lucene as there are thousands of Solr and 
Elasticsearch instances running without such problems.

> Actually, at a certain point, they have crashed the machine. The native
> file mappings are deallocated (unmapped) by the JVM when the
> MappedByteBuffers are eligible for garbage collection. The problem we're
> seeing is that there are thousands of MappedByteBuffers which are not
> eligible for garbage collection. The native memory is retained because the
> Lucene code is still referencing the MappedByteBuffer objects on the Java
> heap. This isn't the fault of Windows or the JVM. It appears to be a fault
> in Lucene, but we can't diagnose it - we can't see why the MappedByteBuffer
> objects are being retained.

For Apache Lucene this is not true:

Apache Lucene forcefully unmaps all mapped byte buffers when it closes the 
IndexInputs. Without that, we would need to wait for Garbage Collection 
for this to happen, which not only brings problems for virtual address 
space (your problem), but also disk usage (files that have mapped contents 
cannot be deleted). So your statement is not true. Lucene does not need to 
wait for Garbage Collector, it forces unmapping!

If forceful unmapping does not work (requires Oracle JDK, OpenJDK or IBM 
J9 - version [7 for Lucene 5], Java 8, Java 9 b150+), MMapDirectory is not 
used by default. This happens on JVMs which do not expose the internal 
APIs that are needed to do that. To check this, print the contents of:

http://lucene.apache.org/core/6_4_1/core/org/apache/lucene/store/MMapDirectory.html#UNMAP_SUPPORTED

http://lucene.apache.org/core/6_4_1/core/org/apache/lucene/store/MMapDirectory.html#UNMAP_NOT_SUPPORTED_REASON


If you use FSDirectory.open() to get a directory instance (factory 
method), it will not choose MMapDir if unmapping is not supported. So could 
it be that you instantiate MMapDirectory explicitly, although unmapping does 
not work on your JVM?
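
A minimal sketch of that check in Java. The two field names are the ones in 
the Javadoc links above; the class name UnmapCheck and the helper 
readStaticField are made up for illustration. Reflection is used only so that 
the file also compiles and runs without Lucene on the classpath; with Lucene 
available, printing MMapDirectory.UNMAP_SUPPORTED directly is simpler.

```java
import java.lang.reflect.Field;

// Sketch: report whether Lucene's MMapDirectory can forcefully unmap on this JVM.
// UnmapCheck and readStaticField are hypothetical names, not part of Lucene.
class UnmapCheck {
    static final String MMAP_CLASS = "org.apache.lucene.store.MMapDirectory";

    // Reads a public static field by name and returns its value as text.
    // Returns a bracketed explanation instead of throwing if the lookup fails.
    static String readStaticField(String className, String fieldName) {
        try {
            Field f = Class.forName(className).getField(fieldName);
            return String.valueOf(f.get(null));
        } catch (ClassNotFoundException e) {
            return "<class " + className + " not on classpath>";
        } catch (NoSuchFieldException e) {
            return "<no field " + fieldName + " in this version>";
        } catch (IllegalAccessException e) {
            return "<field " + fieldName + " not accessible>";
        }
    }

    public static void main(String[] args) {
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("UNMAP_SUPPORTED = "
                + readStaticField(MMAP_CLASS, "UNMAP_SUPPORTED"));
        System.out.println("UNMAP_NOT_SUPPORTED_REASON = "
                + readStaticField(MMAP_CLASS, "UNMAP_NOT_SUPPORTED_REASON"));
    }
}
```

Run it with the same JVM and classpath as the application, otherwise the 
answer may not apply to the process that has the problem.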

Nevertheless, you say that you see many MappedByteBuffers that are not 
eligible for garbage collection. Of course Lucene will not unmap those 
because they are still in use. The reason for this could be incorrect code 
on your side. If you do not close IndexWriter and DirectoryReaders when 
required, the index files stay open. If indexing goes on and you reopen 
the DirectoryReader (e.g. with the near realtime functions of IndexWriter 
to see the actual state), be sure to close the "old" reader. Otherwise it 
will open more and more files. Depending on the maximum open files limit, you 
can run out of file handles or, if you have many file handles available, it 
may crash the machine because you use up all virtual address space.
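
The reopen pattern described above can be sketched roughly as follows. To keep 
it runnable without Lucene, FakeReader is a hypothetical stand-in for 
DirectoryReader that only counts open instances, and ReaderHolder and 
ReopenDemo are likewise invented names. The one Lucene fact it relies on is 
that DirectoryReader.openIfChanged() returns null when nothing changed; 
everything else is an assumption for illustration.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Lucene DirectoryReader: it only counts how many
// instances are open, which is what leaks when old readers are not closed.
class FakeReader implements Closeable {
    static final AtomicInteger OPEN = new AtomicInteger();
    final long version;
    private boolean closed;

    FakeReader(long version) {
        this.version = version;
        OPEN.incrementAndGet();
    }

    // Mimics the "open if changed" contract: null means nothing changed.
    static FakeReader openIfChanged(FakeReader old, long currentVersion) {
        return currentVersion == old.version ? null : new FakeReader(currentVersion);
    }

    @Override
    public synchronized void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

// Holds the current reader and replaces it on refresh.
class ReaderHolder implements Closeable {
    private FakeReader current;

    ReaderHolder(long version) {
        current = new FakeReader(version);
    }

    // Swap in a fresh reader and close the old one; skipping the close is the leak.
    synchronized void refresh(long currentVersion) {
        FakeReader newer = FakeReader.openIfChanged(current, currentVersion);
        if (newer != null) {
            FakeReader old = current;
            current = newer;
            old.close();
        }
    }

    synchronized long version() {
        return current.version;
    }

    @Override
    public synchronized void close() {
        current.close();
    }
}

class ReopenDemo {
    public static void main(String[] args) {
        try (ReaderHolder holder = new ReaderHolder(0)) {
            for (long v = 1; v <= 1000; v++) {
                holder.refresh(v); // one reindex, one reopen
            }
            System.out.println("open readers after 1000 reopens: " + FakeReader.OPEN.get());
        }
        System.out.println("open readers after shutdown: " + FakeReader.OPEN.get());
    }
}
```

The demo should report one open reader after 1000 reopens and none after 
shutdown. If the old.close() call is dropped, the count grows with every 
reopen, which is the leak described above. With real searches running in 
parallel the old reader must not be closed while it is still in use; Lucene's 
SearcherManager handles that with reference counting and may be a better fit 
than hand-written swapping.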

To fully analyze your problem, we need more information. Please also 
provide:
- Lucene version
- Operating System version
- "ulimit -a" output (POSIX operating systems)
- Java version and vendor
- Crash report
- Source code to show what you are doing: just indexing (in which case your 
problem is impossible), or indexing and searching in parallel? Do you use NRT 
readers for realtime visibility of indexed content?

Uwe

> From:   "Uwe Schindler" <u...@thetaphi.de>
> To:     <java-user@lucene.apache.org>
> Date:   02/24/2017 01:39 PM
> Subject:        RE: MappedByteBuffer duplicates
> 
> 
> 
> Hi,
> 
> that is not an issue, the duplicates are required for so-called IndexInput
> clones and slices. Every search request will create many of them. But
> there is no need to worry, they are just thin wrappers - they don't
> allocate any extra off-heap memory. They are just there to have a separate
> position(), limit() and other settings for each searcher thread.
> 
> Why do you worry?
> Uwe
> 
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> > -----Original Message-----
> > From: Kameron Cole [mailto:kameronc...@us.ibm.com]
> > Sent: Friday, February 24, 2017 7:19 PM
> > To: java-user@lucene.apache.org
> > Subject: MappedByteBuffer duplicates
> >
> > We have a Lucene engine that creates MappedByteBuffer objects when
> > creating the Lucene index.  I don't know Lucene well enough to know if
> > this is standard behavior.
> >
> > The mapped files are being created by Lucene, via the JRE's NIO APIs, with a
> > native file mapping underneath each MappedByteBuffer object. We see an
> > issue where duplicate MappedByteBuffer objects are being created.  Has
> > anyone seen this?
> >
> > Thank you!
> >
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 
> 
> 
> 
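
The point in the first quoted reply, that duplicates are thin wrappers with 
their own position() and limit(), can be illustrated with plain JDK NIO, no 
Lucene involved. This is a sketch under assumptions: the class name 
DuplicateDemo, the temp file and its 8-byte content are made up for the 
demonstration, and it shows ByteBuffer.duplicate() in general rather than 
Lucene's actual IndexInput code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DuplicateDemo {
    // Maps a small temp file once, then reads it through two duplicates.
    static String run() {
        try {
            Path file = Files.createTempFile("mmap-demo", ".bin");
            // Delete at JVM exit: a mapped file may not be deletable earlier on Windows.
            file.toFile().deleteOnExit();
            Files.write(file, "abcdefgh".getBytes(StandardCharsets.US_ASCII));

            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // One native mapping of the whole file.
                MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());

                // Two views of the same mapping; no new mapping is created.
                ByteBuffer a = mapped.duplicate();
                ByteBuffer b = mapped.duplicate();
                a.position(2);
                b.position(5);
                b.limit(7);

                char fromA = (char) a.get(); // byte at index 2
                char fromB = (char) b.get(); // byte at index 5

                return "" + fromA + fromB
                        + " a.pos=" + a.position()
                        + " b.pos=" + b.position()
                        + " mapped.pos=" + mapped.position()
                        + " direct=" + a.isDirect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints: cf a.pos=3 b.pos=6 mapped.pos=0 direct=true
        System.out.println(run());
    }
}
```

Both duplicates read from the same single mapping, each with its own cursor, 
and the position of the original buffer stays at 0. Seeing many such objects 
on the heap is therefore not by itself a sign of many native mappings.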



