[GitHub] [lucene] uschindler opened a new pull request #173: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) Panama APIs (>= JDK-16-ea-b32)

GitBox Mon, 07 Jun 2021 05:06:46 -0700


uschindler opened a new pull request #173:
URL: https://github.com/apache/lucene/pull/173



   **INFO: This is a clone/update of 
https://github.com/apache/lucene-solr/pull/2176** _(for more detailed 
discussion see this old PR from the Lucene/Solr combined repository)_
   
   This is just a draft PR for a first insight on memory mapping improvements 
in JDK 16+.
   
   Some background information: Starting with JDK-14, there is a new incubating 
module "jdk.incubator.foreign" that has a new, not yet stable API for accessing 
off-heap memory (and later it will also support calling functions using 
classical MethodHandles that are located in libraries like .so or .dll files). 
This incubator module has several versions:
   - first version: https://openjdk.java.net/jeps/370 (slow, very buggy and 
thread confinement, so making it unuseable with Lucene)
   - second version: https://openjdk.java.net/jeps/383 (still thread 
confinement, but now allows transfer of "ownership" to other threads; this is 
still impossible to use with Lucene.
   - third version in JDK 16: https://openjdk.java.net/jeps/393 (this version 
has included "Support for shared segments"). This now allows us to safely use 
the same external mmaped memory from different threads and also unmap it!
   
   This module more or less overcomes several problems:
   - ByteBuffer API is limited to 32bit (in fact MMapDirectory has to chunk in 
1 GiB portions)
   - There is no official way to unmap ByteBuffers when the file is no longer 
used. There is a way to use `sun.misc.Unsafe` and forcefully unmap segments, 
but any IndexInput accessing the file from another thread will crush the JVM 
with SIGSEGV or SIGBUS. We learned to live with that and we happily apply the 
unsafe unmapping, but that's the main issue.
   
   @uschindler had many discussions with the team at OpenJDK and finally with 
the third incubator, we have an API that works with Lucene. It was very 
fruitful discussions (thanks to @mcimadamore !)
   
   With the third incubator we are now finally able to do some tests 
(especially performance). As this is an incubating module, this PR first 
changes a bit the build system:
   - disable `-Werror` for `:lucene:core`
   - add the incubating module to compiler of `:lucene:core` and enable it for 
all test builds. This is important, as you have to pass `--add-modules 
jdk.incubator.foreign` also at runtime!
   
   The code basically just modifies `MMapDirectory` to use LONG instead of INT 
for the chunk size parameter. In addition it adds `MemorySegmentIndexInput` 
that is a copy of our `ByteBufferIndexInput` (still there, but unused), but 
using MemorySegment instead of ByteBuffer behind the scenes. It works in 
exactly the same way, just the try/catch blocks for supporting EOFException or 
moving to another segment were rewritten.
   
   The openInput code uses `MemorySegment.mapFile()` to get a memory mapping. 
This method is unfortunately a bit buggy in JDK-16-ea-b30, so I added some 
workarounds. See JDK issues: https://bugs.openjdk.java.net/browse/JDK-8259027, 
https://bugs.openjdk.java.net/browse/JDK-8259028, 
https://bugs.openjdk.java.net/browse/JDK-8259032, 
https://bugs.openjdk.java.net/browse/JDK-8259034. The bugs with alignment and 
zero byte mmaps are fixed in b32, this PR was adapted (hacks removed).
   
   It passes all tests and it looks like you can use it to read indexes. The 
default chunk size is now 16 GiB (but you can raise or lower it as you like; 
tests are doing this). Of course you can set it to Long.MAX_VALUE, in that case 
every index file is always mapped to one big memory mapping. My testing with 
Windows 10 have shown, that this is *not a good idea!!!*. Huge mappings 
fragment address space over time and as we can only use like 43 or 46 bits 
(depending on OS), the fragmentation will at some point kill you. So 16 GiB 
looks like a good compromise: Most files will be smaller than 6 GiB anyways 
(unless you optimize your index to one huge segment). So for most Lucene 
installations, the number of segments will equal the number of open files, so 
Elasticsearch huge user consumers will be very happy. The sysctl max_map_count 
may not need to be touched anymore.
   
   In addition, this implements `readLELongs` in a better way than @jpountz did 
(no caching or arbitrary objects). Nevertheless, as the new MemorySegment API 
relies on final, unmodifiable classes and coping memory from a MemorySegment to 
a on-heap Java array, it requires us to wrap all those arrays using a 
MemorySegment each time (e.g. in `readBytes()` or `readLELongs`), there may be 
some overhead du to short living object allocations (those are NOT 
reuseable!!!). _In short: In future we should throw away on coping/loading our 
stuff to heap and maybe throw away IndexInput completely and base our code 
fully on random access. The new foreign-vector APIs will in future also be 
written with MemorySegment in its focus. So you can allocate a vector view on a 
MemorySegment and let the vectorizer fully work outside java heap inside our 
mmapped files! :-)_
   
   It would be good if you could checkout this branch and try it in production.
   
   But be aware:
   - You need JDK 11 to run Gradle (set `JAVA_HOME` to it)
   - You need JDK 16-ea-b32 (set `RUNTIME_JAVA_HOME` to it)
   - The lucene-core.jar will be JDK16 class files and requires JDK-16 to 
execute.
   - Also you need to add `--add-modules jdk.incubator.foreign` to the command 
line of your Java program/Solr server/Elasticsearch server
   
   It would be good to get some benchmarks, especially by @rmuir or 
@mikemccand. _Take your time and enjoy the complexity of setting this up!_ ;-)
   
   My plan is the following:
   - report any bugs or slowness, especially with Hotspot optimizations. The 
last time I talked to Maurizio, he taked about Hotspot not being able to fully 
optimize for-loops with long instead of int, so it may take some time until the 
full performance is there.
   - wait until the final version of project PANAMA-foreign goes into Java's 
Core Library (no module needed anymore)
   - add a MR-JAR for lucene-core.jar and compile the MemorySegmentIndexInput 
and maybe some helper classes with JDK 17/18/19 (hopefully?).
   
   ~~In addition there are some comments in the code talking about safety 
(e.g., we need `IOUtils.close()` taking `AutoCloseable` instead of just 
`Closeable`, so we can also enfoce that all memory segments are closed after 
usage.~~ In addition, by default all VarHandles are aligned. By default it 
refuses to read a LONG from an address which is not a multiple of 8. I had to 
disable this feature, as all our index files are heavily unaliged. We should in 
meantime not only convert our files to little endian, but also make all 
non-compressed types (like `long[]` arrays or non-encoded integers be aligned 
to the correct boundaries in files). The most horrible thing I have seen is 
that our CFS file format starts the "inner" files totally unaligned. We should 
fix the CFSWriter to start new files always at multiples of 8 bytes. I will 
open an issue about this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] uschindler opened a new pull request #173: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) Panama APIs (>= JDK-16-ea-b32)

Reply via email to