Re: RFR 8243491: Implementation of Foreign-Memory Access API (Second Incubator)

Paul Sandoz Fri, 24 Apr 2020 09:41:41 -0700

> On Apr 23, 2020, at 5:45 PM, Maurizio Cimadamore 
> <[email protected]> wrote:
> 
> 
> On 24/04/2020 01:35, Paul Sandoz wrote:
>> Hi,
>> 
>> Looks good. I have seen almost all of this in reviews on panama-dev hence 
>> the lack of substantial comments here.
>> 
>> I suspect we are not gonna need the drop argument VH combinator, dropping 
>> coordinates feels a little suspicious to me, but I can see why it's there 
>> for completeness.
> 
> Thanks Paul.
> 
> Re. drop coordinates, note that we're actually using it in 
> MemoryHandles.withStride, so that we can insert a dummy coordinate that is 
> always discarded in case the stride is zero. We could do without it as well, 
> of course, but sometimes it's hard to have a sense of how these pieces might 
> be joined in practice.
> 

Ah, ok I can see its value now when dropping a coordinate that would otherwise 
result in some redundant calculation.  However, looking at this more closely:

* @param bytesStride the stride, in bytes, by which to multiply the coordinate 
value. Must be greater than zero.

It implies that a zero value should be disallowed contrary to the 
implementation.  I am wondering if your intent was to support a signed stride 
value?

In this case it could be argued that a zero stride is misleading, if supported, 
since any value passed for the coordinate X has no effect.  But I can also see 
the other side from a position of uniformity, and then why not support negative 
strides, which I think the implementation does support.
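
To make the three cases concrete, here is a minimal sketch in plain Java. It is 
not the MemoryHandles.withStride code: StrideSketch and its methods are 
made-up names, and it only assumes the address is computed as 
base + stride * index. The zero-stride case uses MethodHandles.dropArguments 
to show the "dummy coordinate that is always discarded" idea.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

final class StrideSketch {
    // Hypothetical helper: byte offset of element `index`, for a signed stride.
    static long offset(long base, long bytesStride, long index) {
        return Math.addExact(base, Math.multiplyExact(bytesStride, index));
    }

    // Zero stride: keep the (base, index) shape but discard the index,
    // so no multiplication is performed at all.
    static MethodHandle zeroStrideOffset() {
        MethodHandle identity = MethodHandles.identity(long.class); // (long)long
        return MethodHandles.dropArguments(identity, 1, long.class); // (long,long)long
    }

    static long zeroStride(long base, long index) {
        try {
            return (long) zeroStrideOffset().invokeExact(base, index);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(offset(64, 8, 3));   // positive stride: 88
        System.out.println(offset(64, -8, 3));  // negative stride: 40
        System.out.println(zeroStride(64, 3));  // index ignored: 64
    }
}
```

With a zero stride every index maps to the same offset, which is the 
behaviour the javadoc wording above seems to rule out.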

Paul.

> Maurizio
> 
>> Paul.
>> 
>>> On Apr 23, 2020, at 1:33 PM, Maurizio Cimadamore 
>>> <[email protected]> wrote:
>>> 
>>> Hi,
>>> time has come for another round of foreign memory access API incubation 
>>> (see JEP 383 [3]). This iteration aims at polishing some of the rough edges 
>>> of the API, and adds some of the functionalities that developers have been 
>>> asking for during this first round of incubation. The revised API tightens 
>>> the thread-confinement constraints (by removing the MemorySegment::acquire 
>>> method) and instead provides more targeted support for parallel computation 
>>> via a segment spliterator. The API also adds a way to create a custom 
>>> native segment; this is, essentially, an unsafe API point, very similar in 
>>> spirit to the JNI NewDirectByteBuffer functionality [1]. By using this bit 
>>> of API, power-users will be able to add support, via MemorySegment, to 
>>> *their own memory sources* (e.g. think of a custom allocator written in 
>>> C/C++). For now, this API point is marked as "restricted" and a special 
>>> read-only JDK property will have to be set on the command line for calls to 
>>> this method to succeed. We are aware there's no precedent for something 
>>> like this in the Java SE API - but if Project Panama is to remain true 
>>> to its ultimate goal of replacing bits of JNI code with (low level) Java 
>>> code, stuff like this has to be *possible*. We anticipate that, at some 
>>> point, this property will become a true launcher flag, and that the foreign 
>>> restricted machinery will be integrated more neatly into the module system.
>>> 
>>> A list of the API, implementation and test changes is provided below. If 
>>> you have any questions, or need more detailed explanations, I (and the rest 
>>> of the Panama team) will be happy to point at existing discussions, and/or 
>>> to provide the feedback required.
>>> 
>>> Thanks
>>> Maurizio
>>> 
>>> Webrev:
>>> 
>>> http://cr.openjdk.java.net/~mcimadamore/8243491_v1/webrev
>>> 
>>> Javadoc:
>>> 
>>> http://cr.openjdk.java.net/~mcimadamore/8243491_v1/javadoc
>>> 
>>> Specdiff:
>>> 
>>> http://cr.openjdk.java.net/~mcimadamore/8243491_v1/specdiff/overview-summary.html
>>> 
>>> CSR:
>>> 
>>> https://bugs.openjdk.java.net/browse/JDK-8243496
>>> 
>>> 
>>> 
>>> API changes
>>> ===========
>>> 
>>> * MemorySegment
>>>   - drop support for acquire() method - in its place now you can obtain a 
>>> spliterator from a segment, which supports divide-and-conquer
>>>   - revamped support for views - e.g. isReadOnly - now segments have access 
>>> modes
>>>   - added API to do serial confinement hand-off 
>>> (MemorySegment::withOwnerThread)
>>>   - added unsafe factory to construct a native segment out of an existing 
>>> address; this API is "restricted" and only available if the program is 
>>> executed using the -Dforeign.unsafe=permit flag.
>>>   - the MemorySegment::mapFromPath now returns a MappedMemorySegment
>>> * MappedMemorySegment
>>>   - small sub-interface which provides extra capabilities for mapped 
>>> segments (load(), unload() and force())
>>> * MemoryAddress
>>>   - added distinction between *checked* and *unchecked* addresses; 
>>> *unchecked* addresses do not have a segment, so they cannot be dereferenced
>>>   - added NULL memory address (it's an unchecked address)
>>>   - added factory to construct MemoryAddress from long value (result is 
>>> also an unchecked address)
>>>   - added API point to get raw address value (where possible - e.g. if this 
>>> is not an address pointing to a heap segment)
>>> * MemoryLayout
>>>   - Added support for layout "attributes" - e.g. store metadata inside 
>>> MemoryLayouts
>>>   - Added MemoryLayout::isPadding predicate
>>>   - Added helper function to SequenceLayout to reshape/flatten sequence 
>>> layouts (a la NDArray [4])
>>> * MemoryHandles
>>>   - add support for general VarHandle combinators (similar to MH 
>>> combinators)
>>>   - add a combinator to turn a long-VH into a MemoryAddress VH (the 
>>> resulting MemoryAddress is also *unchecked* and cannot be dereferenced)
>>> 
>>> Implementation changes
>>> ======================
>>> 
>>> * add support for VarHandle combinators (e.g. IndirectVH)
>>> 
>>> The idea here is simple: a VarHandle can almost be thought of as a set of 
>>> method handles (one for each access mode supported by the var handle) that 
>>> are lazily linked. This gives us a relatively simple idea upon which to 
>>> build support for custom var handle adapters: we could create a VarHandle 
>>> by passing an existing var handle and also specify the set of adaptations 
>>> that should be applied to the method handle for a given access mode in the 
>>> original var handle. The result is a new VarHandle which might support a 
>>> different carrier type and more, or fewer, coordinate types. Adding this 
>>> support was relatively easy - and it only required one low-level surgery of 
>>> the lambda forms generated for adapted var handle (this is required so that 
>>> the "right" var handle receiver can be used for dispatching the access mode 
>>> call).
>>> 
>>> All the new adapters in the MemoryHandles API (which are really defined 
>>> inside VarHandles) are really just a bunch of MH adapters that are stitched 
>>> together into a brand new VH. The only caveat is that we could have a 
>>> checked exception mismatch: the VarHandle API methods are specified not to 
>>> throw any checked exception, whereas method handles can throw any 
>>> throwable. This means that, potentially, calling get() on an adapted 
>>> VarHandle could result in a checked exception being thrown; to solve this 
>>> gnarly issue, we decided to scan all the filter functions passed to the VH 
>>> combinators and look for direct method handles which throw checked 
>>> exceptions. If such MHs are found (these can be deeply nested, since the 
>>> MHs can be adapted on their own), adaptation of the target VH fails fast.
>>> 
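A rough illustration of the "VarHandle as a set of method handles" view, using 
only java.lang.invoke calls that exist today (VarHandle.toMethodHandle and 
MethodHandles.filterReturnValue). This is not the new combinator API: 
AdaptedGetSketch and its methods are made-up names, and only the GET access 
mode is adapted. The filter used here, Integer.toHexString, declares no 
checked exceptions, which is the kind of filter the scan described above 
would accept.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;

final class AdaptedGetSketch {
    // Adapts the GET access mode of an int[] element VarHandle so that
    // the result is a String: (int[], int)int becomes (int[], int)String.
    static MethodHandle hexGetter() throws ReflectiveOperationException {
        VarHandle ints = MethodHandles.arrayElementVarHandle(int[].class);
        MethodHandle get = ints.toMethodHandle(VarHandle.AccessMode.GET);
        MethodHandle toHex = MethodHandles.lookup().findStatic(
                Integer.class, "toHexString",
                MethodType.methodType(String.class, int.class));
        return MethodHandles.filterReturnValue(get, toHex);
    }

    static String readHex(int[] array, int index) {
        try {
            return (String) hexGetter().invokeExact(array, index);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(readHex(new int[] {10, 255}, 1)); // prints "ff"
    }
}
```

A real adapted VarHandle would apply this kind of adaptation per access mode, 
lazily, and would still be invoked through the VarHandle methods.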
>>> 
>>> * More ByteBuffer implementation changes
>>> 
>>> Some more changes to ByteBuffer support were necessary here. First, we have 
>>> added support for retrieval of "mapped" properties associated with a 
>>> ByteBuffer (e.g. the file descriptor, etc.). This is crucial if we want to 
>>> be able to turn an existing byte buffer into the "right kind" of memory 
>>> segment.
>>> 
>>> Conversely, we also have to allow creation of mapped byte buffers given 
>>> existing parameters - which is needed when going from (mapped) segment to a 
>>> buffer. These two pieces together allow us to go from segment to buffer and 
>>> back w/o losing any information about the underlying memory mapping (which 
>>> was an issue in the previous implementation).
>>> 
>>> Lastly, to support the new MappedMemorySegment abstraction, all the memory 
>>> mapped supporting functionalities have been moved into a common helper 
>>> class so that MappedMemorySegmentImpl can reuse that (e.g. for 
>>> MappedMemorySegment::force).
>>> 
>>> * Rewritten memory segment hierarchy
>>> 
>>> The old implementation had a monomorphic memory segment class. In this 
>>> round we aimed at splitting the various implementation classes so that we 
>>> have a class for heap segments (HeapMemorySegmentImpl), one for native 
>>> segments (NativeMemorySegmentImpl) and one for memory mapped segments 
>>> (MappedMemorySegmentImpl, which extends from NativeMemorySegmentImpl). Not 
>>> much to see here - although one important point is that, by doing this, we 
>>> have been able to improve performance quite a bit, since now e.g. 
>>> native/mapped segments are _guaranteed_ to have a null "base". We have also 
>>> done a few tricks to make sure that the "base" accessor for heap segment is 
>>> sharply typed and also NPE checked, which allows C2 to speculate more and 
>>> hoist. With these changes _all_ segment types have comparable performance 
>>> and hoisting guarantees (unlike in the old implementation).
>>> 
>>> * Add workarounds in MemoryAddressProxy, AbstractMemorySegmentImpl to 
>>> special case "small segments" so that VM can apply bound check elimination
>>> 
>>> This is another important piece which allows us to get very good performance 
>>> out of indexed memory access var handles; as you might know, the JIT 
>>> compiler has trouble optimizing loops where the loop variable is a long 
>>> [2]. To make up for that, in this round we add an optimization which allows 
>>> the API to detect whether a segment is *small* or *large*. For small 
>>> segments, the API realizes that there's no need to perform long computation 
>>> (e.g. to perform bound checks, or offset additions), so it falls back to 
>>> integer logic, which in turn allows bound check elimination.
>>> 
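A small sketch of the small/large split as I read it, in plain Java. This is 
an assumption about the shape of the check, not the code in 
AbstractMemorySegmentImpl or MemoryAddressProxy: SmallSegmentSketch and 
inBounds are hypothetical names, and segmentLength is assumed to be 
non-negative.

```java
final class SmallSegmentSketch {
    // Returns true if [offset, offset + accessSize) lies within a segment
    // of the given length. Uses int arithmetic when every value fits in
    // an int, and long arithmetic otherwise.
    static boolean inBounds(long segmentLength, long offset, long accessSize) {
        boolean small = segmentLength <= Integer.MAX_VALUE
                && offset == (int) offset
                && accessSize == (int) accessSize;
        if (small) {
            int len = (int) segmentLength;
            int off = (int) offset;
            int size = (int) accessSize;
            // int-only comparison: the form the JIT handles well in loops
            return off >= 0 && size >= 0 && off <= len - size;
        }
        return offset >= 0 && accessSize >= 0
                && offset <= segmentLength - accessSize;
    }

    public static void main(String[] args) {
        System.out.println(inBounds(16, 12, 4));                     // true
        System.out.println(inBounds(16, 13, 4));                     // false
        System.out.println(inBounds(1L << 40, (1L << 40) - 8, 8));   // true
    }
}
```

In the real implementation the small/large decision is presumably made once 
per segment rather than per access; the per-call test here just keeps the 
sketch short.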
>>> * renaming of the various var handle classes to conform to "memory access 
>>> var handle" terminology
>>> 
>>> This is mostly stylistic, nothing to see here.
>>> 
>>> Test changes
>>> ============
>>> 
>>> In addition to the tests for the new API changes, we've also added some 
>>> stress tests for var handle combinators - e.g. there's a flag that can be 
>>> enabled which turns on some "dummy" var handle adaptations on all var 
>>> handles created by the runtime. We've used this flag on existing tests to 
>>> make sure that things work as expected.
>>> 
>>> To sanity test the new memory segment spliterator, we have wired the new 
>>> segment spliterator with the existing spliterator test harness.
>>> 
>>> We have also added several micro benchmarks for the memory segment API (and 
>>> made some changes to the build script so that native libraries would be 
>>> handled correctly).
>>> 
>>> 
>>> [1] - 
>>> https://docs.oracle.com/en/java/javase/14/docs/specs/jni/functions.html#newdirectbytebuffer
>>> [2] - https://bugs.openjdk.java.net/browse/JDK-8223051
>>> [3] - https://openjdk.java.net/jeps/383
>>> [4] - 
>>> https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html#numpy.reshape
>>> 
>>> 
>