Wow, this sounds really ambitious. A lot of this overlaps with what
Graal, LLVM, and Panama are trying to do.
Which is great, but we /really/ need to come up with some sort of
roadmap and get everyone on the same page...
Anyway, here's what JavaCPP already provides today that gives a
good start toward solving a lot of these issues:
* Builds for multiple platforms are done by CI servers, so building
for 3+ platforms isn't an issue with services like AppVeyor and
Travis CI: http://bytedeco.org/builds/ BTW, building everything on a
single platform would require compatibility with MSVC, which no one,
not even LLVM, has fully achieved yet
<https://clang.llvm.org/docs/MSVCCompatibility.html>, and nobody
even cares about building binaries for iOS or Mac on Linux! And even
if you do figure out a technical way to do it, you'll need to
navigate Apple's lawyers...
* Access to system APIs
<https://github.com/bytedeco/javacpp-presets/tree/master/systems#sample-usage>
is already there, for example to detect features like AVX512, among
other things <http://bytedeco.org/news/2018/01/17/java-for-systems/>.
* Support for "fat binaries" is provided in the form of JAR files that
can easily be turned into uber JARs with Maven: search Central for
"org.bytedeco.javacpp-presets opencv"
<http://search.maven.org/#search%7Cga%7C1%7Corg.bytedeco.javacpp-presets%20opencv>
for an example.
* JavaCPP itself is less than 400 KB, and we can use only the parts
we are interested in. For example, JavaFX could bundle the
native libraries in JAR files in a custom fashion, but rely on
JavaCPP to load them using org.bytedeco.javacpp.Loader.load()
<http://bytedeco.org/javacpp/apidocs/org/bytedeco/javacpp/Loader.html#loadLibrary-java.net.URL:A-java.lang.String-java.lang.String...->,
which has already been battle-tested on a lot of platforms by all
users of JavaCPP <https://github.com/bytedeco/javacpp>, JavaCV
<https://github.com/bytedeco/javacv>, and Deeplearning4j
<https://github.com/deeplearning4j/deeplearning4j>, among others.
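As a rough illustration of the unpack+cache pattern that Loader.load()
implements, here is a minimal sketch using only the standard library (the
names UnpackAndLoad/unpack and the cache directory are invented for
illustration; the real JavaCPP loader also handles platform detection,
versioning, dependencies, and much more):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class UnpackAndLoad {
    // Copies a native library's bytes (normally read from a JAR resource via
    // Class.getResourceAsStream) into a cache directory, returning the
    // on-disk path that would then be handed to System.load().
    static Path unpack(String libName, InputStream in) throws IOException {
        Path cacheDir = Paths.get(System.getProperty("java.io.tmpdir"), "nativelib-cache");
        Files.createDirectories(cacheDir);
        Path target = cacheDir.resolve(libName);
        if (!Files.exists(target)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; a real loader would stream the .so/.dll/.dylib out
        // of the JAR and then call System.load(path.toString()).
        InputStream fake = new ByteArrayInputStream("not really native code".getBytes());
        Path path = unpack("libdemo.so", fake);
        System.out.println("cached at: " + path.getFileName());
    }
}
```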
FWIW, I think Sulong has the right approach in bridging Java with
Graal/Truffle and LLVM.
Now let's actually start doing something about this and get everything
standardized, please. :)
Samuel
On 05/08/2018 12:39 AM, Cyprien Noel wrote:
That's really interesting, particularly if it enables a two-phase
deployment like Android does. Native code could be stored in a repo as
LLVM bitcode, then compiled and cached at install time. That would
allow sharing code memory between apps, loading only the code that is
actually used at runtime, and long-running compilation that optimises
for the local platform; maybe sandboxing transformations could also be
applied and inlined at install time?
On Mon, May 7, 2018, 4:19 AM Mike Hearn <m...@plan99.net
<mailto:m...@plan99.net>> wrote:
I did a bit of experimentation to learn how different operating
systems support loading shared libraries in-memory. I also did a
bit of thinking on the topic of "native classloaders". Here's a
braindump, which may lead nowhere but at least it'll be written down.
Linux: This OS has the best support. The memfd_create syscall was
added in Linux 3.17 (released in 2014). It's not exposed by glibc
but is easy to invoke by hand. It creates a file descriptor that
supports all normal operations from an in-memory region. After
creation it can be fed to the rtld using
dlopen("/proc/self/fd/<num>"). I've tried this and it works fine.
Windows: Runner-up support. An in-memory file can be created by
passing FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE to
CreateFile. However, it still occupies a location in the
namespace, and permission checks probably still apply. Additionally,
the file is not truly memory-only: under memory pressure the VMM
may flush it to disk to free up resources. I haven't tried this
API myself.
macOS: Worst support. There is a deprecated
NSCreateObjectFileImageFromMemory API, but internally all it does is
save the contents to a file and load that again. The core rtld
cannot load from anything except a file, and macOS does not appear
to have any way to create in-memory fds. shm_open doesn't work
either: the SHM implementation is too broken; you can't write to such
an fd, and although you can mmap it, trying to open /dev/fd/x on the
resulting fd doesn't work.
Obviously with any such approach you face the problem of
dependencies. Even if the DLL/DSO itself is in memory, the rtld
will still try to load any dependent libraries from disk.
Playing around with this led me to start pondering something more
ambitious, namely, making native code loading customisable by Java
code, via some sort of NativeLoader API that's conceptually
similar to ClassLoader. Methods on it would be called to resolve a
library given a name and to look up symbols in the returned code
module (returning a native pointer to an entry point that uses a
calling convention passed into the lookup). System.load() would be
redirected to call into this class and the JNI linker would upcall
into it. NativeLoaders could be exposed via an SPI.
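To make the shape concrete, here is one way such an API could look. This is
entirely hypothetical: NativeLoader, findLibrary, and findSymbol are invented
names that do not exist in the JDK, and a real design would likely return
something richer than a raw long:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical SPI, conceptually analogous to ClassLoader.
interface NativeLoader {
    // Resolve a library by name, returning an opaque module handle.
    Object findLibrary(String name);

    // Look up an entry point in a module; a real API would also take a
    // calling convention, and might return a MethodHandle instead of an
    // address.
    long findSymbol(Object module, String symbol);
}

// A toy in-memory implementation, standing in for e.g. a memfd_create-backed
// loader on Linux or an unpack+cache loader.
class FakeNativeLoader implements NativeLoader {
    private final Map<String, Map<String, Long>> modules = new HashMap<>();

    FakeNativeLoader() {
        Map<String, Long> demo = new HashMap<>();
        demo.put("add", 0xdeadbeefL); // pretend address of a native function
        modules.put("demo", demo);
    }

    public Object findLibrary(String name) {
        Map<String, Long> m = modules.get(name);
        if (m == null) throw new UnsatisfiedLinkError("no library: " + name);
        return m;
    }

    @SuppressWarnings("unchecked")
    public long findSymbol(Object module, String symbol) {
        Long addr = ((Map<String, Long>) module).get(symbol);
        if (addr == null) throw new UnsatisfiedLinkError("no symbol: " + symbol);
        return addr;
    }
}

public class NativeLoaderDemo {
    public static void main(String[] args) {
        NativeLoader loader = new FakeNativeLoader();
        Object module = loader.findLibrary("demo");
        System.out.println("add @ 0x" + Long.toHexString(loader.findSymbol(module, "add")));
    }
}
```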
This might sound extreme, but we can see some advantages:
* The default NativeLoader would just use the platform APIs as
today, meaning little behaviour or code change on the JDK side.
* Samuel could write a NativeLoader for his unpack+cache
implementation that standardises this behaviour in an ordinary
library anyone can use.
* A Linux implementation could be written that gives faster
performance and more robustness using memfd_create, again, it
could be done by the community instead of HotSpot implementors.
* It opens the possibility of the community developing a new,
platform-independent native code format by itself. Why might
this be interesting?
o ELF, PE and Mach-O do not vary significantly in feature
set, and differ only due to the underlying platforms
evolving separately. From the perspective of a Java
developer who just wants to package some native code for
distribution the distinction is annoying and unnecessary.
It means building the same code three times, on three
different platforms with three different toolchains, each
time the native code changes, even if the change itself
has no platform-specific parts.
o A new format could potentially simplify things (e.g. do
we still need support for relocation, given the relatively
poor success of ASLR and the big success of 64-bit
platforms?), but it could also have features Java
developers might want, like:
+ The ability to have OS specific symbols. If your
library differs between platforms only in a few
places, do you really need to ship three artifacts for
that or would one artifact that contained 3 versions
of the underlying function be sufficient?
+ The ability to smoothly up-call to Java by linking
dependent symbols back to Java via the callback
capabilities in Panama.
+ The ability to internally link symbols based on
information discovered by the JVM, e.g. if the JVM is
able and willing to use AVX512 the new format could
simply use that information instead of requiring the
developer to write native code to detect CPU capabilities.
+ Fat binaries that support different CPU architectures
inside a single file, and/or LLVM bitcode. If you have
LLVM as a supported "architecture" then the
NativeLoader could perhaps bridge to Sulong, so Java
code that wasn't written with the Graal/Truffle API in
mind can still benefit from calling into JIT-compiled
bitcode. This probably would require NativeLoader to
return a MethodHandle of some sort rather than a
'long' pointer.
o Alternatively, write a cross-platform ELF loader. The Java
IO APIs provide everything needed already, as far as I know.
o The Arabica project explored sandboxing JNI libraries
using Google NativeClient. It appeared to work and calls
from the sandboxed code into the OS were transparently
redirected back into the JVM where normal permission
checks occurred. Sandboxed, cross platform native code
would be a powerful capability.
http://www.cse.psu.edu/~gxt29/paper/arabica.pdf
* It opens the possibility of "assembly spinning" as an analogue
to "bytecode spinning", as there would be no requirement that
the NativeLoader return a pointer to code that it loaded off
disk. Java-level JIT compilers like Graal could potentially be
used to spin little snippets of code at runtime, or when only
small amounts are needed (e.g. to invoke a syscall that isn't
exposed via glibc, like, say, memfd_create) the bytes can
simply be prepared ahead of time and read from the constant pool.
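On the cross-platform ELF loader point above: the fixed part of an ELF
header really is trivial to read with the standard Java APIs. A sketch that
only decodes a few identification fields, nowhere near a full loader (class
and method names are invented; field offsets follow the ELF specification):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ElfIdent {
    // Decodes the EI_CLASS/EI_DATA identification bytes and the e_machine
    // field from the first 20 bytes of an ELF image.
    static String describe(byte[] header) {
        if (header.length < 20 || header[0] != 0x7f
                || header[1] != 'E' || header[2] != 'L' || header[3] != 'F') {
            throw new IllegalArgumentException("not an ELF image");
        }
        String bits = header[4] == 2 ? "64-bit" : "32-bit";   // EI_CLASS
        boolean little = header[5] == 1;                      // EI_DATA
        ByteBuffer buf = ByteBuffer.wrap(header)
                .order(little ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int machine = buf.getShort(18) & 0xffff;              // e_machine
        String arch = machine == 0x3e ? "x86-64"
                    : machine == 0xb7 ? "AArch64" : "other";
        return bits + " " + (little ? "LE" : "BE") + " " + arch;
    }

    public static void main(String[] args) {
        // A minimal fabricated 64-bit little-endian x86-64 ELF header prefix.
        byte[] header = new byte[20];
        header[0] = 0x7f; header[1] = 'E'; header[2] = 'L'; header[3] = 'F';
        header[4] = 2;     // ELFCLASS64
        header[5] = 1;     // ELFDATA2LSB
        header[18] = 0x3e; // EM_X86_64, little-endian short at offset 18
        System.out.println(describe(header));
    }
}
```

Resolving relocations, symbol tables, and dependencies is where the real
work would be, of course.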
Getting back to JavaFX for a moment, it sounds like it's too late
for anything to go into JDK11 at this point, which is a pity. It
will be up to the community to find a solution like Samuel's cache
for now.
thanks,
-mike