Wow, this sounds really ambitious. A lot of this overlaps with what
Graal, LLVM, and Panama are trying to do.
Which is great, but we /really/ need to come up with some sort of
roadmap and get everyone on the same page...
Anyway, here's what JavaCPP already provides today that gives a
good start toward solving a lot of these issues:
* Builds for multiple platforms are done by CI servers, so building
for 3+ platforms isn't an issue with services like AppVeyor and
Travis CI: http://bytedeco.org/builds/ BTW, building everything on a
single platform would require compatibility with MSVC, which no one,
not even LLVM, has fully achieved yet
<https://clang.llvm.org/docs/MSVCCompatibility.html>, and nobody
even cares about building binaries for iOS or Mac on Linux! And even
if you do figure out a technical way to do it, you'll need to
navigate Apple's lawyers...
* Access to system APIs
<https://github.com/bytedeco/javacpp-presets/tree/master/systems#sample-usage>
is already there, for example to detect features like AVX512, among
other things <http://bytedeco.org/news/2018/01/17/java-for-systems/>.
* Support for "fat binaries" is provided in the form of JAR files that
can easily be turned into uber JARs with Maven: search Central for
"org.bytedeco.javacpp-presets opencv"
<http://search.maven.org/#search%7Cga%7C1%7Corg.bytedeco.javacpp-presets%20opencv>
for an example.
* JavaCPP itself is less than 400 KB, and we can use only the parts
we are interested in. For example, JavaFX could bundle the
native libraries in JAR files in a custom fashion, but rely on
JavaCPP to load them using org.bytedeco.javacpp.Loader.load()
<http://bytedeco.org/javacpp/apidocs/org/bytedeco/javacpp/Loader.html#loadLibrary-java.net.URL:A-java.lang.String-java.lang.String...->,
which has already been battle-tested on a lot of platforms by all
users of JavaCPP <https://github.com/bytedeco/javacpp>, JavaCV
<https://github.com/bytedeco/javacv>, and Deeplearning4j
<https://github.com/deeplearning4j/deeplearning4j>, among others.
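As a rough illustration of the unpack+cache pattern that Loader.load()
implements, here is a minimal sketch using only the standard library (the
names UnpackAndLoad/unpack and the cache directory are invented for
illustration; the real JavaCPP loader also handles platform detection,
versioning, dependencies, and much more):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class UnpackAndLoad {
    // Copies a native library's bytes (normally read from a JAR resource via
    // Class.getResourceAsStream) into a cache directory, returning the
    // on-disk path that would then be handed to System.load().
    static Path unpack(String libName, InputStream in) throws IOException {
        Path cacheDir = Paths.get(System.getProperty("java.io.tmpdir"), "nativelib-cache");
        Files.createDirectories(cacheDir);
        Path target = cacheDir.resolve(libName);
        if (!Files.exists(target)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; a real loader would stream the .so/.dll/.dylib out
        // of the JAR and then call System.load(path.toString()).
        InputStream fake = new ByteArrayInputStream("not really native code".getBytes());
        Path path = unpack("libdemo.so", fake);
        System.out.println("cached at: " + path.getFileName());
    }
}
```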
FWIW, I think Sulong has the right approach in bridging Java with
Graal/Truffle and LLVM.
Now let's actually start doing something about this and get everything
standardized, please. :)
Samuel
On 05/08/2018 12:39 AM, Cyprien Noel wrote:
That's really interesting, particularly if it enables a two-phase
deployment like Android does. Native code could be stored in a repo as
LLVM bitcode, then compiled and cached at install time. That would
allow sharing code memory between apps, loading only the code that is
actually used at runtime, and long-running compilation that optimises
for the local platform; maybe sandboxing transformations could also be
applied and inlined at install time?
On Mon, May 7, 2018, 4:19 AM Mike Hearn <m...@plan99.net
<mailto:m...@plan99.net>> wrote:
I did a bit of experimentation to learn how different operating
systems support loading shared libraries in-memory. I also did a
bit of thinking on the topic of "native classloaders". Here's a
braindump, which may lead nowhere but at least it'll be written down.
Linux: This OS has the best support. The memfd_create syscall was
added in Linux 3.17 (released in 2014). It's not exposed by glibc
but is easy to invoke by hand. It creates a file descriptor that
supports all normal operations from an in-memory region. After
creation it can be fed to the rtld using
dlopen("/proc/self/fd/<num>"). I've tried this and it works fine.
Windows: Runner-up support. An in-memory file can be created by
passing FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE to
CreateFile. However, it still occupies a location in the
namespace, and permission checks probably still apply. Additionally,
the file is not truly memory-only: under memory pressure the VMM
may flush it to disk to free up resources. I haven't tried this
API myself.
macOS: Worst support. There is a deprecated
NSCreateObjectFileImageFromMemory API, but internally all it does is
save the contents to a file and load that again. The core rtld
cannot load from anything except a file, and macOS does not appear
to have any way to create in-memory fds. shm_open doesn't work
either: the SHM implementation is too broken; you can't write to such
an fd, and although you can mmap it, trying to open /dev/fd/x on the
resulting fd doesn't work.
Obviously with any such approach you face the problem of
dependencies. Even if the DLL/DSO itself is in memory, the rtld
will still try to load any dependent libraries from disk.
Playing around with this led me to start pondering something more
ambitious, namely, making native code loading customisable by Java
code, via some sort of NativeLoader API that's conceptually
similar to ClassLoader. Methods on it would be called to resolve a
library given a name and to look up symbols in the returned code
module (returning a native pointer to an entry point that uses a
calling convention passed into the lookup). System.load() would be
redirected to call into this class and the JNI linker would upcall
into it. NativeLoaders could be exposed via an SPI.
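To make the shape concrete, here is one way such an API could look. This is
entirely hypothetical: NativeLoader, findLibrary, and findSymbol are invented
names that do not exist in the JDK, and a real design would likely return
something richer than a raw long:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical SPI, conceptually analogous to ClassLoader.
interface NativeLoader {
    // Resolve a library by name, returning an opaque module handle.
    Object findLibrary(String name);

    // Look up an entry point in a module; a real API would also take a
    // calling convention, and might return a MethodHandle instead of an
    // address.
    long findSymbol(Object module, String symbol);
}

// A toy in-memory implementation, standing in for e.g. a memfd_create-backed
// loader on Linux or an unpack+cache loader.
class FakeNativeLoader implements NativeLoader {
    private final Map<String, Map<String, Long>> modules = new HashMap<>();

    FakeNativeLoader() {
        Map<String, Long> demo = new HashMap<>();
        demo.put("add", 0xdeadbeefL); // pretend address of a native function
        modules.put("demo", demo);
    }

    public Object findLibrary(String name) {
        Map<String, Long> m = modules.get(name);
        if (m == null) throw new UnsatisfiedLinkError("no library: " + name);
        return m;
    }

    @SuppressWarnings("unchecked")
    public long findSymbol(Object module, String symbol) {
        Long addr = ((Map<String, Long>) module).get(symbol);
        if (addr == null) throw new UnsatisfiedLinkError("no symbol: " + symbol);
        return addr;
    }
}

public class NativeLoaderDemo {
    public static void main(String[] args) {
        NativeLoader loader = new FakeNativeLoader();
        Object module = loader.findLibrary("demo");
        System.out.println("add @ 0x" + Long.toHexString(loader.findSymbol(module, "add")));
    }
}
```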
This might sound extreme, but we can see some advantages:
* The default NativeLoader would just use the platform APIs as
today, meaning little behaviour or code change on the JDK side.
* Samuel could write a NativeLoader for his unpack+cache
implementation that standardises this behaviour in an ordinary
library anyone can use.
* A Linux implementation could be written that gives faster
performance and more robustness using memfd_create, again, it
could be done by the community instead of HotSpot implementors.
* It opens the possibility of the community developing a new,
platform-independent native code format by itself. Why might
this be interesting?
o ELF, PE and Mach-O do not vary significantly in feature
set, and differ only due to the underlying platforms
evolving separately. From the perspective of a Java
developer who just wants to package some native code for
distribution the distinction is annoying and unnecessary.
It means building the same code three times, on three
different platforms with three different toolchains, each
time the native code changes, even if the change itself
has no platform-specific parts.
o A new format could potentially simplify things (e.g. do
we still need support for relocation, given the relatively
poor success of ASLR and the big success of 64-bit
platforms?), but it could also have features Java
developers might want, like:
+ The ability to have OS specific symbols. If your
library differs between platforms only in a few
places, do you really need to ship three artifacts for
that or would one artifact that contained 3 versions
of the underlying function be sufficient?
+ The ability to smoothly up-call to Java by linking
dependent symbols back to Java via the callback
capabilities in Panama.
+ The ability to internally link symbols based on
information discovered by the JVM, e.g. if the JVM is
able and willing to use AVX512 the new format could
simply use that information instead of requiring the
developer to write native code to detect CPU capabilities.
+ Fat binaries that support different CPU architectures
inside a single file, and/or LLVM bitcode. If you have
LLVM as a supported "architecture" then the
NativeLoader could perhaps bridge to Sulong, so Java
code that wasn't written with the Graal/Truffle API in
mind can still benefit from calling into JIT-compiled
bitcode. This probably would require NativeLoader to
return a MethodHandle of some sort rather than a
'long' pointer.
o Alternatively, write a cross-platform ELF loader. The Java
IO APIs provide everything needed already, as far as I know.
o The Arabica project explored sandboxing JNI libraries
using Google NativeClient. It appeared to work and calls
from the sandboxed code into the OS were transparently
redirected back into the JVM where normal permission
checks occurred. Sandboxed, cross platform native code
would be a powerful capability.
http://www.cse.psu.edu/~gxt29/paper/arabica.pdf
* It opens the possibility of "assembly spinning" as an analogue
to "bytecode spinning", as there would be no requirement that
the NativeLoader return a pointer to code that it loaded off
disk. Java-level JIT compilers like Graal could potentially be
used to spin little snippets of code at runtime, or when only
small amounts are needed (e.g. to invoke a syscall that isn't
exposed via glibc, like, say, memfd_create) the bytes can
simply be prepared ahead of time and read from the constant pool.
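On the cross-platform ELF loader point above: the fixed part of an ELF
header really is trivial to read with the standard Java APIs. A sketch that
only decodes a few identification fields, nowhere near a full loader (class
and method names are invented; field offsets follow the ELF specification):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ElfIdent {
    // Decodes the EI_CLASS/EI_DATA identification bytes and the e_machine
    // field from the first 20 bytes of an ELF image.
    static String describe(byte[] header) {
        if (header.length < 20 || header[0] != 0x7f
                || header[1] != 'E' || header[2] != 'L' || header[3] != 'F') {
            throw new IllegalArgumentException("not an ELF image");
        }
        String bits = header[4] == 2 ? "64-bit" : "32-bit";   // EI_CLASS
        boolean little = header[5] == 1;                      // EI_DATA
        ByteBuffer buf = ByteBuffer.wrap(header)
                .order(little ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        int machine = buf.getShort(18) & 0xffff;              // e_machine
        String arch = machine == 0x3e ? "x86-64"
                    : machine == 0xb7 ? "AArch64" : "other";
        return bits + " " + (little ? "LE" : "BE") + " " + arch;
    }

    public static void main(String[] args) {
        // A minimal fabricated 64-bit little-endian x86-64 ELF header prefix.
        byte[] header = new byte[20];
        header[0] = 0x7f; header[1] = 'E'; header[2] = 'L'; header[3] = 'F';
        header[4] = 2;     // ELFCLASS64
        header[5] = 1;     // ELFDATA2LSB
        header[18] = 0x3e; // EM_X86_64, little-endian short at offset 18
        System.out.println(describe(header));
    }
}
```

Resolving relocations, symbol tables, and dependencies is where the real
work would be, of course.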
Getting back to JavaFX for a moment, it sounds like it's too late
for anything to go into JDK11 at this point, which is a pity. It
will be up to the community to find a solution like Samuel's cache
for now.
thanks,
-mike