Sorry for writing so much but you asked for it.. :-)

On 28.04.2011 20:08, duncan wrote:
> We are accessing native libraries from Java. This works outside of
> OSGi and in OSGi on Windows but not on Linux probably because the dll
> and so linking behave differently.

Yes and no. They work differently, but you must understand why.
Instead of randomly "successful" expriments here is what happens & why,
and what you can do to load native code from bundles regardless of platform.

First off: the Bundle-NativeCode capabilitiy/idea is absolutely
fantastic and one of the most underrated & unknown features of OSGi. It
works well, but is naturally affected by the capabilities of the
underlying OS and Java's behaviour. Naturally this area is harder since
"write once, run anywhere" does not apply any longer when you venture
outside the JVM. Nobody just walks into native land.

I'll skip the case where you only have a single library, since that is
easy and not a problem on either platform. Just link all your JNI stubs
and whatever they call into one big dll/so and bingo; works everywhere.

As you have learned this is not so easy any more when you have two or
more libraries with dependencies between them. Since loading of
dependencies between native libraries works differently on different
OSes, there is nothing OSGi itself can do here; but understanding what
exactly happens will allow us to handle the differences in a transparent
manner.

First off let's understand what happens when Java loads a dll/so.

Regardless of the platform-specific call the library is loaded, and -
before the call returns - the OS will notice missing symbols (e.g.
functions defined in a dependent library) and start the
platform-dependent search for dependencies to satisfies the missing bits.
Windows will look in the current working directory of the process
(usually useless for OSGi) and then %PATH%, Linux will inspect
LD_LIBRARY_PATH ([1]) and then consult the ld.so cache, usually located
in /etc/ld.so.cache. This cache contains a cached representation of the
system-wide search paths usually declared in /etc/ld.so.conf. This can
be updated by e.g. an installer or system management tool; see man ldconfig.

All of these mechanisms are unfortunately not too useful for OSGi, since
two or more libraries contained in & unpacked from a bundle are sitting
somewhere in a bundle cache - a private directory normally not visible
to the OS. So what can you do to trick the native platform loading into
doing the right thing?

Let's start with Windows since that is ironically the easiest case. As
some other posters have indicated you can indeed pre-load dependencies
in their correct order and then load your JNI library. The preloading
makes sure that the last loaded dependency has no missing symbols, and
the native search machinery does not kick in - problem averted.

Then you try this on Linux or OSX and realize that it Simply Does Not
Work (sorry Richard :). Why?

The POSIX/Unix/Linux system calls used for loading, accessing and
closing shared objects are called dlopen(), dlsym() and dlclose(). If
you look at dlopen(), it takes a filename (the library to load) and a
magic flag. The way the JVM calls dlopen() is the reason why preloading
does not work: the flag is by platform-default set to RTLD_LOCAL, which
does NOT expose loaded symbols to subsequently loaded libraries
(contrary to Windows, where loaded symbols become visible).
In other words: manual preloading of dependencies on Linux fixes exactly
nothing! Whether this is just an unfortunate default, a bug or
deliberate is a different, complicated discussion about isolation,
security, stability etc. but ultimately does not matter since we cannot
influence it.

Bummer. So what now, give up? Bah!

If preloading does not work then maybe we can make the OS do the right
thing instead. ld.so - the dynamic loader used for resolving library
dependencies - looks at a loaded object and obviously also needs to know
what else to load in case there are any declared dependencies. As it
turns out shared libraries can have a search path embedded and *this
path can be manipulated*. Quote from the ld.so man page:

   $ORIGIN and rpath
       ld.so  understands  the  string  $ORIGIN  (or  equivalently
       ${ORIGIN}) in an rpath specification (DT_RPATH or DT_RUNPATH) to
       mean the directory containing the application executable.
       Thus, an application located in somedir/app could be compiled
       with gcc -Wl,-rpath,'$ORIGIN/../lib' so that it finds an
       associated shared library in somedir/lib no matter where
       somedir is located in the directory hierarchy. [..]

Awesomesauce! This means that you can build your native libraries with
e.g. -rpath,'$ORIGIN' and loading one dependency (for example your JNI
stubs) will now correctly find any dependencies from the same directory.
Win!

But what if you don't have the source to the dependencies? Or even worse
you never learned how to compile C code?

Well, old people^h^h^h^h^h^h^h enterprising individuals like me would
probably just bring out the old hex editor, but there is a more
civilized way: meet patchelf [2]. For anyone who wants to dive deeper
you can also look at [3]; I managed to make the author's "rpath" tool
build & work on Linux with just a few modifications.

patchelf is easy to build and allows you to inspect & modify the runpath
of any library. For an example let's find a library that has dependencies:

(readelf is part of binutils, which you should have installed already)

$readelf -d /usr/lib/libpcrecpp.so.0.0.0

Dynamic section at offset 0x8dd4 contains 28 entries:
  Tag        Type                  Name/Value
 0x00000001 (NEEDED)              Shared library: [libpcre.so.0]
 0x00000001 (NEEDED)              Shared library:[libstdc++.so.6]
 0x00000001 (NEEDED)              Shared library: [libm.so.6]
 0x00000001 (NEEDED)              Shared library: [libc.so.6]
 0x00000001 (NEEDED)              Shared library: [libgcc_s.so.1]
 0x0000000e (SONAME)              Library soname: [libpcrecpp.so.0]
 0x0000000c (INIT)                0x2af8

That should do: the C++ wrapper for pcre needs the core C pcre library
as dependency. Let's modify it!

$cp /usr/lib/libpcrecpp.so.0.0.0 .
$patchelf --set-rpath '.:$ORIGIN' libpcrecpp.so.0.0.0
$patchelf --print-rpath libpcrecpp.so.0.0.0
.:$ORIGIN

And indeed:

holger>readelf -d libpcrecpp.so.0.0.0

Dynamic section at offset 0xa100 contains 29 entries:
  Tag        Type                  Name/Value
 0x0000001d (RUNPATH)             Library runpath: [.:$ORIGIN]
 0x00000001 (NEEDED)              Shared library: [libpcre.so.0]
[..]

This means that loading this library would make the ld.so loader look
for dependencies first in the current directory of the process (which
btw is a questionable practice, mostly for security reasons), and then
in the directory where the originating library is located..which could
be the completely unknown and anonymous bundle cache of the OSGi
runtime. Win!

So you modify all your .so libraries, package them into the bundle
and..OH NOES! It STILL does not work! Depending on the OSGi runtime you
still get errors about the native dependency not being found, despite
$ORIGIN.

"You didn't think it would be that easy, did you? Silly Rabbit."

You see, what happens is that the OSGi runtime is at liberty to unpack
resources from a bundle lazily. This means that simply loading a
toplevel dependency is not enough, as any other bundle-included
dependencies may not yet have been unpacked, causing the native loader
to fail. The fix is easy: simply *pretend* to preload all libraries
(just like on Windows), but ignore any UnsatisfiedLinkErrors - and then
load the JNI stubs.

Et voilà: angels will sing, and your bundle will work.

To make this robust don't just sprinkle the System.loadLibrary() calls
throughout your classes; make a central Initializer class (bound to the
bundle's Activator or not..your choice) and refer to it. Then in that
activator you can have differetn loading strategies for Windows
(preloading nicely in-order), Linux (fake preloading) OSX (same as
Linux) or any other platform. It's also a good idea not to refer to the
classes in this bundle as "library" (i.e. passive code); make the native
code a service and properly track it from client bundles.

I hope this answers all your questions and gives you more rope to hang
yourself than you asked for. If not..just ask. :-)

Holger

[1] http://blogs.sun.com/rie/entr /tt_ld_library_path_tt
[2] http://nixos.org/patchelf.html
[3] http://blogs.sun.com/ali/entry/changing_elf_runpaths

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@felix.apache.org
For additional commands, e-mail: users-h...@felix.apache.org

Reply via email to