Sebastiaan-Alvarez-Rodriguez edited a comment on pull request #7030:
URL: https://github.com/apache/arrow/pull/7030#issuecomment-694925466


   Hi there,
   
   Could you please add a few compilation/installation instructions to your branch?
   I just completed a long debugging session to find out why the JNI runtime did not work.
   
   ### CPP
   I compiled the `<project-root>/cpp/` project using:
   ```bash
   cd <project-root>/cpp
   cmake -DARROW_DATASET=ON -DARROW_JNI=ON -DARROW_PARQUET=ON -DARROW_IPC=ON .
   sudo make install
   ```
   
   I confirmed that the shared libraries are indeed visible using `whereis libarrow_dataset_jni`.
   
   ### Java
   I compiled `<project-root>/java/dataset/` using:
   ```bash
   cd <project-root>/java/dataset
   mvn clean install -Dmaven.test.skip=true # skipping tests to let build succeed
   ```
   
   To test whether it works, I successfully ran a sample program:
   ```java
   
   public class test {
       public static void main(String[] args) {
           System.loadLibrary("arrow_dataset_jni");
           System.out.println("System loading works fine");
       }
   }
   ```
   So, my sample program's classloader can access `libarrow_dataset_jni.so` 
just fine.
   
   However, the following does not work:
   ```java
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.dataset.file.FileFormat;
   import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
   
   public class test {
       private static FileSystemDatasetFactory getDatasetFactory() {
           RootAllocator allocator = new RootAllocator(Long.MAX_VALUE);
           return new FileSystemDatasetFactory(allocator, FileFormat.PARQUET,
                   "/path/to/pq.parquet"); // crash
       }
   
       public static void main(String[] args) {
           System.loadLibrary("arrow_dataset_jni");
           System.out.println("Own classLoader works fine");
           FileSystemDatasetFactory test = getDatasetFactory();
           System.out.println("Not so good! I crash before I can print this");
       }
   }
   ```
   
   The relevant part of the crash is:
   ```
   Caused by: java.lang.IllegalStateException: error loading native libraries: java.io.FileNotFoundException: libarrow_dataset_jni.so
       at org.apache.arrow.dataset.jni.JniLoader.load(JniLoader.java:91)
       at org.apache.arrow.dataset.jni.JniLoader.loadRemaining(JniLoader.java:73)
       at org.apache.arrow.dataset.jni.JniLoader.ensureLoaded(JniLoader.java:60)
       at org.apache.arrow.dataset.file.JniWrapper.<init>(JniWrapper.java:34)
   ```
   
   The source code of the Dataset Java library that causes the error can be found [here](https://github.com/zhztheplayer/arrow-1/blob/fd163e199a10a7225765b0c30fbf60d8df8d20db/java/dataset/src/main/java/org/apache/arrow/dataset/jni/JniLoader.java#L78-L94).
   
   The problem occurs because `JniWrapper.class.getClassLoader().getResourceAsStream(libraryToLoad)` always returns `null`.
   Only after adding a symlink (`ln -s /usr/local/lib/libarrow_dataset_jni.so libarrow_dataset_jni.so`) to the package directory is it able to find the correct file.
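
The `null` result is consistent with how the two lookups differ: `ClassLoader.getResourceAsStream` searches the classpath, whereas `System.loadLibrary` consults `java.library.path`. The following is a minimal sketch of a check for both locations. The class `LibraryLookupCheck` and its helper methods are hypothetical names used for illustration and are not part of Arrow.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Arrow: reports where a native library is visible.
public class LibraryLookupCheck {

    // True if the file is reachable as a classpath resource,
    // which is the lookup that getResourceAsStream performs.
    static boolean onClasspath(String fileName) {
        ClassLoader loader = LibraryLookupCheck.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(fileName)) {
            return in != null;
        } catch (IOException e) {
            return false; // closing the stream failed; treat as not usable
        }
    }

    // True if the file exists in one of the java.library.path directories,
    // which System.loadLibrary consults.
    static boolean onLibraryPath(String fileName) {
        String path = System.getProperty("java.library.path", "");
        for (String dir : path.split(File.pathSeparator)) {
            if (!dir.isEmpty() && new File(dir, fileName).isFile()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // mapLibraryName turns "arrow_dataset_jni" into the platform file name,
        // e.g. libarrow_dataset_jni.so on Linux.
        String fileName = System.mapLibraryName("arrow_dataset_jni");
        System.out.println(fileName + " on classpath:    " + onClasspath(fileName));
        System.out.println(fileName + " on library path: " + onLibraryPath(fileName));
    }
}
```

If the shared object is only installed system-wide and not packaged with the classes, the classpath check would be expected to report `false` even when `System.loadLibrary` succeeds.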
   
   Why do you create a temporary file, store the contents of the shared library in it, and then call `System.load(tmpfile)` instead of loading the library directly with `System.loadLibrary(libraryToLoad);`?
   
   Maybe it is a good idea to pick one of the following:
    1. Change the `'java.io.FileNotFoundException: libarrow_dataset_jni.so'` message to display the full path (so people know where your library looks for the shared object).
    2. Add a small `JNI_dataset_dev_install_note.md` somewhere that explains how to get this library to work.
    3. Use `System.loadLibrary(libraryToLoad);`.
    4. Use some other way to find the location of the shared library and copy it to a tmpfile.
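
Options 1, 3 and 4 could also be combined. The following is a minimal sketch under stated assumptions: `FallbackJniLoader` is a hypothetical class, not the `JniLoader` from this PR, and the error message wording is made up. It first tries the classpath resource (extracting it to a temporary file), then falls back to `System.loadLibrary`, and reports the searched locations if both fail.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical loader, not the JniLoader from the PR.
public class FallbackJniLoader {

    static void load(String name) {
        // Platform-specific file name, e.g. libarrow_dataset_jni.so on Linux.
        String fileName = System.mapLibraryName(name);
        ClassLoader loader = FallbackJniLoader.class.getClassLoader();

        // First attempt: library bundled as a classpath resource.
        try (InputStream in = loader.getResourceAsStream(fileName)) {
            if (in != null) {
                File tmp = File.createTempFile("jnilib-", "-" + fileName);
                tmp.deleteOnExit();
                Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
                System.load(tmp.getAbsolutePath());
                return;
            }
        } catch (IOException e) {
            throw new IllegalStateException("error extracting " + fileName, e);
        }

        // Second attempt: library installed on the system.
        try {
            System.loadLibrary(name);
        } catch (UnsatisfiedLinkError e) {
            // Say where we looked, so the user knows where to put the file.
            throw new IllegalStateException(
                fileName + " not found as a classpath resource or on java.library.path ("
                    + System.getProperty("java.library.path") + ")", e);
        }
    }

    public static void main(String[] args) {
        load("arrow_dataset_jni");
        System.out.println("Native library loaded");
    }
}
```

With this approach a packaged jar keeps working as before, while a development build that only ran `make install` can still load the library, provided the install directory is on `java.library.path`.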
   
   I believe it would help many people who, like myself, are not familiar with `getClassLoader().getResourceAsStream(...)`.
   
   
   Have a nice day,
   Sebastiaan


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

