Fenil-v commented on PR #904:
URL: https://github.com/apache/arrow-java/pull/904#issuecomment-3545070425

   > > > @V-Fenil Hi, thanks for your interest. I already built the .so (for Linux) and the .dylib (for macOS), but I don't have a Windows environment, so I can't provide a .dll for you. To verify this PR you'll need to build from source; see https://github.com/apache/arrow-java?tab=readme-ov-file#building-from-source. That's also what I did to verify my PR.
   > >
   > > Hi @luoyuxia, I'm testing your PR on Linux. Could you share the built libarrow_dataset_jni.so file? I can build the Java side, but I need the native library. (More specifically, my build succeeded but I can't find the .so file.) The total build time was 49 minutes, yet the Arrow Java C Data Interface and Arrow Java Dataset modules took only 45 seconds each, so I suspect no C++ compilation actually ran. It would be easier if you could share the file directly.
   >
   > Of course I can share it. I can send you the `libarrow_dataset_jni.so` as well as the jar built with it. How should I share it? Send it to your email, or some other way?
   
   I'm testing PR #904 (native Parquet writer via JNI) on Ubuntu 22.04 (WSL2) with Java 11 and hitting a consistent failure during ParquetWriter initialization.
   
   Setup:
   - Downloaded jni-linux-x86_64 artifacts from CI build (run #19222857860)
   - Using Arrow Java 19.0.0-SNAPSHOT with both libarrow_dataset_jni.so and 
     libarrow_cdata_jni.so loaded
   - All library dependencies resolved (ldd shows no missing libraries)
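
   For reference, the `ldd` check above was roughly the following (a sketch; the library path is an assumption — point `LIB` at wherever the CI artifact was extracted):

   ```shell
   # Sketch of the dependency check mentioned above. The library location is
   # an assumption; adjust LIB to wherever the jni-linux-x86_64 artifact lives.
   LIB=./libarrow_dataset_jni.so

   check_deps() {
     # ldd exits 0 even when a dependency is unresolved, so grep for "not found"
     if ldd "$1" | grep -q "not found"; then
       echo "missing dependencies for $1:"
       ldd "$1" | grep "not found"
       return 1
     fi
     echo "all dependencies of $1 resolved"
   }

   if [ -f "$LIB" ]; then
     check_deps "$LIB"
   fi
   ```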
   
   Error:
   The ParquetWriter constructor fails at line 71 with a memory-leak error during cleanup:

   ```
   java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (128)
   Allocator(ROOT) 0/128/4998/9223372036854775807 (res/actual/peak/limit)
       at org.apache.arrow.dataset.file.ParquetWriter.close(ParquetWriter.java:158)
       at org.apache.arrow.dataset.file.ParquetWriter.<init>(ParquetWriter.java:71)
   ```
   
   Analysis:
   Looking at the bytecode, the constructor creates a RootAllocator, then calls either `ArrowSchema.allocateNew()` (line 24) or `Data.exportSchema()` (line 37), which throws an exception. The constructor's cleanup path calls close(), which then detects the 128-byte leak from the allocator created at line 14.
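
   To illustrate the failure mode I believe I'm seeing (a simplified stand-in — this is NOT Arrow's actual RootAllocator, just the shape of the pattern): when a constructor allocates, the native export throws, and cleanup closes the allocator while bytes are still outstanding, it's close() that raises, masking the root cause.

   ```java
   // Simplified stand-in for the pattern described above. ToyAllocator is a
   // hypothetical illustration, not Arrow's RootAllocator implementation.
   public class LeakDemo {
       static class ToyAllocator implements AutoCloseable {
           private long outstanding;
           void allocate(long bytes) { outstanding += bytes; }
           void release(long bytes) { outstanding -= bytes; }
           @Override public void close() {
               if (outstanding != 0) {
                   throw new IllegalStateException(
                       "Memory was leaked by query. Memory leaked: (" + outstanding + ")");
               }
           }
       }

       public static void main(String[] args) {
           ToyAllocator allocator = new ToyAllocator();
           try {
               allocator.allocate(128);  // e.g. the exported schema buffer
               throw new RuntimeException("native export failed");  // the real failure
           } catch (RuntimeException e) {
               // Constructor cleanup: the 128 bytes were never released, so
               // close() throws, hiding the original RuntimeException.
               try {
                   allocator.close();
               } catch (IllegalStateException leak) {
                   // prints: Memory was leaked by query. Memory leaked: (128)
                   System.out.println(leak.getMessage());
               }
           }
       }
   }
   ```

   If that reading is right, the leak report is a symptom and the interesting exception is the one thrown by the native export.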
   
   Questions:
   1. Are there additional native libraries or system dependencies required beyond libarrow_dataset_jni.so and libarrow_cdata_jni.so?
   2. Is the CI build fully functional, or does it require Arrow C++ runtime libraries to be installed separately?
   3. What's the expected initialization sequence for ParquetWriter with these JNI libraries?
   
   The Java code is simply:

   ```java
   FileOutputStream fos = new FileOutputStream(outputPath);
   ParquetWriter writer = new ParquetWriter(fos, schema);
   ```
   
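   In case it helps others reproduce, here is a stdlib-only snippet to confirm the two libraries are actually visible on `java.library.path` before the constructor runs (the library names come from my setup above; `JniPathCheck` is just an illustrative name):

   ```java
   import java.io.File;

   // Stdlib-only sanity check: scan java.library.path for the two JNI
   // libraries named in the setup section above.
   public class JniPathCheck {
       public static void main(String[] args) {
           String[] libs = {"libarrow_dataset_jni.so", "libarrow_cdata_jni.so"};
           String path = System.getProperty("java.library.path", "");
           for (String lib : libs) {
               boolean found = false;
               for (String entry : path.split(File.pathSeparator)) {
                   if (new File(entry, lib).isFile()) {
                       System.out.println("found " + lib + " in " + entry);
                       found = true;
                   }
               }
               if (!found) {
                   System.out.println("MISSING: " + lib
                       + " (try passing -Djava.library.path=<artifact dir>)");
               }
           }
       }
   }
   ```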
   
   Any guidance would be appreciated. Thanks for this PR; looking forward to the native performance!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
