danepitkin opened a new pull request, #36934:
URL: https://github.com/apache/arrow/pull/36934
### Rationale for this change
Java datasets can implicitly create an S3 filesystem, which initializes the
S3 APIs. Java currently has no explicit call to shut down the S3 APIs, so a
warning message is printed at runtime:
`arrow::fs::FinalizeS3 was not called even though S3 was initialized. This
could lead to a segmentation fault at exit`
### What changes are included in this PR?
* Add a Java runtime shutdown hook that calls `EnsureS3Finalized()` via JNI.
The call is a no-op if S3 was never initialized or has already been finalized.
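For readers unfamiliar with the mechanism, here is a minimal sketch of the shutdown-hook pattern described above. It is not the code in this PR: the class `S3ShutdownHookSketch` and its methods are hypothetical, and two `AtomicBoolean` flags stand in for the native S3 state that the real hook reaches through JNI. The only real API used is `Runtime.addShutdownHook`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class S3ShutdownHookSketch {
  // Hypothetical stand-ins for native S3 state; the real change calls C++ via JNI.
  private static final AtomicBoolean initialized = new AtomicBoolean(false);
  private static final AtomicBoolean finalized = new AtomicBoolean(false);

  /** Simulates S3 being initialized implicitly, e.g. by opening an s3:// dataset. */
  public static void markInitialized() {
    initialized.set(true);
  }

  /**
   * Finalizes at most once. Returns true only if this call did the work;
   * returns false (a no-op) if never initialized or already finalized.
   */
  public static boolean ensureFinalized() {
    if (!initialized.get()) {
      return false;
    }
    return finalized.compareAndSet(false, true);
  }

  /** Registers a JVM shutdown hook that runs ensureFinalized() at exit. */
  public static Thread registerShutdownHook() {
    Thread hook = new Thread(() -> {
      ensureFinalized();
    }, "s3-finalizer-sketch");
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }

  public static void main(String[] args) {
    registerShutdownHook();
    markInitialized();
    // When main returns, the JVM runs the hook and finalization happens once.
  }
}
```

Because the finalize step is idempotent, registering the hook unconditionally is safe even for programs that never touch S3.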
### Are these changes tested?
Yes, reproduced with:
```
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
import org.apache.arrow.dataset.jni.NativeMemoryPool;
import org.apache.arrow.dataset.source.DatasetFactory;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;

public class DatasetModule {
  public static void main(String[] args) {
    String uri =
        "s3://voltrondata-labs-datasets/nyc-taxi-tiny/year=2022/month=2/part-0.parquet";
    try (
        BufferAllocator allocator = new RootAllocator();
        DatasetFactory datasetFactory = new FileSystemDatasetFactory(
            allocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
    ) {
      // S3 is initialized
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
```
I didn't think a unit test was worth adding. Let me know if you think
otherwise. Reasoning:
* We can't test the actual shutdown since that's a JVM thing.
* We could test whether the hook is registered, but that involves exposing
the API and having access to the thread object registered with the hook, or
using reflection to obtain it. Not worth it IMO.
* No need to test the functionality inside the hook; it's just a wrapper
around a single C++ API with no params/retval.
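To make the second point concrete, here is a rough sketch of what a registration check would need, assuming the hook's `Thread` object were exposed to the test. The class `HookRegistrationSketch` and its `isRegistered` helper are hypothetical. The sketch relies on the documented behavior of `Runtime.removeShutdownHook`, which returns `true` only if the given thread was previously registered.

```java
public class HookRegistrationSketch {
  /**
   * Reports whether the given thread is currently registered as a shutdown hook.
   * It removes the hook to probe for it, then re-registers it so state is unchanged.
   */
  public static boolean isRegistered(Thread hook) {
    Runtime runtime = Runtime.getRuntime();
    boolean wasRegistered = runtime.removeShutdownHook(hook);
    if (wasRegistered) {
      runtime.addShutdownHook(hook); // restore the original registration
    }
    return wasRegistered;
  }

  public static void main(String[] args) {
    Thread hook = new Thread(() -> { }, "example-hook");
    System.out.println(isRegistered(hook)); // not registered yet
    Runtime.getRuntime().addShutdownHook(hook);
    System.out.println(isRegistered(hook)); // registered now
  }
}
```

The check only works with a reference to the exact `Thread` instance, which is why it would require either widening the API or reaching in with reflection.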
### Are there any user-facing changes?
No