davisusanibar commented on issue #36069:
URL: https://github.com/apache/arrow/issues/36069#issuecomment-1592076111
> CC @davisusanibar @lidavidm (note that this warning was newly added to the S3 filesystem in the previous release so it is very possible the Java implementation has never been calling finalize)
I was just able to reproduce this warning with:
```java
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
import org.apache.arrow.dataset.jni.NativeMemoryPool;
import org.apache.arrow.dataset.scanner.ScanOptions;
import org.apache.arrow.dataset.scanner.Scanner;
import org.apache.arrow.dataset.source.Dataset;
import org.apache.arrow.dataset.source.DatasetFactory;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.arrow.vector.types.pojo.Schema;

public class DatasetModule {
  public static void main(String[] args) {
    String uri = "s3://voltrondata-labs-datasets/nyc-taxi-tiny/year=2022/month=2/part-0.parquet"; // AWS S3
    // String uri = "hdfs://{hdfs_host}:{port}/nyc-taxi-tiny/year=2022/month=2/part-0.parquet"; // HDFS
    // String uri = "gs://voltrondata-labs-datasets/nyc-taxi-tiny/year=2022/month=2/part-0.parquet"; // Google Cloud Storage
    ScanOptions options = new ScanOptions(/*batchSize*/ 32768);
    try (
        BufferAllocator allocator = new RootAllocator();
        DatasetFactory datasetFactory = new FileSystemDatasetFactory(
            allocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
        Dataset dataset = datasetFactory.finish();
        Scanner scanner = dataset.newScan(options);
        ArrowReader reader = scanner.scanBatches()
    ) {
      Schema schema = scanner.schema();
      System.out.println(schema);
      while (reader.loadNextBatch()) {
        System.out.println("RowCount: " + reader.getVectorSchemaRoot().getRowCount());
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
```
Output messages:
```
RowCount: 2979
/Users/runner/work/crossbow/crossbow/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
```
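For context, the warning is emitted by the C++ S3 filesystem, which expects `arrow::fs::FinalizeS3` to run before the process exits; the Java Dataset JNI layer initializes S3 implicitly when an `s3://` URI is used but apparently never triggers that finalize step. Below is a minimal C++ sketch of the init/finalize pairing the warning refers to, using the public API in `arrow/filesystem/s3fs.h`; it is only an illustration of the native lifecycle, not the Java-side fix.

```cpp
// Minimal sketch of the S3 init/finalize pairing that s3fs.cc warns about
// when the finalize half is missing.
#include <iostream>

#include "arrow/filesystem/s3fs.h"

int main() {
  // Creating an S3FileSystem initializes the AWS SDK implicitly;
  // here we do it explicitly for clarity.
  auto init_status = arrow::fs::EnsureS3Initialized();
  if (!init_status.ok()) {
    std::cerr << init_status.ToString() << std::endl;
    return 1;
  }

  // ... create filesystems / datasets and read data ...

  // Without this call before exit, s3fs.cc logs the warning shown above and
  // the AWS SDK may tear down in an unsafe order (possible segfault at exit).
  auto finalize_status = arrow::fs::FinalizeS3();
  if (!finalize_status.ok()) {
    std::cerr << finalize_status.ToString() << std::endl;
    return 1;
  }
  return 0;
}
```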
Next steps:
1. Review the reason for the warning message
2. Add an Arrow Java cookbook recipe to cover S3 integration