davisusanibar commented on code in PR #13973:
URL: https://github.com/apache/arrow/pull/13973#discussion_r964828671
##########
java/dataset/src/test/java/org/apache/arrow/dataset/file/TestFileSystemDataset.java:
##########
@@ -357,6 +364,41 @@ public void testBaseArrowIpcRead() throws Exception {
AutoCloseables.close(factory);
}
+ @Test
+ public void testBaseOrcRead() throws Exception {
+ String dataName = "test-orc";
+ String basePath = TMP.getRoot().getAbsolutePath();
+
+ TypeDescription orcSchema = TypeDescription.fromString("struct<ints:int>");
+ Writer writer = OrcFile.createWriter(new Path(basePath, dataName),
+ OrcFile.writerOptions(new Configuration()).setSchema(orcSchema));
+ VectorizedRowBatch batch = orcSchema.createRowBatch();
+ LongColumnVector longColumnVector = (LongColumnVector) batch.cols[0];
+ longColumnVector.vector[0] = Integer.MIN_VALUE;
+ longColumnVector.vector[1] = Integer.MAX_VALUE;
+ batch.size = 2;
+ writer.addRowBatch(batch);
+ writer.close();
Review Comment:
Would it be possible to extract this into a common helper method? Something
like `OrcWriteSupport.writeTempFile`.
##########
java/dataset/src/test/java/org/apache/arrow/dataset/file/TestFileSystemDataset.java:
##########
@@ -357,6 +364,41 @@ public void testBaseArrowIpcRead() throws Exception {
AutoCloseables.close(factory);
}
+ @Test
+ public void testBaseOrcRead() throws Exception {
+ String dataName = "test-orc";
+ String basePath = TMP.getRoot().getAbsolutePath();
+
+ TypeDescription orcSchema = TypeDescription.fromString("struct<ints:int>");
+ Writer writer = OrcFile.createWriter(new Path(basePath, dataName),
+ OrcFile.writerOptions(new Configuration()).setSchema(orcSchema));
+ VectorizedRowBatch batch = orcSchema.createRowBatch();
+ LongColumnVector longColumnVector = (LongColumnVector) batch.cols[0];
+ longColumnVector.vector[0] = Integer.MIN_VALUE;
+ longColumnVector.vector[1] = Integer.MAX_VALUE;
+ batch.size = 2;
+ writer.addRowBatch(batch);
+ writer.close();
+
+ String orcDatasetUri = new File(basePath, dataName).toURI().toString();
+ FileSystemDatasetFactory factory = new FileSystemDatasetFactory(rootAllocator(), NativeMemoryPool.getDefault(),
Review Comment:
LGTM, I only have this comment:
1. There is a [Jira ticket](https://issues.apache.org/jira/browse/ARROW-17508)
about using `NativeMemoryPool.createListenable` for large data. Do you know
whether there are similar limitations or restrictions for large ORC files as well?
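
For context, here is a rough sketch of the kind of listener that `NativeMemoryPool.createListenable` accepts. It is illustrative only: the `ReservationListener` interface below is a local stand-in with the same two methods as Arrow's `org.apache.arrow.dataset.jni.ReservationListener`, declared here so the snippet compiles on its own. `LimitedReservationListener` and its byte limit are hypothetical names, not part of this PR.

```java
// Local stand-in (assumption) for Arrow's
// org.apache.arrow.dataset.jni.ReservationListener, so this sketch
// compiles without Arrow on the classpath.
interface ReservationListener {
  void reserve(long size);

  void unreserve(long size);
}

// Hypothetical listener that rejects native reservations beyond a fixed limit.
final class LimitedReservationListener implements ReservationListener {
  private final long limitBytes;
  private long reservedBytes;

  LimitedReservationListener(long limitBytes) {
    if (limitBytes < 0) {
      throw new IllegalArgumentException("limitBytes must be non-negative");
    }
    this.limitBytes = limitBytes;
  }

  @Override
  public synchronized void reserve(long size) {
    // Reject negative sizes and anything that would exceed the limit.
    if (size < 0 || size > limitBytes - reservedBytes) {
      throw new IllegalStateException(
          "Cannot reserve " + size + " bytes; " + reservedBytes
              + " of " + limitBytes + " already reserved");
    }
    reservedBytes += size;
  }

  @Override
  public synchronized void unreserve(long size) {
    // Releasing more than is currently reserved indicates a bookkeeping bug.
    if (size < 0 || size > reservedBytes) {
      throw new IllegalStateException("Cannot unreserve " + size + " bytes");
    }
    reservedBytes -= size;
  }

  synchronized long reservedBytes() {
    return reservedBytes;
  }
}
```

With Arrow on the classpath, such a listener would implement Arrow's own interface and be passed as `NativeMemoryPool.createListenable(listener)` in place of `NativeMemoryPool.getDefault()`. Whether the ORC reader behaves the same way as other formats under such a pool is exactly the open question above; the sketch does not answer it.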