lidavidm commented on code in PR #14168:
URL: https://github.com/apache/arrow/pull/14168#discussion_r975396864
##########
java/dataset/src/main/java/org/apache/arrow/dataset/scanner/Scanner.java:
##########
@@ -17,19 +17,29 @@
package org.apache.arrow.dataset.scanner;
+import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.arrow.vector.types.pojo.Schema;
/**
* A high level interface for scanning data over dataset.
*/
public interface Scanner extends AutoCloseable {
+ /**
+ * Read record batches from the URI resources filtering the batch size configured.
Review Comment:
```suggestion
* Read the dataset as a stream of record batches.
```
##########
java/dataset/src/main/java/org/apache/arrow/dataset/jni/NativeScanner.java:
##########
@@ -68,6 +68,19 @@ ArrowReader execute() {
}
@Override
+ public ArrowReader scanBatchesUnordered() {
+ if (closed) {
+ throw new NativeInstanceReleasedException();
+ }
+ if (!executed.compareAndSet(false, true)) {
+ throw new UnsupportedOperationException("NativeScanner cannot be executed more than once. Consider creating " +
+ "new scanner instead");
Review Comment:
```suggestion
throw new UnsupportedOperationException("NativeScanner can only be executed once. Create a " +
"new scanner instead");
```
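For reference, the once-only guard behind this message can be sketched in isolation. This is a minimal illustration under stated assumptions, not the Arrow code: `OnceOnlyScanner` and its `String` return value are hypothetical stand-ins for `NativeScanner` and `ArrowReader`; only `AtomicBoolean.compareAndSet` is an actual JDK call.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for NativeScanner; it only illustrates the guard.
final class OnceOnlyScanner {
    // Flips from false to true exactly once, even under concurrent callers.
    private final AtomicBoolean executed = new AtomicBoolean(false);

    // Hypothetical method; returns a String where the real code returns an ArrowReader.
    String scanBatches() {
        // compareAndSet returns false if the flag was already true,
        // i.e. some caller has executed this scanner before.
        if (!executed.compareAndSet(false, true)) {
            throw new UnsupportedOperationException(
                "NativeScanner can only be executed once. Create a new scanner instead");
        }
        return "batches";
    }
}
```

The first call succeeds and every later call on the same instance throws, so callers need a fresh scanner per scan.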
##########
java/dataset/src/main/java/org/apache/arrow/dataset/scanner/Scanner.java:
##########
@@ -17,19 +17,29 @@
package org.apache.arrow.dataset.scanner;
+import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.arrow.vector.types.pojo.Schema;
/**
* A high level interface for scanning data over dataset.
*/
public interface Scanner extends AutoCloseable {
+ /**
+ * Read record batches from the URI resources filtering the batch size configured.
+ *
+ * @return a {@link ArrowReader}.
+ */
+ ArrowReader scanBatchesUnordered();
Review Comment:
Sorry, it looks like it's actually `scanBatches` in the implementation:
https://github.com/apache/arrow/blob/2629f208cf8be638c7e251946947009bc181bb31/java/dataset/src/main/cpp/jni_wrapper.cc#L152