westonpace commented on a change in pull request #9947:
URL: https://github.com/apache/arrow/pull/9947#discussion_r610821714



##########
File path: cpp/src/arrow/dataset/scanner.cc
##########
@@ -67,7 +68,105 @@ Result<RecordBatchGenerator> ScanTask::ExecuteAsync() {
 
 bool ScanTask::supports_async() const { return false; }
 
-Result<FragmentIterator> Scanner::GetFragments() {
+Result<ScanTaskIterator> Scanner::Scan() {
+  // TODO(ARROW-12289) This is overridden in SyncScanner and will never be 
implemented in
+  // AsyncScanner.  It is deprecated and will eventually go away.
+  return Status::NotImplemented("This scanner does not support the legacy 
Scan() method");
+}
+
+Result<EnumeratedRecordBatchIterator> Scanner::ScanBatchesUnordered() {
+  // If a scanner doesn't support unordered scanning (i.e. SyncScanner) then 
we just
+  // fall back to an ordered scan and assign the appropriate tagging
+  ARROW_ASSIGN_OR_RAISE(auto ordered_scan, ScanBatches());
+  return AddPositioningToInOrderScan(std::move(ordered_scan));
+}
+
+Result<EnumeratedRecordBatchIterator> Scanner::AddPositioningToInOrderScan(
+    TaggedRecordBatchIterator scan) {
+  ARROW_ASSIGN_OR_RAISE(auto first, scan.Next());
+  if (IsIterationEnd(first)) {
+    return MakeEmptyIterator<EnumeratedRecordBatch>();
+  }
+  struct State {
+    State(TaggedRecordBatchIterator source, TaggedRecordBatch first)
+        : source(std::move(source)),
+          batch_index(1),
+          finished(false),
+          prev_batch(TaggedRecordBatch{nullptr, nullptr}) {}
+    TaggedRecordBatchIterator source;
+    int batch_index;
+    int fragment_index;
+    bool finished;
+    TaggedRecordBatch prev_batch;
+  };
+  struct TaggingIterator {

Review comment:
       Already done for generators (https://github.com/apache/arrow/pull/9945). 
 It's a bit trickier here with the iterator here since it's enumerating both 
fragments and record batches at the same time.  In the async version we 
enumerate them separately so it made more sense to have a general utility.  
Although this comment made me realize this should be named EnumeratingIterator 
and not TaggingIterator (I'm using "tagging" to refer to attaching the fragment 
to the record batch which is not what I'm doing here).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to