zanmato1984 commented on code in PR #44616:
URL: https://github.com/apache/arrow/pull/44616#discussion_r1924675483


##########
python/pyarrow/_dataset.pyx:
##########
@@ -4111,7 +4113,9 @@ class ScanNodeOptions(_ScanNodeOptions):
     **kwargs : dict, optional
         Scan options. See `Scanner.from_dataset` for possible arguments.       
 
     require_sequenced_output : bool, default False
-        Assert implicit ordering on data.
+        Batches are yielded sequentially, like single-threaded

Review Comment:
   Is this still needed?



##########
cpp/src/arrow/dataset/scanner.h:
##########
@@ -563,14 +563,17 @@ class ARROW_DS_EXPORT ScanNodeOptions : public acero::ExecNodeOptions {
  public:
   explicit ScanNodeOptions(std::shared_ptr<Dataset> dataset,
                            std::shared_ptr<ScanOptions> scan_options,
-                           bool require_sequenced_output = false)
+                           bool require_sequenced_output = false,
+                           bool implicit_ordering = false)
       : dataset(std::move(dataset)),
         scan_options(std::move(scan_options)),
-        require_sequenced_output(require_sequenced_output) {}
+        require_sequenced_output(require_sequenced_output),
+        implicit_ordering(implicit_ordering) {}
 
   std::shared_ptr<Dataset> dataset;
   std::shared_ptr<ScanOptions> scan_options;
   bool require_sequenced_output;
+  bool implicit_ordering;

Review Comment:
   I see that these two fields are documented in the Python counterpart. 
Could you document them in the C++ header too, so that it is self-explanatory?



##########
cpp/src/arrow/dataset/scanner.h:
##########
@@ -563,14 +563,17 @@ class ARROW_DS_EXPORT ScanNodeOptions : public acero::ExecNodeOptions {
  public:
   explicit ScanNodeOptions(std::shared_ptr<Dataset> dataset,
                            std::shared_ptr<ScanOptions> scan_options,
-                           bool require_sequenced_output = false)
+                           bool require_sequenced_output = false,
+                           bool implicit_ordering = false)
       : dataset(std::move(dataset)),
         scan_options(std::move(scan_options)),
-        require_sequenced_output(require_sequenced_output) {}
+        require_sequenced_output(require_sequenced_output),
+        implicit_ordering(implicit_ordering) {}
 
   std::shared_ptr<Dataset> dataset;
   std::shared_ptr<ScanOptions> scan_options;
   bool require_sequenced_output;
+  bool implicit_ordering;

Review Comment:
   IIUC, `require_sequenced_output` is handled in the scanner, which collapses 
the underlying generator to a single-threaded one, whereas `implicit_ordering` 
is delegated to the generated source node. Is that right?
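
   To make the question concrete, here is a toy model of that reading in plain 
Python. It is only a sketch and not Arrow's implementation: `scan`, 
`source_node`, and `completion_order` are hypothetical stand-ins, and the idea 
that the source node tags batches with an index is an assumption about what 
"delegated to the source node" means here.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def scan(batches: List[str],
         completion_order: List[int],
         require_sequenced_output: bool = False) -> Iterator[str]:
    """Toy scanner (hypothetical, not Arrow code).

    `completion_order` stands in for the order in which parallel reads
    happen to finish. With require_sequenced_output=True the scanner
    ignores it and yields batches in dataset order, as a single-threaded
    read would.
    """
    if require_sequenced_output:
        order = list(range(len(batches)))
    else:
        order = completion_order
    for i in order:
        yield batches[i]


def source_node(stream: Iterable[str],
                implicit_ordering: bool = False
                ) -> Iterator[Tuple[Optional[int], str]]:
    """Toy source node (hypothetical, not Arrow code).

    With implicit_ordering=True each batch is tagged with the position at
    which it arrived, so downstream consumers could restore that order.
    The tag is only meaningful if the incoming stream is already sequenced.
    """
    for index, batch in enumerate(stream):
        yield (index if implicit_ordering else None, batch)


batches = ["a", "b", "c"]
completion_order = [2, 0, 1]  # pretend the last fragment finished first

# Both options on: sequenced stream, batches tagged in dataset order.
ordered = list(source_node(
    scan(batches, completion_order, require_sequenced_output=True),
    implicit_ordering=True))

# Both options off: batches arrive as they complete, with no index.
unordered = list(source_node(scan(batches, completion_order)))
```

In this model the two flags act at different layers, which is why they would 
be separate fields: one fixes the order in which the scanner emits batches, 
the other only records that order for downstream nodes.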



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

