Re: [PR] GH-40224: [C++] fix: improve the backpressure handling in the dataset writer [arrow]

via GitHub Mon, 01 Apr 2024 22:20:28 -0700


mapleFU commented on code in PR #40722:
URL: https://github.com/apache/arrow/pull/40722#discussion_r1547138479



##########
cpp/src/arrow/dataset/dataset_writer.cc:
##########
@@ -549,11 +566,14 @@ class DatasetWriter::DatasetWriterImpl {
               WriteAndCheckBackpressure(std::move(batch), directory, prefix);
           if (!has_room.is_finished()) {
             // We don't have to worry about sequencing backpressure here since
-            // task_group_ serves as our sequencer.  If batches continue to arrive after
-            // we pause they will queue up in task_group_ until we free up and call
-            // Resume
+            // task_group_ serves as our sequencer.  If batches continue to arrive
+            // after we pause they will queue up in task_group_ until we free up and
+            // call Resume
             pause_callback_();
-            return has_room.Then([this] { resume_callback_(); });
+            paused_ = true;
+            return has_room.Then([this] { ResumeIfNeeded(); });
+          } else {
+            ResumeIfNeeded();
           }

Review Comment:
   Usually a mutex is declared mutable, but this LGTM either way.
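
For readers following the hunk above, the pause/resume guard it introduces can be sketched roughly as below. This is a simplified, single-threaded illustration and not the PR's actual code: `BackpressureGate` and its members are hypothetical names, and the real `ResumeIfNeeded` may check further conditions that are not visible in this hunk.

```cpp
#include <functional>
#include <utility>

// Hypothetical illustration only; these names do not come from Arrow.
class BackpressureGate {
 public:
  BackpressureGate(std::function<void()> pause, std::function<void()> resume)
      : pause_(std::move(pause)), resume_(std::move(resume)) {}

  // Called when the writer runs out of room: notify upstream and remember it.
  void Pause() {
    pause_();
    paused_ = true;
  }

  // Safe to call on every code path: the resume callback fires only when a
  // pause is outstanding, so calling it repeatedly has no further effect.
  void ResumeIfNeeded() {
    if (!paused_) return;
    paused_ = false;
    resume_();
  }

 private:
  std::function<void()> pause_;
  std::function<void()> resume_;
  bool paused_ = false;  // assumption: accessed from one thread at a time
};
```

Tracking the paused state is what makes it reasonable to call `ResumeIfNeeded()` from the `else` branch as well as from the `Then` continuation.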



##########
cpp/src/arrow/util/async_util.h:
##########
@@ -277,6 +278,8 @@ class ARROW_EXPORT ThrottledAsyncTaskScheduler : public AsyncTaskScheduler {
   /// Allows task to be submitted again.  If there is a max_concurrent_cost limit then
   /// it will still apply.
   virtual void Resume() = 0;
+  /// Return the number of tasks queued but not yet submitted
+  virtual std::size_t QueueSize() = 0;

Review Comment:
   Usually a mutex is declared mutable, but this LGTM either way; I don't have a
   strong preference.
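
To illustrate the convention mentioned in this comment: declaring the mutex `mutable` lets a read-only accessor such as a queue-size getter be marked `const` while still taking the lock. The sketch below is a minimal example under that assumption. `SketchQueue` is a hypothetical stand-in, not the scheduler class from the PR, and the `QueueSize()` in the diff is a non-const pure virtual function.

```cpp
#include <cstddef>
#include <mutex>
#include <queue>

// Hypothetical stand-in for a task queue; not Arrow's ThrottledAsyncTaskScheduler.
class SketchQueue {
 public:
  // Mutating operation: takes the lock and enqueues a task id.
  void Push(int task) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push(task);
  }

  // Logically read-only, so it is const. Locking still works here because
  // mutex_ is declared mutable below.
  std::size_t QueueSize() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return queue_.size();
  }

 private:
  mutable std::mutex mutex_;  // mutable: may be locked inside const methods
  std::queue<int> queue_;
};
```

Without `mutable`, the `const` qualifier on `QueueSize()` would not compile, since locking modifies the mutex. Leaving the method non-const, as the diff does, avoids the question entirely.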



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
