This is an automated email from the ASF dual-hosted git repository.
findepi pushed a commit to branch 1.6.x
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/1.6.x by this push:
new ed53c6d326 Drop ParallelIterable's queue low water mark (#10979)
ed53c6d326 is described below
commit ed53c6d326cb7efef2d41e26ef001e1d7b17fd78
Author: Piotr Findeisen <[email protected]>
AuthorDate: Thu Aug 22 00:05:17 2024 +0200
Drop ParallelIterable's queue low water mark (#10979)
As part of the change in commit
7831a8dfc3a2de546ca069f4fc1e7afd03777554, queue low water mark was
introduced. However, it resulted in increased number of manifests being
read when planning LIMIT queries in Trino Iceberg connector. To avoid
increased I/O, back out the change for now.
---
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/core/src/main/java/org/apache/iceberg/util/ParallelIterable.java
b/core/src/main/java/org/apache/iceberg/util/ParallelIterable.java
index 6486bd7fd4..17c45b6348 100644
--- a/core/src/main/java/org/apache/iceberg/util/ParallelIterable.java
+++ b/core/src/main/java/org/apache/iceberg/util/ParallelIterable.java
@@ -192,12 +192,9 @@ public class ParallelIterable<T> extends CloseableGroup
implements CloseableIter
// If the consumer is processing records more slowly than the producers,
the producers will
// eventually fill the queue and yield, returning continuations.
Continuations and new tasks
// are started by checkTasks(). The check here prevents us from
restarting continuations or
- // starting new tasks too early (when queue is almost full) or too late
(when queue is already
- // emptied). Restarting too early would lead to tasks yielding very
quickly (CPU waste on
- // scheduling). Restarting too late would mean the consumer may need to
wait for the tasks
- // to produce new items. A consumer slower than producers shouldn't need
to wait.
- int queueLowWaterMark = maxQueueSize / 2;
- if (queue.size() > queueLowWaterMark) {
+ // starting new tasks before the queue is emptied. Restarting too early
would lead to tasks
+ // yielding very quickly (CPU waste on scheduling).
+ if (!queue.isEmpty()) {
return true;
}