kbendick commented on code in PR #4596:
URL: https://github.com/apache/iceberg/pull/4596#discussion_r904525876


##########
core/src/main/java/org/apache/iceberg/SystemProperties.java:
##########
@@ -39,11 +41,26 @@ private SystemProperties() {
    */
   public static final String SCAN_THREAD_POOL_ENABLED = "iceberg.scan.plan-in-worker-pool";
 
+  /**
+   * Sets the size of the queue, which is used to avoid consuming too much memory.
+   */
+  public static final String SCAN_SHARED_QUEUE_SIZE = "iceberg.scan.shared-queue-size";
+  public static final int SCAN_SHARED_QUEUE_SIZE_DEFAULT = 1000;

Review Comment:
   The worker pool is already used for a number of things, though, and is 
presently uncapped.
   
   Going from unbounded to bounded is itself a potential drain and a possible 
source of bottlenecks. I don't know what the right value is for this, but if 
we're considering this feature, I would say this property is the best one to 
use as the configuration to disable it.
   
   There could be other scans taking place in general, and file scan tasks are 
unfortunately somewhat variable in size, can be combined, etc. So I don't 
disagree with using the larger value (especially if it's configurable).
   
   That said, if we add this feature, we should add a property to enable or 
disable it.
   
   We can use [0 or any non-positive value, which is what determines whether or 
not the caching catalog has a 
TTL](https://github.com/apache/iceberg/blob/ae19482cd7f9ee3e9e95d7198cfc1e5068254d75/spark/v3.0/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java#L401-L409).
   
   That said, catalogs generally have a boolean property to enable or disable 
caching entirely. So the options are either a flag such as 
`iceberg.scan.enable-blocking-queue` or `iceberg.scan.use-shared-queue`, or 
just setting the value to 0 or -1 to disable it entirely.
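   
   To make the "non-positive value disables it" option concrete, here is a 
minimal sketch. It is not the PR's implementation: the helper names 
(`parseQueueSize`, `isBounded`, `newQueue`) are hypothetical, and the choice of 
`LinkedBlockingQueue` vs. `ConcurrentLinkedQueue` is only an assumption for 
illustration. Only the property name and default come from the diff above.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class SharedQueueConfig {
  // Property name and default value as proposed in the diff under review.
  static final String SCAN_SHARED_QUEUE_SIZE = "iceberg.scan.shared-queue-size";
  static final int SCAN_SHARED_QUEUE_SIZE_DEFAULT = 1000;

  private SharedQueueConfig() {
  }

  // Hypothetical helper: falls back to the default when the property is unset or blank.
  // A non-numeric value throws NumberFormatException.
  static int parseQueueSize(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return SCAN_SHARED_QUEUE_SIZE_DEFAULT;
    }
    return Integer.parseInt(raw.trim());
  }

  // Hypothetical convention: 0 or any negative size means "do not bound the queue".
  static boolean isBounded(int size) {
    return size > 0;
  }

  // Assumption for illustration: a capacity-limited blocking queue when bounded,
  // an unbounded concurrent queue when the feature is disabled.
  static <T> Queue<T> newQueue(int size) {
    if (isBounded(size)) {
      return new LinkedBlockingQueue<>(size);
    }
    return new ConcurrentLinkedQueue<>();
  }

  public static void main(String[] args) {
    int size = parseQueueSize(System.getProperty(SCAN_SHARED_QUEUE_SIZE));
    Queue<String> queue = newQueue(size);
    System.out.println("bounded=" + isBounded(size)
        + ", impl=" + queue.getClass().getSimpleName());
  }
}
```

   With this convention, running with `-Diceberg.scan.shared-queue-size=0` (or 
`-1`) would turn the bound off without needing a separate boolean property.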



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

