kbendick commented on pull request #2577:
URL: https://github.com/apache/iceberg/pull/2577#issuecomment-840861832


   Also, if we introduced this, would we be removing the ability to use the 
different locality levels that YARN provides?
   
   https://spark.apache.org/docs/latest/tuning.html#data-locality
   
   From the scheduling configs, you can see that the locality options Spark exposes 
are more fine-grained than a single setting (a sketch of setting them follows the list below):
   https://spark.apache.org/docs/latest/configuration.html#scheduling
   
   `spark.locality.wait.node`
   `spark.locality.wait.process`
   `spark.locality.wait.rack`
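   
   Here's a minimal sketch of what setting these looks like on a `SparkSession` (the 
   values are placeholders, not recommendations; setting a wait to `0s` makes the 
   scheduler skip that locality level immediately rather than waiting):
   ```scala
   import org.apache.spark.sql.SparkSession
   
   val spark = SparkSession.builder()
     .appName("locality-wait-example")
     .config("spark.locality.wait", "3s")          // base wait, inherited by the per-level settings below
     .config("spark.locality.wait.process", "3s")  // how long to wait for PROCESS_LOCAL before falling back
     .config("spark.locality.wait.node", "0s")     // don't wait for NODE_LOCAL slots
     .config("spark.locality.wait.rack", "0s")     // don't wait for RACK_LOCAL slots
     .getOrCreate()
   ```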
   
   I think that the 30s upper limit you are seeing is potentially derived 
from `spark.scheduler.maxRegisteredResourcesWaitingTime`, whose default is 
30s. The docs for that state:
   ```
   Maximum amount of time to wait for resources to register before scheduling 
begins.
   ```
   
   So I'm guessing that is where the 30s came from (the delay before Spark 
started scheduling tasks). A quick way to check the effective value is sketched below.
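   
   This snippet is just a sketch for confirming which value is actually in effect on 
   your application (the 30s default is taken from the Spark docs):
   ```scala
   import org.apache.spark.sql.SparkSession
   
   val spark = SparkSession.builder().getOrCreate()
   // Reads the value set on the application's SparkConf, or reports the documented default.
   val waitTime = spark.sparkContext.getConf
     .get("spark.scheduler.maxRegisteredResourcesWaitingTime", "30s (default)")
   println(s"spark.scheduler.maxRegisteredResourcesWaitingTime = $waitTime")
   ```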
   
   Can you try setting `spark.locality.wait.rack` or `spark.locality.wait.node` and 
see if that helps? I've only tried this on S3, so I'm not 100% sure it will help in your case.
   
   

