kbendick commented on pull request #2577: URL: https://github.com/apache/iceberg/pull/2577#issuecomment-840861832
Also, if we introduced this, would we lose the ability to use all of the different locality levels that YARN provides? https://spark.apache.org/docs/latest/tuning.html#data-locality

From the scheduling configs, you can see that the options Spark provides are much more fine-grained: https://spark.apache.org/docs/latest/configuration.html#scheduling

- `spark.locality.wait.node`
- `spark.locality.wait.process`
- `spark.locality.wait.rack`

I think the 30s upper limit you are seeing is potentially derived from `spark.scheduler.maxRegisteredResourcesWaitingTime`, which defaults to 30s. The docs for that setting state:

```
Maximum amount of time to wait for resources to register before scheduling begins.
```

So I'm guessing that is where the 30s came into play (before Spark started scheduling tasks). Can you try setting `spark.locality.wait.rack` or `.node` and see if that helps? I've only tried this on S3, so I'm not 100% sure it will help in your case.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
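As a minimal sketch of the suggestion above: the locality-wait settings can be raised in `spark-defaults.conf` (or passed via `--conf` to `spark-submit`). The `10s` values here are illustrative assumptions for experimentation, not recommendations from the Spark docs; the defaults inherit from `spark.locality.wait` (3s by default).

```properties
# spark-defaults.conf — illustrative values, tune for your cluster
# Base wait before falling back to a less-local level (default: 3s)
spark.locality.wait                                 10s
# Per-level overrides; each falls back to spark.locality.wait if unset
spark.locality.wait.process                         10s
spark.locality.wait.node                            10s
spark.locality.wait.rack                            10s
# Default 30s — the likely source of the 30s behavior discussed above
spark.scheduler.maxRegisteredResourcesWaitingTime   30s
```

The same settings can be supplied per job, e.g. `spark-submit --conf spark.locality.wait.node=10s ...`.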
