[GitHub] [iceberg] kbendick commented on pull request #2577: Spark: Add read.locality.enabled to TableProperties to support disabl…

GitBox Thu, 13 May 2021 14:53:53 -0700


kbendick commented on pull request #2577:
URL: https://github.com/apache/iceberg/pull/2577#issuecomment-840856930



   > I think that this can be configured via `spark.locality.wait`. I think if
   > you set it to zero, it will just automatically give up looking for a data local
   > node. At least that's what I've done when reading from S3 with yarn (which is
   > by definition not local).
   > 
   > ```
   > Number of milliseconds to wait to launch a data-local task before giving up and launching it on a less-local node.
   > The same wait will be used to step through multiple locality levels (process-local, node-local, rack-local and then any).
   > It is also possible to customize the waiting time for each level by setting spark.locality.wait.node, etc.
   > You should increase this setting if your tasks are long and see poor locality, but the default usually works well.
   > ```
   
   Given that you say it takes 30 seconds, that could be explained by the default 
value of `3s` (3000 milliseconds), which is applied again at each locality level.
   
   If it's possible to leave this as a Spark property, maybe it's not something 
we really need to define at the table level? I'm open to discussing that.
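
For illustration, here is a minimal sketch of passing the setting as a Spark property at submit time instead of defining it on the table. The helper names `locality_conf` and `to_submit_args` are hypothetical and exist only for this example. The only real names are the `spark.locality.wait` property, its per-level variants (`spark.locality.wait.node`, etc.), and the `--conf key=value` flag of `spark-submit`.

```python
# Hypothetical helpers (not part of Spark or Iceberg): build the
# spark-submit --conf flags that turn off the data-locality wait.

def locality_conf(wait="0s", per_level=None):
    """Return Spark properties for the locality wait.

    wait      -- value for spark.locality.wait ("0s" means give up immediately)
    per_level -- optional overrides keyed by level name: "process", "node", "rack"
    """
    conf = {"spark.locality.wait": wait}
    for level, value in (per_level or {}).items():
        conf[f"spark.locality.wait.{level}"] = value
    return conf


def to_submit_args(conf):
    """Flatten a property dict into a list of spark-submit arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_submit_args(locality_conf())))
# prints: --conf spark.locality.wait=0s
```

Because this is a job-level setting, it affects every read in the application, which is the trade-off against a per-table property.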

