[
https://issues.apache.org/jira/browse/SPARK-44272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mridul Muralidharan resolved SPARK-44272.
-----------------------------------------
Fix Version/s: 3.5.0, 4.0.0
Resolution: Fixed
Issue resolved by pull request 41821
[https://github.com/apache/spark/pull/41821]
> Path Inconsistency when Operating statCache within Yarn Client
> --------------------------------------------------------------
>
> Key: SPARK-44272
> URL: https://issues.apache.org/jira/browse/SPARK-44272
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 0.9.1, 2.3.0, 3.4.0, 3.5.0
> Reporter: SHU WANG
> Assignee: SHU WANG
> Priority: Critical
> Fix For: 3.5.0, 4.0.0
>
>
> The *addResource* method of *ClientDistributedCacheManager* adds a
> *FileStatus* to *statCache* when it is not yet cached. There is also a
> subtle bug in *isPublic*, which is called from the *getVisibility* method:
> *uri.getPath()* does not retain URI information such as the scheme and
> host, so the *uri* passed to *checkPermissionOfOther* differs from the
> original *uri*.
> For example, if uri is "file:/foo.invalid.com:8080/tmp/testing", then
> {code:java}
> uri.getPath -> /foo.invalid.com:8080/tmp/testing
> uri.toString -> file:/foo.invalid.com:8080/tmp/testing{code}
> The consequence of this bug is that we make *twice as many RPC calls* as
> necessary when the resources are remote, because a *FileStatus* cached under
> the full URI is not found when it is looked up under the path-only URI. We
> see nontrivial overhead when checking those resources on our HDFS,
> especially when HDFS is overloaded.
>
> Ref: related code within *ClientDistributedCacheManager*
> {code:java}
> def addResource(...) {
>   val destStatus = statCache.getOrElse(destPath.toUri(), fs.getFileStatus(destPath))
>   val visibility = getVisibility(conf, destPath.toUri(), statCache)
> }
> private[yarn] def getVisibility() {
>   isPublic(conf, uri, statCache)
> }
> private def isPublic(conf: Configuration, uri: URI, statCache: Map[URI, FileStatus]): Boolean = {
>   val current = new Path(uri.getPath()) // Should not use getPath
>   checkPermissionOfOther(fs, uri, FsAction.READ, statCache)
> }
> {code}
>
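The cache-key mismatch described above can be reproduced without Hadoop on the classpath. The following is only a minimal sketch under stated assumptions: `StatCacheKeyDemo`, `lookup`, `pathOnly`, and `rpcCalls` are hypothetical names, a `String` stands in for `FileStatus`, and `URI.create(uri.getPath())` approximates what `new Path(uri.getPath()).toUri()` does to the key. It is not the code from the pull request.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

final class StatCacheKeyDemo {
    // Hypothetical counter standing in for remote FileSystem.getFileStatus calls.
    static int rpcCalls = 0;

    // Return the cached status for the key, "fetching" it (and counting an RPC) on a miss.
    static String lookup(Map<URI, String> statCache, URI key) {
        return statCache.computeIfAbsent(key, k -> {
            rpcCalls++;
            return "status-of-" + k;
        });
    }

    // Rebuild a URI from the path component only; scheme and authority are lost.
    static URI pathOnly(URI uri) {
        return URI.create(uri.getPath());
    }

    public static void main(String[] args) {
        URI original = URI.create("file:/foo.invalid.com:8080/tmp/testing");
        System.out.println("getPath  -> " + original.getPath());
        System.out.println("toString -> " + original);

        // Inconsistent keys: the second lookup misses the cache and costs another call.
        Map<URI, String> statCache = new HashMap<>();
        lookup(statCache, original);
        lookup(statCache, pathOnly(original));
        System.out.println("calls with inconsistent keys: " + rpcCalls); // 2

        // Consistent keys: the second lookup is served from the cache.
        statCache.clear();
        rpcCalls = 0;
        lookup(statCache, original);
        lookup(statCache, original);
        System.out.println("calls with consistent keys: " + rpcCalls); // 1
    }
}
```

Because `java.net.URI.equals` compares the scheme as well as the path, the path-only key never matches the entry stored under the full URI, which is why the sketch counts two calls in the first case and one in the second.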