[
https://issues.apache.org/jira/browse/SPARK-53478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-53478:
-----------------------------------
Labels: pull-request-available (was: )
> Inconsistent file resolution between SparkContext.addFile and SparkFiles.get
> in local mode due to job-specific artifact directory
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-53478
> URL: https://issues.apache.org/jira/browse/SPARK-53478
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Ben Burnett
> Priority: Minor
> Labels: pull-request-available
>
> When I use `SparkContext.addFile` in Spark local mode, the file is added to the
> current working directory (cwd). When I then try to access that file inside a
> SQL-planned RDD operation with `SparkFiles.get`, the path is resolved against
> the job-specific artifact directory (i.e. {cwd}/userFiles-{some_uuid}), and the
> lookup fails to find the actual file.
> This seems to happen because SQL planning sets up the active job artifact
> state based on the session that the SQL is called on, while accessing the
> SparkContext directly from a script in Spark local mode does not.
> This works in all other Spark configurations, so it is not a high priority.
> We only run `addFile` in Spark local mode for testing, but it would be nice
> to support it since it is a regression.
> I think the origin is
> [https://github.com/apache/spark/commit/26330355836f5b2dad9b7bd4c72d9830c7ce6788],
> since that is where `SparkFiles.get` changed, but I am not entirely sure.
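
For readers who want to see the shape of the mismatch without a Spark install, here is a minimal sketch in plain Python. It is a toy model of the behaviour described in the report, not Spark's implementation: `FileRegistryModel`, `add_file`, `get`, `job_artifact_uuid`, and `demo` are hypothetical names, and the `userFiles-<uuid>` subdirectory layout is taken from the description above rather than from the Spark source.

```python
import os
import shutil
import tempfile
import uuid
from typing import Optional, Tuple


class FileRegistryModel:
    """Toy stand-in for the addFile / SparkFiles.get pair. Not Spark code."""

    def __init__(self, root: str) -> None:
        self.root = root

    def add_file(self, src: str) -> str:
        # Assumption from the report: in local mode the file is placed
        # directly under the root directory (the cwd in the report).
        dst = os.path.join(self.root, os.path.basename(src))
        shutil.copy(src, dst)
        return dst

    def get(self, name: str, job_artifact_uuid: Optional[str] = None) -> str:
        # Assumption from the report: when a job artifact state is active,
        # the lookup is scoped to a per-session subdirectory of the root.
        if job_artifact_uuid is None:
            return os.path.join(self.root, name)
        return os.path.join(self.root, f"userFiles-{job_artifact_uuid}", name)


def demo() -> Tuple[bool, bool]:
    """Return (found without artifact state, found with artifact state)."""
    with tempfile.TemporaryDirectory() as work:
        src_dir = os.path.join(work, "src")
        root = os.path.join(work, "root")
        os.makedirs(src_dir)
        os.makedirs(root)

        src = os.path.join(src_dir, "lookup.txt")
        with open(src, "w") as fh:
            fh.write("hello")

        registry = FileRegistryModel(root)
        registry.add_file(src)

        # Path resolved with no active job artifact state.
        plain = registry.get("lookup.txt")
        # Path resolved as if a session-scoped artifact state were active.
        scoped = registry.get("lookup.txt", job_artifact_uuid=str(uuid.uuid4()))
        return os.path.exists(plain), os.path.exists(scoped)


if __name__ == "__main__":
    found_plain, found_scoped = demo()
    print("found without artifact state:", found_plain)
    print("found with artifact state:", found_scoped)
```

In this model the write path and the read path only agree when no artifact state is active, which matches the symptom in the report: the file exists under the root, but the session-scoped lookup points at a subdirectory that was never populated.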