[ https://issues.apache.org/jira/browse/SPARK-55097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-55097:
-----------------------------------
    Labels: pull-request-available  (was: )

> Re-adding cached local relations using ref-counting drops blocks silently
> -------------------------------------------------------------------------
>
>                 Key: SPARK-55097
>                 URL: https://issues.apache.org/jira/browse/SPARK-55097
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect, Spark Core
>    Affects Versions: 4.1.0, 4.1.1
>            Reporter: Pranav Dev
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.2.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> After the introduction of the ref-counting logic for cloning sessions 
> ([link|https://github.com/apache/spark/pull/52651]), re-adding an identical 
> cached artifact (same session, same hash) incorrectly leads to the deletion 
> of the existing block.
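> To make the suspected pattern concrete, here is a minimal, hypothetical 
> sketch (the names {{RefCountedBlocks}}, {{addBuggy}} and {{addExpected}} are 
> illustrative and do not mirror Spark's actual internals): on a duplicate 
> add, the dedup path behaves like a release, so the refcount can reach zero 
> and the block is dropped instead of being kept.
> {code:java}
> import scala.collection.mutable
> 
> // Hypothetical model of a ref-counted block cache; not Spark's real code.
> class RefCountedBlocks {
>   private val refCounts = mutable.Map.empty[String, Int]
>   private val blocks = mutable.Set.empty[String]
> 
>   // Suspected buggy pattern: a duplicate add is handled by the cleanup
>   // path as if a holder went away, so the count drops to zero and the
>   // block is silently deleted.
>   def addBuggy(hash: String): Unit = {
>     if (refCounts.contains(hash)) {
>       refCounts(hash) -= 1
>       if (refCounts(hash) <= 0) {
>         refCounts.remove(hash)
>         blocks.remove(hash) // existing block dropped on a duplicate add
>       }
>     } else {
>       refCounts(hash) = 1
>       blocks += hash
>     }
>   }
> 
>   // Expected behavior: a duplicate add keeps (or bumps) the count, so
>   // the block survives.
>   def addExpected(hash: String): Unit = {
>     refCounts(hash) = refCounts.getOrElse(hash, 0) + 1
>     blocks += hash
>   }
> 
>   def contains(hash: String): Boolean = blocks.contains(hash)
> }
> {code}
> With {{addBuggy}}, adding the same hash twice leaves {{contains}} returning 
> false, which matches the failing assertion in the test below.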
> Verified this bug locally using:
> {code:java}
> import java.nio.charset.StandardCharsets
> import java.nio.file.{Files, Paths}
> 
> import org.apache.spark.storage.CacheId
> 
> test("re-adding the same cache artifact should not remove the block") {
>   val blockManager = spark.sparkContext.env.blockManager
>   val remotePath = Paths.get("cache/duplicate_hash")
>   val blockId = CacheId(spark.sessionUUID, "duplicate_hash")
>   try {
>     // First addition
>     withTempPath { path =>
>       Files.write(path.toPath, "test".getBytes(StandardCharsets.UTF_8))
>       artifactManager.addArtifact(remotePath, path.toPath, None)
>     }
>     assert(blockManager.getLocalBytes(blockId).isDefined)
>     // getLocalBytes takes a read lock on the block; release it before
>     // the second addition
>     blockManager.releaseLock(blockId)
>     // Second addition with the same hash - the block should still exist
>     withTempPath { path =>
>       Files.write(path.toPath, "test".getBytes(StandardCharsets.UTF_8))
>       artifactManager.addArtifact(remotePath, path.toPath, None)
>     }
>     assert(blockManager.getLocalBytes(blockId).isDefined,
>       "Block should still exist after re-adding the same cache artifact")
>   } finally {
>     blockManager.releaseLock(blockId)
>     blockManager.removeCache(spark.sessionUUID)
>   }
> }
> {code}
> The test fails the second 
> {{assert(blockManager.getLocalBytes(blockId).isDefined)}} check, i.e. after 
> the second addition with the same hash.


