[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun reassigned SPARK-43221:
-------------------------------------
Assignee: Attila Zsolt Piros (was: Qiang Yang)
> Executor obtained error information
> ------------------------------------
>
> Key: SPARK-43221
> URL: https://issues.apache.org/jira/browse/SPARK-43221
> Project: Spark
> Issue Type: Bug
> Components: Block Manager
> Affects Versions: 3.1.1, 3.2.0, 3.3.0
> Reporter: Qiang Yang
> Assignee: Attila Zsolt Piros
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
> Attachments: image-2023-04-21-00-19-58-021.png,
> image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png,
> image-2023-04-21-00-50-10-918.png, image-2023-04-21-00-53-20-720.png,
> image-2023-04-21-00-54-11-968.png, image-2023-04-21-00-57-29-140.png
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Spark on YARN cluster.
> When multiple executors run on the same node and the same block is held by
> more than one of them, one copy in memory and another on disk, an executor
> intermittently fails to fetch the block and throws:
> java.lang.ArrayIndexOutOfBoundsException: 0
> at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>
> Below I walk through the sequence of events that leads to the failure:
> step 1:
> The executor asks the driver for the block's location information
> (locationsAndStatusOption), passing the BlockId and the host of its own node.
> Note that the request does not carry any port information.
> line: 1092
> !image-2023-04-21-00-24-22-059.png!
> step 2:
> On the driver side, the driver looks up all BlockManagers holding the block
> by BlockId. In the non-remote-shuffle case, it takes the block status from
> the first BlockManager in the locations list.
> Suppose two BlockManagers on this node hold the block: BM-1 stores it in
> memory and BM-2 stores it on disk. If BM-1 comes first, the returned status
> is BM-1's in-memory status, so its diskSize is 0.
> line: 852, 856
> !image-2023-04-21-00-30-41-851.png!
> step 3:
> The method returns a BlockLocationsAndStatus object. If any BM stores the
> block on disk, that disk's path information is placed in localDirs.
> !image-2023-04-21-00-50-10-918.png!
> step 4:
> When the executor receives locationsAndStatusOption, localDirs is non-empty,
> but status.diskSize is 0.
> line: 1102
> !image-2023-04-21-00-54-11-968.png!
> step 5:
> readDiskBlockFromSameHostExecutor only checks whether the block file exists,
> then reads a byte array using the blockSize passed in. With a blockSize of 0,
> it returns an empty byte array.
> line: 1234, 1240
> !image-2023-04-21-00-57-29-140.png!
> Reading element 0 of that empty array then throws the
> ArrayIndexOutOfBoundsException: 0 shown above.
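The failure path in steps 2 to 5 can be modeled in a few lines of Python. This is a minimal sketch, not Spark code: BlockStatus, Holder, LocationsAndStatus, driver_lookup, read_disk_block, fetch_first_byte and the guard flag are hypothetical names, and the model follows the report's description (status taken from the first same-host holder, localDirs gathered from any disk-backed holder) rather than Spark's actual implementation. Python's IndexError stands in for ArrayIndexOutOfBoundsException.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical model of the report's steps 2-5; names are illustrative only.


@dataclass
class BlockStatus:
    mem_size: int
    disk_size: int


@dataclass
class Holder:
    executor_id: str
    host: str
    status: BlockStatus
    local_dir: Optional[str]  # set only when this copy is stored on disk


@dataclass
class LocationsAndStatus:
    status: BlockStatus
    local_dirs: List[str]


def driver_lookup(holders: List[Holder], host: str) -> Optional[LocationsAndStatus]:
    """Steps 2-3: the request carries only the host (no port), so every
    holder on that host matches."""
    same_host = [h for h in holders if h.host == host]
    if not same_host:
        return None
    # Status comes from the first holder, which may be the in-memory copy.
    status = same_host[0].status
    # localDirs are gathered from any holder that keeps the block on disk.
    local_dirs = [h.local_dir for h in same_host if h.local_dir is not None]
    return LocationsAndStatus(status, local_dirs)


def read_disk_block(files: Dict[str, bytes], local_dirs: List[str], size: int) -> Optional[bytes]:
    """Step 5: checks only that the file exists, then reads `size` bytes."""
    for d in local_dirs:
        if d in files:
            return files[d][:size]  # size == 0 yields b""
    return None


def fetch_first_byte(holders: List[Holder], host: str,
                     files: Dict[str, bytes], guard: bool = False) -> Optional[int]:
    info = driver_lookup(holders, host)
    if info is None or not info.local_dirs:
        return None  # the real code would fall back to a remote fetch
    if guard and info.status.disk_size <= 0:
        # Hypothetical guard: the status does not describe a disk copy,
        # so skip the same-host disk read instead of trusting size 0.
        return None
    data = read_disk_block(files, info.local_dirs, info.status.disk_size)
    if data is None:
        return None
    return data[0]  # b""[0] raises IndexError, like ArrayIndexOutOfBoundsException: 0


holders = [
    Holder("exec-1", "node-a", BlockStatus(mem_size=4, disk_size=0), None),            # BM-1: memory
    Holder("exec-2", "node-a", BlockStatus(mem_size=0, disk_size=4), "/disk/exec-2"),  # BM-2: disk
]
files = {"/disk/exec-2": b"\x01\x02\x03\x04"}

try:
    fetch_first_byte(holders, "node-a", files)
except IndexError as e:
    print("reproduced:", e)

print(fetch_first_byte(holders, "node-a", files, guard=True))  # None: fall back
```

Reversing the holder order (BM-2 first) makes the same fetch succeed and return the first byte, which fits the report's observation that the failure is probabilistic: it depends on which BlockManager the driver happens to list first. The guard is only one hypothetical mitigation and is not necessarily what the fix for 4.1.0 does.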
--
This message was sent by Atlassian Jira
(v8.20.10#820010)