slfan1989 opened a new issue, #2032:
URL: https://github.com/apache/auron/issues/2032
### Problem
`UnifflePartitionWriter` currently synchronizes its calls to `RssShuffleWriter`
methods inconsistently. Upstream Uniffle assumes the writer is driven by a
single thread, so a call that bypasses the lock, or a gap between two locked
steps, creates a potential race condition.
#### Current implementation issues:
**Split synchronization blocks** in `write()` method:
```
override def write(partitionId: Int, buffer: ByteBuffer): Unit = {
  val bytes = new Array[Byte](buffer.limit())
  buffer.get(bytes)
  val bytesWritten = bytes.length
  val bufferManager = rssShuffleWriter.getBufferManager
  // First lock: buffer the partition data.
  val shuffleBlockInfos = rssShuffleWriter.synchronized {
    bufferManager.addPartitionData(partitionId, bytes)
  }
  // The lock is released here, before the push below re-acquires it.
  if (shuffleBlockInfos != null && !shuffleBlockInfos.isEmpty) {
    // Second lock: push the returned blocks.
    rssShuffleWriter.synchronized {
      rssShuffleWriterPushBlocksMethod.invoke(rssShuffleWriter, shuffleBlockInfos)
    }
  }
  metrics.incBytesWritten(bytesWritten)
  mapStatusLengths(partitionId) += bytesWritten
}
```
**Inconsistent locking** in `close()` method:
```
override def close(success: Boolean): Unit = {
  val start = System.currentTimeMillis()
  val bufferManager = rssShuffleWriter.getBufferManager
  val restBlocks = bufferManager.clear()
  if (success && restBlocks != null && !restBlocks.isEmpty) {
    // Not synchronized: the push runs without holding the writer's lock.
    rssShuffleWriterPushBlocksMethod.invoke(rssShuffleWriter, restBlocks)
  }
  val writeDurationMs = bufferManager.getWriteTime +
    (System.currentTimeMillis() - start)
  metrics.incWriteTime(TimeUnit.MILLISECONDS.toNanos(writeDurationMs))
}
```
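To illustrate why the unguarded push in `close()` matters: a `synchronized` block only excludes other code that synchronizes on the same object. The following is a minimal standalone sketch, not Auron or Uniffle code. `FakeWriter`, `pushUnguarded`, `pushGuarded`, and `LockDemo` are hypothetical stand-ins for `RssShuffleWriter` and the two call styles shown above.

```scala
// Hypothetical stand-in for RssShuffleWriter; not the real Uniffle class.
class FakeWriter {
  @volatile var pushes: Int = 0
  def pushBlocks(): Unit = pushes += 1
}

object LockDemo {
  // Mirrors the current close(): calls the writer without taking its monitor.
  def pushUnguarded(w: FakeWriter): Unit = w.pushBlocks()

  // Mirrors write(): calls the writer while holding its monitor.
  def pushGuarded(w: FakeWriter): Unit = w.synchronized { w.pushBlocks() }

  /** Returns (unguardedRanWhileLocked, guardedRanWhileLocked). */
  def run(): (Boolean, Boolean) = {
    val w = new FakeWriter
    val unguarded = new Thread(() => pushUnguarded(w))
    val guarded = new Thread(() => pushGuarded(w))
    val result = w.synchronized {
      // This thread plays the role of a caller that is inside write().
      unguarded.start()
      unguarded.join(2000) // finishes: it never asks for the monitor
      val unguardedRan = !unguarded.isAlive && w.pushes == 1
      guarded.start()
      guarded.join(200) // cannot finish while this thread holds the monitor
      val guardedRan = !guarded.isAlive
      (unguardedRan, guardedRan)
    }
    guarded.join() // monitor released, so the guarded call can now proceed
    result
  }
}
```

In this sketch `LockDemo.run()` should return `(true, false)`: the unguarded call runs while another thread holds the lock, whereas the guarded call waits until the lock is released.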
#### Solution
```
override def close(success: Boolean): Unit = {
  val start = System.currentTimeMillis()
  val bufferManager = rssShuffleWriter.getBufferManager
  val restBlocks = bufferManager.clear()
  if (success && restBlocks != null && !restBlocks.isEmpty) {
    // Synchronized on the writer, matching the locking used in write().
    rssShuffleWriter.synchronized {
      rssShuffleWriterPushBlocksMethod.invoke(rssShuffleWriter, restBlocks)
    }
  }
  val writeDurationMs = bufferManager.getWriteTime +
    (System.currentTimeMillis() - start)
  metrics.incWriteTime(TimeUnit.MILLISECONDS.toNanos(writeDurationMs))
}
```
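The split blocks in `write()` could be closed in a similar way by taking the lock once around both the buffering and the push. The sketch below is only an illustration under stated assumptions: `StubBufferManager`, `StubWriter`, and `writeOnce` are hypothetical stand-ins so the example runs without Uniffle, and the real method would presumably keep calling `rssShuffleWriterPushBlocksMethod.invoke` and update its metrics after the lock is released.

```scala
import java.nio.ByteBuffer
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for Uniffle's buffer manager.
class StubBufferManager(flushThreshold: Int) {
  private val pending = ArrayBuffer.empty[Array[Byte]]

  // Returns the buffered blocks once the threshold is reached, else null,
  // loosely imitating how addPartitionData is used in the issue.
  def addPartitionData(partitionId: Int, bytes: Array[Byte]): Seq[Array[Byte]] = {
    pending += bytes
    if (pending.size >= flushThreshold) {
      val out = pending.toList
      pending.clear()
      out
    } else null
  }
}

// Hypothetical stand-in for RssShuffleWriter.
class StubWriter(flushThreshold: Int) {
  val bufferManager = new StubBufferManager(flushThreshold)
  val pushed = ArrayBuffer.empty[Array[Byte]]
  def pushBlocks(blocks: Seq[Array[Byte]]): Unit = pushed ++= blocks
}

object MergedWrite {
  // Buffers one record and pushes any returned blocks under a single lock.
  // Returns the number of bytes written.
  def writeOnce(writer: StubWriter, partitionId: Int, buffer: ByteBuffer): Int = {
    // remaining() is a choice of this sketch; the issue's code uses limit().
    val bytes = new Array[Byte](buffer.remaining())
    buffer.get(bytes)
    // One critical section: no other thread holding this lock can run
    // between the buffering step and the push.
    writer.synchronized {
      val blocks = writer.bufferManager.addPartitionData(partitionId, bytes)
      if (blocks != null && blocks.nonEmpty) writer.pushBlocks(blocks)
    }
    bytes.length
  }
}
```

With a threshold of 2, the first `writeOnce` call only buffers its record and the second call pushes both buffered records, all without releasing the lock between the two steps.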