wangyum opened a new pull request, #3196:
URL: https://github.com/apache/parquet-java/pull/3196
### Rationale for this change
Replacing the original implementation (`Channels.newChannel(out)`) with
direct writes to the `OutputStream` resolves the deadlock for the following
reason.
In the original implementation, `Channels.newChannel(out)` wraps the stream in
a `WritableByteChannelImpl`, which adds its own internal lock on top of the
underlying `OutputStream`. When Spark's task interruption mechanism (the
`Task reaper` thread) interrupts or closes the channel, it acquires the locks
in the opposite order from the executor thread that is writing data. Specifically:
- The executor thread holds the `DFSOutputStream` lock and waits for the
internal lock of `WritableByteChannelImpl`.
- The `Task reaper` thread holds the internal lock of
`WritableByteChannelImpl` and waits for the `DFSOutputStream` lock (during
`hflush()`).
This conflicting lock acquisition order results in a deadlock.
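
The lock-order inversion described above can be reproduced in isolation. The following is a minimal sketch, not code from Parquet, Hadoop, or Spark: `streamLock` and `channelLock` are hypothetical stand-ins for the `DFSOutputStream` lock and the internal lock of `WritableByteChannelImpl`. It uses `ReentrantLock.tryLock` with a timeout so that the demo terminates instead of hanging. This is an assumption made for illustration, since a thread blocked on a real `synchronized` monitor cannot time out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

final class LockOrderSketch {

  /** Returns true if neither thread could take its second lock, i.e. a real run would deadlock. */
  static boolean wouldDeadlock() throws InterruptedException {
    ReentrantLock streamLock = new ReentrantLock();   // stand-in for the DFSOutputStream lock
    ReentrantLock channelLock = new ReentrantLock();  // stand-in for the channel's internal lock
    CountDownLatch bothHoldFirst = new CountDownLatch(2);
    CountDownLatch bothTried = new CountDownLatch(2);
    AtomicBoolean executorGotSecond = new AtomicBoolean();
    AtomicBoolean reaperGotSecond = new AtomicBoolean();

    // Executor thread: stream lock first, then channel lock.
    Thread executor = new Thread(
        () -> acquireInOrder(streamLock, channelLock, bothHoldFirst, bothTried, executorGotSecond),
        "executor");
    // Task reaper thread: channel lock first, then stream lock (the opposite order).
    Thread reaper = new Thread(
        () -> acquireInOrder(channelLock, streamLock, bothHoldFirst, bothTried, reaperGotSecond),
        "task-reaper");

    executor.start();
    reaper.start();
    executor.join();
    reaper.join();
    return !executorGotSecond.get() && !reaperGotSecond.get();
  }

  private static void acquireInOrder(
      ReentrantLock first, ReentrantLock second,
      CountDownLatch bothHoldFirst, CountDownLatch bothTried, AtomicBoolean gotSecond) {
    first.lock();
    try {
      // Wait until both threads hold their first lock.
      bothHoldFirst.countDown();
      bothHoldFirst.await();
      // Bounded attempt on the second lock; a plain lock() here would block forever.
      if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
        gotSecond.set(true);
        second.unlock();
      }
      // Keep holding the first lock until both attempts have finished.
      bothTried.countDown();
      bothTried.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      first.unlock();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("would deadlock: " + wouldDeadlock());
  }
}
```

Both threads time out on their second lock, so `wouldDeadlock()` returns `true`. With unbounded acquisition, each thread would wait on the other indefinitely.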
Writing directly to the `OutputStream` without `Channels.newChannel`
eliminates the intermediate lock introduced by `WritableByteChannelImpl`.
With that lock gone, the conflicting acquisition order can no longer occur,
which resolves the deadlock.
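
As an illustration of the direct-write approach, here is a minimal sketch of copying a `ByteBuffer` to an `OutputStream` without going through a channel. The class name `DirectWriteSketch`, the helper `writeBuffer`, and the 8192-byte chunk size are hypothetical choices for this example and are not necessarily the code in this PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DirectWriteSketch {

  /** Writes the remaining bytes of {@code buffer} to {@code out} and advances the buffer's position. */
  static void writeBuffer(ByteBuffer buffer, OutputStream out) throws IOException {
    if (buffer.hasArray()) {
      // Heap buffer: write straight from the backing array, no intermediate copy.
      out.write(buffer.array(), buffer.arrayOffset() + buffer.position(), buffer.remaining());
      buffer.position(buffer.limit());
    } else {
      // Direct or read-only buffer: copy through a small temporary array.
      byte[] chunk = new byte[Math.min(buffer.remaining(), 8192)];
      while (buffer.hasRemaining()) {
        int n = Math.min(buffer.remaining(), chunk.length);
        buffer.get(chunk, 0, n);
        out.write(chunk, 0, n);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeBuffer(ByteBuffer.wrap("parquet".getBytes(StandardCharsets.UTF_8)), out);
    System.out.println(out.size() + " bytes written");
  }
}
```

Because the only object involved is the `OutputStream` itself, the writer takes no lock beyond whatever the stream uses internally.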
### What changes are included in this PR?
### Are these changes tested?
### Are there any user-facing changes?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]