[ https://issues.apache.org/jira/browse/SPARK-37088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432540#comment-17432540 ]

Ankur Dave commented on SPARK-37088:
------------------------------------

https://github.com/apache/spark/pull/34245

> Python UDF after off-heap vectorized reader can cause crash due to 
> use-after-free in writer thread
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37088
>                 URL: https://issues.apache.org/jira/browse/SPARK-37088
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, Spark Core, SQL
>    Affects Versions: 3.0.3, 3.1.2, 3.2.0
>            Reporter: Ankur Dave
>            Assignee: Ankur Dave
>            Priority: Major
>
> Python UDFs in Spark SQL are run in a separate Python process. The Python 
> process is fed input by a dedicated thread (`BasePythonRunner.WriterThread`). 
> This writer thread drives the child plan by pulling rows from its output 
> iterator and serializing them across a socket.
>
> When the child exec node is the off-heap vectorized Parquet reader, these 
> rows are backed by off-heap memory. The child node uses a task completion 
> listener to free the off-heap memory at the end of the task, which 
> invalidates the output iterator and any rows it has produced. Since task 
> completion listeners are registered bottom-up and executed in reverse order 
> of registration, this is safe as long as an exec node never accesses its 
> input after its own task completion listener has executed.
>
> The BasePythonRunner task completion listener violates this assumption. It 
> interrupts the writer thread, but does not wait for it to exit. This causes a 
> race condition that can lead to an executor crash:
>
> 1. The Python writer thread is processing a row backed by off-heap memory.
> 2. The task finishes, for example because it has reached a row limit.
> 3. The BasePythonRunner task completion listener sets the interrupt status of 
> the writer thread, but the writer thread does not check it immediately.
> 4. The child plan's task completion listener frees its off-heap memory, 
> invalidating the row that the Python writer thread is processing.
> 5. The Python writer thread attempts to access the invalidated row. The 
> use-after-free triggers a segfault that crashes the executor.
>
> https://issues.apache.org/jira/browse/SPARK-33277 describes the same issue, 
> but the fix was incomplete. It did not address the situation where the Python 
> writer thread accesses a freed row.
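
For readers less familiar with the listener mechanics the description relies on, here is a minimal Scala sketch of the ordering guarantee: completion listeners registered bottom-up (child before parent) run in reverse order at task end, so the parent's cleanup still sees the child's memory alive. `ToyTaskContext` and `ListenerOrderingDemo` are illustrative stand-ins, not Spark classes; the real API is `TaskContext.addTaskCompletionListener`.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-in for Spark's TaskContext (not the real implementation):
// completion listeners are stored in registration order and invoked in
// reverse (LIFO) order when the task completes.
class ToyTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]

  def addTaskCompletionListener(f: () => Unit): Unit = listeners += f

  def markTaskCompleted(): Unit = listeners.reverseIterator.foreach(_.apply())
}

object ListenerOrderingDemo {
  def main(args: Array[String]): Unit = {
    val ctx = new ToyTaskContext

    // Exec nodes register bottom-up: the vectorized Parquet reader (child)
    // registers first, the Python runner (parent) second.
    ctx.addTaskCompletionListener(() => println("child: free off-heap column vectors"))
    ctx.addTaskCompletionListener(() => println("parent: clean up Python writer thread"))

    // Reverse order means the parent's listener runs before the child's, so
    // the child's memory is still valid during the parent's cleanup.
    ctx.markTaskCompleted()
  }
}
```

The guarantee only helps if the parent really stops using its input before its own listener returns, which is exactly the invariant the interrupted-but-not-joined writer thread breaks.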
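
The five-step race can likewise be made concrete with a self-contained toy model. The sketch below is hypothetical code, not `BasePythonRunner` or the Parquet reader: a plain thread named `python-udf-writer` stands in for the writer thread, a volatile `offHeapFreed` flag stands in for freeing the column vectors, and the use-after-free is reported with a `println` rather than a segfault.

```scala
import java.util.concurrent.CountDownLatch

// Toy model of the race in this ticket (illustrative only, not Spark code).
object WriterThreadRaceDemo {
  @volatile private var offHeapFreed = false // stands in for freed column vectors

  def main(args: Array[String]): Unit = {
    val rowInFlight = new CountDownLatch(1)

    val writerThread = new Thread(() => {
      // Step 1: the writer is in the middle of serializing a row backed by
      // off-heap memory.
      rowInFlight.countDown()
      // Step 3: the interrupt flag gets set around here, but this writer is
      // still busy with the row and does not check it yet.
      while (!offHeapFreed) { /* still serializing */ }
      // Step 5: the row is touched again after its memory has been freed.
      // Here that is just a message; in the real bug it is a segfault.
      println("use-after-free: writer read a row whose off-heap memory was freed")
    }, "python-udf-writer")
    writerThread.start()
    rowInFlight.await()

    // Step 2: the task finishes (e.g. a LIMIT was reached) and completion
    // listeners run in reverse order of registration. The Python runner's
    // listener interrupts the writer but does not join it.
    writerThread.interrupt()
    // Step 4: the Parquet reader's listener frees its off-heap buffers.
    offHeapFreed = true

    writerThread.join()
    // The fix for this ticket restores the invariant above: the writer must
    // not touch its input after BasePythonRunner's listener has returned.
  }
}
```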



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
