[ https://issues.apache.org/jira/browse/SPARK-24334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483991#comment-16483991 ]
Mateusz Pieniak edited comment on SPARK-24334 at 5/22/18 1:59 PM:
------------------------------------------------------------------

I came across this issue while running my custom apply function on a larger dataset; it works fine on a smaller dataset. I got the following exception:

{code:java}
SparkException: Job aborted due to stage failure: Task 0 in stage 43.0 failed 4 times, most recent failure: Lost task 0.3 in stage 43.0 (TID 3108, 10.217.183.141, executor 3): org.apache.spark.util.TaskCompletionListenerException: Memory was leaked by query. Memory leaked: (482816)
Allocator(stdout writer for /databricks/python/bin/python) 0/482816/482816/9223372036854775807 (res/actual/peak/limit)
    at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:153)
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:131)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:350)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}

> Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-24334
>                 URL: https://issues.apache.org/jira/browse/SPARK-24334
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Li Jin
>            Priority: Major
>
> Currently, ArrowPythonRunner has two threads that free the Arrow vector schema root and allocator: the main writer thread and the task completion listener thread.
> Having both threads perform the cleanup leads to weird failure modes (e.g., negative reference counts, NPEs, and memory-leak exceptions) when an exception is thrown from the user function.
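To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern the description refers to: two threads (a writer and a completion listener) both try to release the same resource, and without a guard the second release drives the bookkeeping negative. This is not Spark's actual ArrowPythonRunner code; the {{ArrowResources}} class and its fields are hypothetical, and the lock-protected flag simply illustrates the usual fix of making the release idempotent.

{code:python}
import threading

class ArrowResources:
    """Hypothetical stand-in for the vector schema root + allocator."""

    def __init__(self):
        self._lock = threading.Lock()
        self._closed = False
        self.ref_count = 1  # pretend allocator bookkeeping

    def close_unsafe(self):
        # Buggy pattern: both the writer thread and the task completion
        # listener call this; the second call drives ref_count negative.
        self.ref_count -= 1

    def close(self):
        # Guarded pattern: only the first caller actually releases.
        with self._lock:
            if self._closed:
                return
            self._closed = True
            self.ref_count -= 1

res = ArrowResources()
threads = [threading.Thread(target=res.close) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert res.ref_count == 0  # with close_unsafe the count could end at -1
{code}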
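For context, the "custom apply function" mentioned in the comment above is typically a grouped-map Pandas UDF, which is the code path that exercises ArrowPythonRunner. A minimal sketch of that usage follows; the column names, sample data, and the {{subtract_mean}} function are made up for illustration and are not the reporter's actual job.

{code:python}
# Minimal grouped-map Pandas UDF sketch; requires pyspark >= 2.3 and pyarrow.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ["id", "v"])

@pandas_udf("id long, v double", PandasUDFType.GROUPED_MAP)
def subtract_mean(pdf):
    # An exception raised here is the kind of user-function failure that
    # triggers the unclean Arrow allocator shutdown described in this issue.
    return pdf.assign(v=pdf.v - pdf.v.mean())

df.groupby("id").apply(subtract_mean).show()
{code}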