[ https://issues.apache.org/jira/browse/SPARK-17602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162457#comment-16162457 ]
holdenk commented on SPARK-17602:
---------------------------------
[~liujunf] how about you go ahead and make a pull request and put [WIP] in the
title so we can all take a look at it? I've got some more bandwidth available
to do reviews, and if we need to, we can discuss it some more @ Spark Summit.
> PySpark - Performance Optimization Large Size of Broadcast Variable
> -------------------------------------------------------------------
>
> Key: SPARK-17602
> URL: https://issues.apache.org/jira/browse/SPARK-17602
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 1.6.2, 2.0.0
> Environment: Linux
> Reporter: Xiao Ming Bao
> Attachments: PySpark – Performance Optimization for Large Size of
> Broadcast variable.pdf
>
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> Problem: currently, on the executor side, the broadcast variable is written
> to disk as a file, and each Python worker process reads it from local disk
> and deserializes it into a Python object before executing a task. When the
> broadcast variable is large, this read/deserialization takes a lot of time.
> When the Python worker is NOT reused and the number of tasks is large,
> performance is very poor, because the worker has to read and deserialize
> the broadcast variable again for every task.
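To make the per-task cost concrete, here is a minimal Python sketch of the behavior described above. It assumes a simplified model in which the broadcast value is pickled to a local file and a non-reused worker unpickles the whole file before every task. The helpers `write_broadcast` and `run_task` are hypothetical and are not PySpark's internal API.

```python
import os
import pickle
import tempfile


def write_broadcast(value):
    """Hypothetical executor step: pickle the broadcast value to a local file."""
    fd, path = tempfile.mkstemp(suffix=".bcast")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def run_task(path, key, stats):
    """Hypothetical non-reused worker: reads and unpickles the whole file per task."""
    with open(path, "rb") as f:
        value = pickle.load(f)  # full read + deserialization on every call
    stats["loads"] += 1
    return value[key]


if __name__ == "__main__":
    # Stand-in for a large broadcast variable.
    path = write_broadcast({i: str(i) for i in range(100_000)})
    stats = {"loads": 0}
    try:
        results = [run_task(path, k, stats) for k in range(10)]
    finally:
        os.remove(path)
    # Ten tasks -> ten full deserializations of the same value.
    print(stats["loads"], results[:3])
```

The load counter grows with the number of tasks, not with the number of distinct broadcast values, which is the overhead the issue is about.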
> Summary of the solution:
> Transfer the broadcast variable to the daemon Python process via a file (or
> socket/mmap) and deserialize it into an object in the daemon. After a worker
> Python process is forked by the daemon, it automatically has the
> deserialized object and can use it directly, thanks to Linux's copy-on-write
> memory semantics for forked processes.
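The fork/copy-on-write idea can be sketched in plain Python with `os.fork` (POSIX only). This is a minimal illustration under stated assumptions, not the patch discussed in this issue. The helpers `daemon_load` and `fork_worker` are hypothetical, the broadcast is assumed to be a pickle file, and each forked child simply sends one looked-up value back to the daemon through a pipe.

```python
import os
import pickle
import tempfile


def daemon_load(path):
    """Hypothetical daemon step: deserialize the broadcast file exactly once."""
    with open(path, "rb") as f:
        return pickle.load(f)


def fork_worker(shared, key):
    """Fork a worker that uses `shared` inherited from the daemon (no reload).

    The child looks up `shared[key]`, sends the result back over a pipe and
    exits. Memory pages holding `shared` are shared copy-on-write with the
    parent rather than duplicated at fork time.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        status = 1
        try:
            os.close(r)
            with os.fdopen(w, "wb") as out:
                pickle.dump(shared[key], out)
            status = 0
        finally:
            os._exit(status)  # never fall back into the parent's code path
    os.close(w)  # parent keeps only the read end
    with os.fdopen(r, "rb") as inp:
        data = inp.read()  # EOF once the child closes its write end
    _, status = os.waitpid(pid, 0)
    if status != 0:
        raise RuntimeError(f"worker {pid} failed with status {status}")
    return pickle.loads(data)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".bcast")
    with os.fdopen(fd, "wb") as f:
        pickle.dump({i: str(i) for i in range(100_000)}, f)
    try:
        shared = daemon_load(path)  # deserialized once, in the "daemon"
    finally:
        os.remove(path)
    print([fork_worker(shared, k) for k in range(3)])
```

The children never call `pickle.load` on the broadcast file; they read `shared` directly from memory inherited at fork time. One caveat for CPython: merely touching objects updates their reference counts, which writes to the pages holding them, so some of those pages will be copied in the child. The dependable saving is the avoided per-task deserialization rather than zero extra memory.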
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)