Repository: spark

Updated Branches:
  refs/heads/branch-1.1 8f8e2a4ee -> 092121e47
[SPARK-3239] [PySpark] randomize the dirs for each process

This can avoid the IO contention during spilling, when you have multiple disks.

Author: Davies Liu <[email protected]>

Closes #2152 from davies/randomize and squashes the following commits:

a4863c4 [Davies Liu] randomize the dirs for each process

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/092121e4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/092121e4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/092121e4

Branch: refs/heads/branch-1.1
Commit: 092121e477bcd2e474440dbdfdfa69cbd15c4803
Parents: 8f8e2a4
Author: Davies Liu <[email protected]>
Authored: Wed Aug 27 10:40:35 2014 -0700
Committer: Matei Zaharia <[email protected]>
Committed: Wed Aug 27 10:40:35 2014 -0700

----------------------------------------------------------------------
 python/pyspark/shuffle.py | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/092121e4/python/pyspark/shuffle.py
----------------------------------------------------------------------
diff --git a/python/pyspark/shuffle.py b/python/pyspark/shuffle.py
index 1ebe7df..2750f11 100644
--- a/python/pyspark/shuffle.py
+++ b/python/pyspark/shuffle.py
@@ -21,6 +21,7 @@ import platform
 import shutil
 import warnings
 import gc
+import random
 
 from pyspark.serializers import BatchedSerializer, PickleSerializer
 
@@ -216,6 +217,9 @@ class ExternalMerger(Merger):
         """ Get all the directories """
         path = os.environ.get("SPARK_LOCAL_DIRS", "/tmp")
         dirs = path.split(",")
+        if len(dirs) > 1:
+            rnd = random.Random(os.getpid() + id(dirs))
+            random.shuffle(dirs, rnd.random)
         return [os.path.join(d, "python", str(os.getpid()), str(id(self)))
                 for d in dirs]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
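The idea in the patch above can be sketched as a small standalone helper: seed a per-process RNG with the pid so each worker process visits the configured local disks in a different order, spreading spill IO. This is a hypothetical illustration, not PySpark's actual API; note the patch's two-argument `random.shuffle(dirs, rnd.random)` form was a Python 2-era API, so the sketch calls `shuffle` on the seeded `Random` instance instead.

```python
import os
import random


def get_spill_dirs(local_dirs, sub="python"):
    """Hypothetical helper mirroring the patch's logic: return per-process
    spill directories, shuffled per process so concurrent workers on the
    same host do not all write to the same disk first."""
    dirs = list(local_dirs)  # do not mutate the caller's list
    if len(dirs) > 1:
        # Seed with the pid: each worker process gets its own disk order,
        # but the order is stable within that process.
        rnd = random.Random(os.getpid())
        rnd.shuffle(dirs)
    return [os.path.join(d, sub, str(os.getpid())) for d in dirs]
```

With e.g. `SPARK_LOCAL_DIRS="/disk1,/disk2,/disk3"`, `get_spill_dirs(path.split(","))` yields the same three per-process subdirectories in a process-dependent order, so spill files created in first-dir-first fashion land on different disks across workers.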
