sunxiaojun created SPARK-57714:
----------------------------------

             Summary:  Add loser tree based k-way merger for 
UnsafeExternalSorter spill merge
                 Key: SPARK-57714
                 URL: https://issues.apache.org/jira/browse/SPARK-57714
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: sunxiaojun


 Add a loser tree based k-way merger as an opt-in alternative to the existing
  priority-queue (binary heap) based UnsafeSorterSpillMerger used by
  UnsafeExternalSorter to merge spill files.
  
  When UnsafeExternalSorter produces many spill files, the merge phase is
  dominated by comparator cost. A loser tree performs one comparison and one
  loser-slot read per level on each replay, versus two comparisons plus two
  child reads for heap sift-down. For k-way merges, this roughly halves both
  the comparison count and the array touches per popped record.
  
  The new merger is gated behind a new internal config
  `spark.unsafe.sorter.spill.merger.useLoserTree`, defaulting to false so
  behavior is unchanged unless explicitly enabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to