Re: Introduce Uniffle : A stability solution of Hive's shuffle

Sungwoo Park Fri, 29 Sep 2023 07:29:02 -0700

In addition to the two main benefits summarized by Rory, I would like to
add another benefit of using a remote shuffle service:

3. If you run large jobs in public clouds, the amount of local storage
attached to your instances can sometimes be a limiting factor. By using a
remote shuffle service, you can cut the usage of local storage by half
(because shuffle data is sent to the remote shuffle service rather than
written to local storage).

Although you still need local storage for the remaining half, using a
remote shuffle service opens new possibilities for further reducing local
storage usage (e.g., reading directly from the network rather than
spilling to local disk).

Thanks,

--- Sungwoo

On Tue, Jul 11, 2023 at 9:48 PM roryqi <[email protected]> wrote:

> Dear Apache Hive community,
>
>
> We are delighted to announce the support of Tez on Uniffle. Uniffle now
> supports Apache Spark, Apache Hadoop MapReduce, and Apache Tez.
>
> Uniffle is a remote shuffle service. In several situations, Uniffle will
> provide great help.
>
>    1. If you use AWS spot instances or mixed resources, tasks may be
>    preempted. It helps to store shuffle data in Uniffle and to deploy
>    Uniffle on stable resources, which improves the stability of tasks.
>    If tasks are preempted, we won't need to recompute them, because
>    their shuffle data is stored in Uniffle.
>    2. For large shuffle jobs, Uniffle can reduce random IO and improve
>    job performance. For a 1 TB MapReduce Terasort with 10,000 map tasks
>    and 10,000 reduce tasks, job performance increases by 30%.
>
> We also welcome pull requests and are eager to see how you might use
> Uniffle to make Hive more user-friendly. For more information, see
> https://github.com/apache/incubator-uniffle
>
>
> Best
>
> Rory
>
