I tried a 10TB TPC-DS benchmark with Iceberg, but from the preliminary
results, the execution time increases by about 20% (total execution time
from 4900s to 5800s, per-query geo-mean from 19s to 21s). However, please
note that the result is not conclusive because 1) I used the build from
last November instead of the latest build, and 2) I had some problems
executing Hive on Tez with Iceberg, so I used MR3 in the experiment.
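
For anyone who wants to reproduce the setup, Hive's Iceberg DDL looks
roughly like the following (illustrative only; columns abbreviated, and
the exact DDL in my run may differ):

    -- One TPC-DS fact table declared as a native Iceberg table in Hive 4.
    -- STORED BY ICEBERG is the Hive 4 syntax; ORC is the data file format.
    CREATE TABLE store_sales (
      ss_sold_date_sk BIGINT,
      ss_item_sk      BIGINT
      -- ... remaining TPC-DS columns ...
    )
    STORED BY ICEBERG
    STORED AS ORC
    -- format-version 2 is what enables row-level deletes later on
    TBLPROPERTIES ('format-version'='2');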

When we discuss the release of the next version of Hive, I will repeat
the experiment with a freshly loaded 10TB dataset.
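
Regarding the update workload Denys suggests below, a minimal sketch of
what that could look like in HiveQL (table and column names are from
TPC-DS; the modulo predicate is just an illustrative way to touch roughly
10% of the rows):

    -- Delete about 10% of the rows so that subsequent scans have to
    -- apply Iceberg's delete files.
    DELETE FROM store_sales WHERE ss_item_sk % 10 = 0;

    -- Then re-run the read queries and compare against the pre-delete
    -- timings to measure the overhead.
    SELECT count(*) FROM store_sales;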

--- Sungwoo


On Sat, Apr 12, 2025 at 6:28 PM Denys Kuzmenko <dkuzme...@apache.org> wrote:

> Thanks Sungwoo,
>
> Regarding performance testing, am I correct to assume that the "original"
> Hive table is an external one?
>
> Since Iceberg supports deletes, it might be worth comparing it against
> Hive ACID. We could generate updates for 10-20% of the rows and measure
> the read performance overhead.
>
> Additionally, there's the 1 Trillion Row Challenge [1], [2] that we could
> try, extending it with delete operations (see the Impala talk at Iceberg
> Summit 2025).
>
> In any case, it would be helpful to create a roadmap or a Jira EPIC for
> the Default Table Format migration and populate it with the key tasks we
> think are essential before making the switch.
>
> 1. https://www.morling.dev/blog/one-billion-row-challenge/
> 2. https://medium.com/dbsql-sme-engineering/1-trillion-row-challenge-on-databricks-sql-41a82fac5bed
>
> Regards,
> Denys
>
