Hi Sungwoo,

Many thanks for sharing your findings; interesting observations.

Could you please also share the project versions that you used to run the
experiments?

Best,
Stamatis

On Tue, Nov 15, 2022 at 12:46 PM Sungwoo Park <c...@pl.postech.ac.kr> wrote:

> Hello,
>
> I ran the TPC-DS benchmark using Metastore (in the traditional way) and
> Iceberg, and would like to share the results with those interested in
> using Hive with Iceberg.
> The experiment used a 1TB TPC-DS dataset stored as ORC.
>
> Here are a few findings.
>
> 1. Overall, Hive-Iceberg runs slightly faster than Hive-Metastore.
>
> 2. Some queries run much faster with Hive-Iceberg. For example:
> query 14-1: Hive-Metastore 61 seconds, Hive-Iceberg 28 seconds
> query 78: Hive-Metastore 141 seconds, Hive-Iceberg 58 seconds
>
> 3. Some queries run much slower with Hive-Iceberg. For example:
> query 22: Hive-Metastore 32 seconds, Hive-Iceberg 356 seconds
> (The slow execution is due to InputInitializer generating only 4 tasks
> for the first Map vertex.)
>
> 4. Out of 99 queries, 98 return correct results, but query 64 returns
> wrong results (0 rows) due to an exception:
>
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> hdfs://blue0:8020/tmp/hive/user/35d3bdd7-4fda-4f3d-818d-048ad6242072/hive_2022-11-14_15-26-21_045_8992557056967167667-1/-mr-10001/.hive-staging_hive_2022-11-14_15-26-21_045_8992557056967167667-1/-ext-10002
>
> --- Sungwoo
