Hi Sungwoo,

Many thanks for sharing your findings; interesting observations.
Could you please also share the project versions that you used for running the experiments?

Best,
Stamatis

On Tue, Nov 15, 2022 at 12:46 PM Sungwoo Park <c...@pl.postech.ac.kr> wrote:

> Hello,
>
> I ran the TPC-DS benchmark using Metastore (in the traditional way) and
> Iceberg,
> and would like to share the result for those interested in Hive using
> Iceberg.
> The experiment used 1TB TPC-DS dataset stored as ORC.
>
> Here are a few findings.
>
> 1. Overall, Hive-Iceberg runs slightly faster than Hive-Metastore.
>
> 2. Some queries run much faster with Hive-Iceberg. Examples)
> query 14-1) Hive-Metastore: 61 seconds, Hive-Iceberg: 28 seconds
> query 78) Hive-Metastore: 141 seconds, Hive-Iceberg: 58 seconds
>
> 3. Some queries run much slower with Hive-Iceberg. Example)
> query 22: Hive-Metastore: 32 seconds, Hive-Iceberg: 356 seconds
> (The slow execution is due to InputInitializer generating only 4 tasks for
> the
> first Map vertex.)
>
> 4. Out of 99 queries, 98 queries return correct results, but query 64
> returns
> wrong results (returning 0 rows) due to an exception:
>
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
>
> hdfs://blue0:8020/tmp/hive/user/35d3bdd7-4fda-4f3d-818d-048ad6242072/hive_2022-11-14_15-26-21_045_8992557056967167667-1/-mr-10001/.hive-staging_hive_2022-11-14_15-26-21_045_8992557056967167667-1/-ext-10002
>
> --- Sungwoo