This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch site/tpch_data_generator
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git

commit 3dbdc0a13ee482c90ad8246cf7a76d175fd7fc4f
Author: Andrew Lamb <[email protected]>
AuthorDate: Fri Apr 4 15:57:34 2025 -0400

    images + updates
---
 content/blog/2025-04-10-fastest-tpch-generator.md  |  29 ++++++++++++++++-----
 .../images/fastest-tpch-generator/lamb-theory.png  | Bin 0 -> 300479 bytes
 .../fastest-tpch-generator/parquet-performance.png | Bin 0 -> 61946 bytes
 .../fastest-tpch-generator/tbl-performance.png     | Bin 0 -> 49477 bytes
 4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/content/blog/2025-04-10-fastest-tpch-generator.md b/content/blog/2025-04-10-fastest-tpch-generator.md
index ddb3f22..7244528 100644
--- a/content/blog/2025-04-10-fastest-tpch-generator.md
+++ b/content/blog/2025-04-10-fastest-tpch-generator.md
@@ -34,18 +34,33 @@ th, td {
 }
 </style>
 
-We used Rust and open source development to build [tpchgen-rs](https://github.com/alamb/tpchgen-rs), a fully open TPCH data generator over 10x faster than any other such generator we know of.
+We used Rust and open source development to build [tpchgen-rs], a fully open
+TPCH data generator over 10x faster than any other implementation we know of.
 
-Authors:
-* [Andrew Lamb](https://www.linkedin.com/in/andrewalamb/) ([@alamb](https://github.com/alamb)) is a Staff Engineer at [InfluxData](https://www.influxdata.com/) and an [Apache DataFusion](https://datafusion.apache.org/) and Apache Arrow PMC member.
-* Achraf B ([@clflushopt](https://github.com/clflushopt)) is a Software Engineer at [Optable](https://optable.co/) where he works on data infrastructure.
-* [Sean Smith](https://www.linkedin.com/in/scsmithr/) ([@scsmithr](https://github.com/scsmithr)) is the founder of [GlareDB](https://glaredb.com/) focused on building a fast analytics database.
+
+About the Authors:
+- [Andrew Lamb] ([@alamb]) is a Staff Engineer at [InfluxData]) and a PMC member of [Apache DataFusion] and [Apache Arrow].
+- Achraf B ([@clflushopt]) is a Software Engineer at [Optable] where he works on data infrastructure.
+- [Sean Smith] ([@scsmithr]) is the founder of focused on building a fast analytics database.
 
 It is now possible to create the TPCH SF=100 dataset in 72.23 seconds (1.4 GB/s
 😎) on a Macbook Air M3 with 16GB of memory, compared to the classic `dbgen`
 which takes 30 minutes[^1] (0.05GB/sec). On the same machine, it takes less than
-2 minutes to create all 3.6 GB of SF=100 in [Apache
-Parquet](https://parquet.apache.org/) format.
+2 minutes to create all 3.6 GB of SF=100 in [Apache Parquet] format.
+
+[tpchgen-rs]: https://github.com/alamb/tpchgen-rs
+
+[Andrew Lamb]: https://www.linkedin.com/in/andrewalamb/
+[@alamb]: https://github.com/alamb
+[InfluxData]: https://www.influxdata.com/
+[Apache DataFusion]: https://datafusion.apache.org/
+[Apache Arrow]: https://arrow.apache.org/
+[@clflushopt]: https://github.com/clflushopt
+[Optable]: https://optable.co/
+[Sean Smith]: https://www.linkedin.com/in/scsmithr/
+[@scsmithr]: https://github.com/scsmithr
+[GlareDB]: https://glaredb.com/
+[Apache Parquet]: https://parquet.apache.org/
 
 Finally, it is convenient and efficient to run TPCH queries locally when testing
 analytical engines such as DataFusion.
diff --git a/content/images/fastest-tpch-generator/lamb-theory.png b/content/images/fastest-tpch-generator/lamb-theory.png
new file mode 100644
index 0000000..2551ffa
Binary files /dev/null and b/content/images/fastest-tpch-generator/lamb-theory.png differ
diff --git a/content/images/fastest-tpch-generator/parquet-performance.png b/content/images/fastest-tpch-generator/parquet-performance.png
new file mode 100644
index 0000000..462c995
Binary files /dev/null and b/content/images/fastest-tpch-generator/parquet-performance.png differ
diff --git a/content/images/fastest-tpch-generator/tbl-performance.png b/content/images/fastest-tpch-generator/tbl-performance.png
new file mode 100644
index 0000000..2e64f11
Binary files /dev/null and b/content/images/fastest-tpch-generator/tbl-performance.png differ
