Ok thanks for your clarifications

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom
view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one-thousand expert
opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).

On Mon, 19 Feb 2024 at 17:24, Chao Sun <[email protected]> wrote:

> Hi Mich,
>
> > Also have you got some benchmark results from your tests that you can
> > possibly share?
>
> We only have some partial benchmark results internally so far. Once
> shuffle and better memory management have been introduced, we plan to
> publish the benchmark results (at least TPC-H) in the repo.
>
> > Compared to standard Spark, what kind of performance gains can be
> > expected with Comet?
>
> Currently, users could benefit from Comet in a few areas:
> - Parquet read: a few improvements have been made against reading from S3
>   in particular, so users can expect better scan performance in this scenario
> - Hash aggregation
> - Columnar shuffle
> - Decimals (Java's BigDecimal is pretty slow)
>
> > Can one use Comet on k8s in conjunction with something like a Volcano
> > addon?
>
> I think so. Comet is mostly orthogonal to the Spark scheduler framework.
>
> Chao
>
> On Fri, Feb 16, 2024 at 5:39 AM Mich Talebzadeh <[email protected]>
> wrote:
>
>> Hi Chao,
>>
>> As a cool feature
>>
>> - Compared to standard Spark, what kind of performance gains can be
>>   expected with Comet?
>> - Can one use Comet on k8s in conjunction with something like a
>>   Volcano addon?
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge, sourced from both personal expertise and other resources but of
>> course cannot be guaranteed. It is essential to note that, as with any
>> advice, one verified and tested result holds more weight than a thousand
>> expert opinions.
>>
>> On Tue, 13 Feb 2024 at 20:42, Chao Sun <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> We are very happy to announce that Project Comet, a plugin to
>>> accelerate Spark query execution via leveraging DataFusion and Arrow,
>>> has now been open sourced under the Apache Arrow umbrella. Please
>>> check the project repo
>>> https://github.com/apache/arrow-datafusion-comet for more details if
>>> you are interested. We'd love to collaborate with people from the open
>>> source community who share similar goals.
>>>
>>> Thanks,
>>> Chao
