Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Mich Talebzadeh Mon, 19 Feb 2024 12:21:43 -0800

OK, thanks for your clarifications.

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom



   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one-thousand
expert opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).


On Mon, 19 Feb 2024 at 17:24, Chao Sun <[email protected]> wrote:

> Hi Mich,
>
> > Also have you got some benchmark results from your tests that you can
> > possibly share?
>
> We only have some partial benchmark results internally so far. Once
> shuffle and better memory management have been introduced, we plan to
> publish the benchmark results (at least TPC-H) in the repo.
>
> > Compared to standard Spark, what kind of performance gains can be
> > expected with Comet?
>
> Currently, users could benefit from Comet in a few areas:
> - Parquet read: a few improvements have been made for reading from S3
> in particular, so users can expect better scan performance in this scenario
> - Hash aggregation
> - Columnar shuffle
> - Decimals (Java's BigDecimal is pretty slow)
>
> > Can one use Comet on k8s in conjunction with something like a Volcano
> > addon?
>
> I think so. Comet is mostly orthogonal to the Spark scheduler framework.
>
> Chao
>
> On Fri, Feb 16, 2024 at 5:39 AM Mich Talebzadeh <[email protected]>
> wrote:
>
>> Hi Chao,
>>
>> This looks like a cool feature. Two questions:
>>
>>
>>    - Compared to standard Spark, what kind of performance gains can be
>>    expected with Comet?
>>    - Can one use Comet on k8s in conjunction with something like a
>>    Volcano addon?
>>
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>>
>>    view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge, sourced from both personal expertise and other resources but of
>> course cannot be guaranteed. It is essential to note that, as with any
>> advice, one verified and tested result holds more weight than a thousand
>> expert opinions.
>>
>>
>> On Tue, 13 Feb 2024 at 20:42, Chao Sun <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> We are very happy to announce that Project Comet, a plugin to
>>> accelerate Spark query execution via leveraging DataFusion and Arrow,
>>> has now been open sourced under the Apache Arrow umbrella. Please
>>> check the project repo
>>> https://github.com/apache/arrow-datafusion-comet for more details if
>>> you are interested. We'd love to collaborate with people from the open
>>> source community who share similar goals.
>>>
>>> Thanks,
>>> Chao
