[ANNOUNCE] Apache Kyuubi 1.8.1 is available

2024-02-20 Thread Cheng Pan
Hi all,

The Apache Kyuubi community is pleased to announce that
Apache Kyuubi 1.8.1 has been released!

Apache Kyuubi is a distributed and multi-tenant gateway to provide
serverless SQL on data warehouses and lakehouses.

Kyuubi provides a pure SQL gateway through Thrift JDBC/ODBC and
RESTful interfaces for end-users to manipulate large-scale data with
pre-programmed and extensible Spark/Flink/Trino/Hive engines.
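
Because the gateway speaks the HiveServer2-compatible Thrift protocol, any
compatible client can talk to it. Below is a minimal Python sketch using
PyHive; the host and username are illustrative assumptions, and 10009 is
Kyuubi's documented default frontend port.

    # Minimal sketch: connect to a Kyuubi gateway over the
    # HiveServer2-compatible Thrift protocol with PyHive.
    # Host, port, and username are illustrative, not from this announcement.
    from pyhive import hive

    conn = hive.connect(host="kyuubi.example.com", port=10009, username="alice")
    cur = conn.cursor()
    cur.execute("SELECT 1")   # executed by the engine behind the gateway
    print(cur.fetchall())     # -> [(1,)]
    cur.close()
    conn.close()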

We are aiming to make Kyuubi an "out-of-the-box" tool for data warehouses
and lakehouses.

This "out-of-the-box" model minimizes the barriers and costs for end-users
to use Spark/Flink/Trino/Hive engines on the client side.

On the server side, the multi-tenant architecture of the Kyuubi server and
engines provides administrators with a way to achieve computing resource
isolation, data security, high availability, high client concurrency, and more.
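
As an illustration of that multi-tenant model, clients can request different
engine sharing levels per session. The sketch below passes Kyuubi session
configurations through PyHive's connection parameters; the
kyuubi.engine.share.level key is documented by Kyuubi, but the host and the
exact way your driver forwards session configs are assumptions.

    # Sketch: request a USER-level shared engine at session open.
    # Share levels such as CONNECTION, USER, GROUP, and SERVER trade
    # isolation against engine reuse; connection details are illustrative.
    from pyhive import hive

    conn = hive.connect(
        host="kyuubi.example.com",   # illustrative host
        port=10009,                  # Kyuubi's default frontend port
        username="alice",
        configuration={
            "kyuubi.engine.share.level": "USER",    # one engine per user
            "spark.sql.shuffle.partitions": "200",  # engine-side Spark conf
        },
    )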

The full release notes and download links are available at:
Release Notes: https://kyuubi.apache.org/release/1.8.1.html

To learn more about Apache Kyuubi, please see
https://kyuubi.apache.org/

Kyuubi Resources:
- Issue: https://github.com/apache/kyuubi/issues
- Mailing list: d...@kyuubi.apache.org

We would like to thank all contributors of the Kyuubi community
who made this release possible!

Thanks,
Cheng Pan, on behalf of Apache Kyuubi community




Re: Spark 3.3 Query Analyzer Bug Report

2024-02-20 Thread Sharma, Anup
Apologies. The issue appeared after we upgraded from Spark 3.1 to Spark 3.3;
the same query runs fine on Spark 3.1.

Please disregard the Spark version mentioned in the earlier email subject.

Anup

Error trace:
query_result.explain(extended=True)
  File "…/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py"
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.api.python.PythonSQLUtils.explainString.
: java.lang.IllegalStateException: You hit a query analyzer bug. Please report your query to Spark user mailing list.
    at org.apache.spark.sql.execution.SparkStrategies$Aggregation$.apply(SparkStrategies.scala:516)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:72)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
    at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196)
    at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199)
    at scala.collect...


From: "Sharma, Anup" 
Date: Tuesday, February 20, 2024 at 4:58 PM
To: "user@spark.apache.org" 
Cc: "Thinderu, Shalini" 
Subject: Spark 4.0 Query Analyzer Bug Report

Hi Spark team,

We ran into a DataFrame issue after upgrading from Spark 3.1 to 4.

query_result.explain(extended=True)
  File "…/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py"
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.api.python.PythonSQLUtils.explainString.
: java.lang.IllegalStateException: You hit a query analyzer bug. Please report your query to Spark user mailing list.
    at org.apache.spark.sql.execution.SparkStrategies$Aggregation$.apply(SparkStrategies.scala:516)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:72)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
    at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196)
    at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199)
    at scala.collect...


Could you please let us know if this is already being looked at?

Thanks,
Anup


Re: Spark 4.0 Query Analyzer Bug Report

2024-02-20 Thread Holden Karau
Do you mean Spark 3.4? 4.0 is very much not released yet.

Also it would help if you could share your query & more of the logs leading
up to the error.
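
For anyone assembling such a report, here is a small PySpark sketch of the
details that make triage easier (the session handling is illustrative):

    # Sketch: capture basics worth attaching to a planner bug report.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    print(spark.version)                      # the exact Spark version in use
    spark.sparkContext.setLogLevel("DEBUG")   # verbose planner logging
    # then re-run the failing call to capture the surrounding log context:
    # query_result.explain(extended=True)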

On Tue, Feb 20, 2024 at 3:07 PM Sharma, Anup wrote:

> Hi Spark team,
>
>
>
> We ran into a DataFrame issue after upgrading from Spark 3.1 to 4.
>
>
>
> query_result.explain(extended=True)
>   File "…/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py"
>     raise Py4JJavaError(
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.api.python.PythonSQLUtils.explainString.
> : java.lang.IllegalStateException: You hit a query analyzer bug. Please report your query to Spark user mailing list.
>     at org.apache.spark.sql.execution.SparkStrategies$Aggregation$.apply(SparkStrategies.scala:516)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
>     at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
>     at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:72)
>     at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
>     at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196)
>     at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194)
>     at scala.collection.Iterator.foreach(Iterator.scala:943)
>     at scala.collection.Iterator.foreach$(Iterator.scala:943)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>     at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199)
>     at scala.collect...
>
>
>
>
>
> Could you please let us know if this is already being looked at?
>
>
>
> Thanks,
>
> Anup
>


-- 
Cell : 425-233-8271


Spark 4.0 Query Analyzer Bug Report

2024-02-20 Thread Sharma, Anup
Hi Spark team,

We ran into a DataFrame issue after upgrading from Spark 3.1 to 4.

query_result.explain(extended=True)
  File "…/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py"
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.api.python.PythonSQLUtils.explainString.
: java.lang.IllegalStateException: You hit a query analyzer bug. Please report your query to Spark user mailing list.
    at org.apache.spark.sql.execution.SparkStrategies$Aggregation$.apply(SparkStrategies.scala:516)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
    at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:72)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)
    at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196)
    at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199)
    at scala.collect...
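
For reference, the failing call pattern reduces to a skeleton like the one
below; the data and aggregation are illustrative stand-ins, since the actual
query was not included in the report.

    # Illustrative skeleton only: the real query that triggers the
    # analyzer bug was not shared, and this toy example plans fine.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["k", "v"])
    query_result = df.groupBy("k").agg(F.sum("v").alias("total"))
    query_result.explain(extended=True)  # the call that raised Py4JJavaError above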


Could you please let us know if this is already being looked at?

Thanks,
Anup


Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-20 Thread Manoj Kumar
Dear @Chao Sun,

I trust you're doing well. Having worked extensively with Spark NVIDIA
RAPIDS, Velox, and Gluten, I'm now contemplating Comet's potential
advantages over Velox in terms of performance and unique features.

While RAPIDS leverages GPUs effectively, Gazelle was built on Intel AVX512
intrinsics and is now EOL. All eyes are therefore on Velox as a universal
C++ acceleration layer (used by Presto, Spark, PyTorch, XStream (stream
processing), F3 (feature engineering), FBETL (data ingestion), XSQL
(distributed transaction processing), Scribe (message bus infrastructure),
Saber (high-QPS external serving), and others).

In this context, I'm keen to understand Comet's distinctive features and
how its performance compares to Velox. What makes Comet stand out, and how
does its efficiency stack up against Velox across different tasks and
frameworks?

Your insights into Comet's capabilities would be invaluable; they will help
me evaluate whether I should invest my time in this plugin.

Thank you for your time and expertise.

Warm regards,
Manoj Kumar


On Tue, 20 Feb 2024 at 01:51, Mich Talebzadeh wrote:

> Ok thanks for your clarifications
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, "one test result is worth one-thousand
> expert opinions" (Wernher von Braun).
>
>
> On Mon, 19 Feb 2024 at 17:24, Chao Sun  wrote:
>
>> Hi Mich,
>>
>> > Also have you got some benchmark results from your tests that you can
>> possibly share?
>>
>> We only have some partial benchmark results internally so far. Once
>> shuffle and better memory management have been introduced, we plan to
>> publish the benchmark results (at least TPC-H) in the repo.
>>
>> > Compared to standard Spark, what kind of performance gains can be
>> expected with Comet?
>>
>> Currently, users could benefit from Comet in a few areas:
>> - Parquet read: a few improvements have been made against reading from S3
>> in particular, so users can expect better scan performance in this scenario
>> - Hash aggregation
>> - Columnar shuffle
>> - Decimals (Java's BigDecimal is pretty slow)
>>
>> > Can one use Comet on k8s in conjunction with something like a Volcano
>> addon?
>>
>> I think so. Comet is mostly orthogonal to the Spark scheduler framework.
>>
>> Chao
>>
>>
>>
>>
>>
>>
>> On Fri, Feb 16, 2024 at 5:39 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Hi Chao,
>>>
>>> Two questions about this cool feature:
>>>
>>>
>>>- Compared to standard Spark, what kind of performance gains can be
>>>expected with Comet?
>>>- Can one use Comet on k8s in conjunction with something like a
>>>Volcano addon?
>>>
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solutions Architect | Engineer
>>> London
>>> United Kingdom
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* The information provided is correct to the best of my
>>> knowledge, sourced from both personal expertise and other resources, but
>>> of course cannot be guaranteed. It is essential to note that, as with any
>>> advice, one verified and tested result holds more weight than a thousand
>>> expert opinions.
>>>
>>>
>>> On Tue, 13 Feb 2024 at 20:42, Chao Sun  wrote:
>>>
 Hi all,

 We are very happy to announce that Project Comet, a plugin to
 accelerate Spark query execution via leveraging DataFusion and Arrow,
 has now been open sourced under the Apache Arrow umbrella. Please
 check the project repo
 https://github.com/apache/arrow-datafusion-comet for more details if
 you are interested. We'd love to collaborate with people from the open
 source community who share similar goals.

 Thanks,
 Chao

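
Following up on the benefit areas Chao lists above, here is a minimal sketch
of enabling Comet in a PySpark session. The jar path is a placeholder, and
the configuration keys are taken from the Comet repo's README at the time of
writing; treat them as assumptions that may change, and check the repo for
current instructions.

    # Sketch: start a Spark session with the Comet plugin enabled.
    # The jar path is a placeholder; config keys follow the Comet README
    # and may evolve: https://github.com/apache/arrow-datafusion-comet
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.jars", "/path/to/comet-spark.jar")         # placeholder path
        .config("spark.plugins", "org.apache.spark.CometPlugin")  # register the plugin
        .config("spark.comet.enabled", "true")                    # native Parquet scan
        .config("spark.comet.exec.enabled", "true")               # native operators
        .getOrCreate()
    )

    # Columnar scans and supported operators should now run through Comet.
    spark.read.parquet("/path/to/data.parquet").groupBy("k").count().explain()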




Re: unsubscribe

2024-02-20 Thread kritika jain
Unsubscribe

On Tue, 20 Feb 2024, 3:18 pm Крюков Виталий Семенович wrote:

>
> unsubscribe
>
>
>


unsubscribe

2024-02-20 Thread Крюков Виталий Семенович

unsubscribe




Community Over Code Asia 2024 Travel Assistance Applications now open!

2024-02-20 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code Asia 2024 are now
open!

We will be supporting Community over Code Asia, Hangzhou, China
July 26th - 28th, 2024.

TAC exists to help those who would like to attend Community over Code
events but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at https://tac.apache.org/. Applications are already
open at https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those who are able to attend the full event.

Important: Applications close on Friday, May 10th, 2024.

Applicants have until the closing date above to submit their
applications (which should contain as much supporting material as
required to process their request efficiently and accurately); this
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to receive applications from a diverse range of
backgrounds; therefore, we encourage (as always) anyone thinking about
sending in an application to do so ASAP.

For those who will need a visa to enter the country, we advise you to
apply now so that you have enough time in case of interview delays. Do
not wait until you know whether you have been accepted.

We look forward to greeting many of you in Hangzhou, China in July, 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)