[GitHub] [spark-website] MaxGekk commented on a diff in pull request #396: Add 3.3.0 release note and news and update links

GitBox Thu, 16 Jun 2022 02:21:16 -0700


MaxGekk commented on code in PR #396:
URL: https://github.com/apache/spark-website/pull/396#discussion_r898881172



##########
releases/_posts/2022-06-16-spark-release-3-3-0.md:
##########
@@ -0,0 +1,477 @@
+---
+layout: post
+title: Spark Release 3.3.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+_edit_last: '4'
+_wpas_done_all: '1'
+---
+
+Apache Spark 3.3.0 is the fourth release of the 3.x line. With tremendous 
contributions from the open-source community, this release resolves more than 
1,600 Jira tickets.
+
+This release improves join query performance via Bloom filters, increases the 
Pandas API coverage with the support of popular Pandas features such as 
datetime.timedelta and merge_asof, simplifies the migration from traditional 
data warehouses by improving ANSI compliance and supporting dozens of new 
built-in functions, and boosts development productivity with better error 
handling, autocompletion, performance, and profiling. 
+
+To download Apache Spark 3.3.0, visit the 
[downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA 
for the [detailed changes](https://s.apache.org/spark-3.3.0). We have curated a 
list of high-level changes here, grouped by major modules.
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+
+### Highlight
+
+
+
+* Row-level Runtime Filtering 
([SPARK-32268](https://issues.apache.org/jira/browse/SPARK-32268))  
+* ANSI enhancements 
([SPARK-38860](https://issues.apache.org/jira/browse/SPARK-38860))  
+* Error Message Improvements 
([SPARK-38781](https://issues.apache.org/jira/browse/SPARK-38781))
+* Support complex types for Parquet vectorized reader 
([SPARK-34863](https://issues.apache.org/jira/browse/SPARK-34863))
+* Hidden File Metadata Support for Spark SQL 
([SPARK-37273](https://issues.apache.org/jira/browse/SPARK-37273))
+* Provide a profiler for Python/Pandas UDFs 
([SPARK-37443](https://issues.apache.org/jira/browse/SPARK-37443))
+* Introduce Trigger.AvailableNow for running streaming queries like 
Trigger.Once in multiple batches 
([SPARK-36533](https://issues.apache.org/jira/browse/SPARK-36533))
+* More comprehensive DS V2 push down capabilities 
([SPARK-38788](https://issues.apache.org/jira/browse/SPARK-38788))
+* Executor Rolling in Kubernetes environment 
([SPARK-37810](https://issues.apache.org/jira/browse/SPARK-37810))
+* Support Customized Kubernetes Schedulers 
([SPARK-36057](https://issues.apache.org/jira/browse/SPARK-36057))
+* Migrating from log4j 1 to log4j 2 
([SPARK-37814](https://issues.apache.org/jira/browse/SPARK-37814))
+
+
+### Spark SQL and Core
+
+
+#### ANSI mode
+
+
+
+* New explicit cast syntax rules in ANSI mode 
([SPARK-33354](https://issues.apache.org/jira/browse/SPARK-33354))
+* Elt() should return null if index is null under ANSI mode 
([SPARK-38304](https://issues.apache.org/jira/browse/SPARK-38304))
+* Optionally return null result if element not exists in array/map 
([SPARK-37750](https://issues.apache.org/jira/browse/SPARK-37750))
+* Allow casting between numeric type and timestamp type 
([SPARK-37714](https://issues.apache.org/jira/browse/SPARK-37714))
+* Disable ANSI reserved keywords by default 
([SPARK-37724](https://issues.apache.org/jira/browse/SPARK-37724))
+* Use store assignment rules for resolving function invocation 
([SPARK-37438](https://issues.apache.org/jira/browse/SPARK-37438))
+* Add a config to allow casting between Datetime and Numeric 
([SPARK-37179](https://issues.apache.org/jira/browse/SPARK-37179))
+* Add a config to optionally enforce ANSI reserved keywords 
([SPARK-37133](https://issues.apache.org/jira/browse/SPARK-37133))
+* Disallow binary operations between Interval and String literal 
([SPARK-36508](https://issues.apache.org/jira/browse/SPARK-36508))
+
+
+#### Feature Enhancements
+
+
+
+* Support ANSI SQL INTERVAL types 
([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* Error Message Improvements 
([SPARK-38781](https://issues.apache.org/jira/browse/SPARK-38781))
+* Hidden File Metadata Support for Spark SQL 
([SPARK-37273](https://issues.apache.org/jira/browse/SPARK-37273))
+* Support raw string literal 
([SPARK-36371](https://issues.apache.org/jira/browse/SPARK-36371))
+* Helper class for batch Dataset.observe() 
([SPARK-34806](https://issues.apache.org/jira/browse/SPARK-34806))
+* Support specify initial partition number for rebalance 
([SPARK-38410](https://issues.apache.org/jira/browse/SPARK-38410))
+* Support cascade mode for `dropNamespace` API 
([SPARK-37929](https://issues.apache.org/jira/browse/SPARK-37929))
+* Allow store assignment and implicit cast among datetime types 
([SPARK-37707](https://issues.apache.org/jira/browse/SPARK-37707))
+* Collect, first and last should be deterministic aggregate functions 
([SPARK-32940](https://issues.apache.org/jira/browse/SPARK-32940))
+* Add ExpressionBuilder for functions with complex overloads 
([SPARK-37164](https://issues.apache.org/jira/browse/SPARK-37164))
+* Add array support to union by name 
([SPARK-36546](https://issues.apache.org/jira/browse/SPARK-36546))
+* Add df.withMetadata: a syntax sugar to update the metadata of a dataframe 
([SPARK-36642](https://issues.apache.org/jira/browse/SPARK-36642))
+* Support raw string literal 
([SPARK-36371](https://issues.apache.org/jira/browse/SPARK-36371))
+* Use CAST in parsing of dates/timestamps with default pattern 
([SPARK-36418](https://issues.apache.org/jira/browse/SPARK-36418))
+* Support value class in nested schema for Dataset 
([SPARK-20384](https://issues.apache.org/jira/browse/SPARK-20384))
+* Add AS OF syntax support 
([SPARK-37219](https://issues.apache.org/jira/browse/SPARK-37219))
+* Add REPEATABLE in TABLESAMPLE to specify seed 
([SPARK-37165](https://issues.apache.org/jira/browse/SPARK-37165))
+* Add ANSI syntax `set catalog xxx` to change the current catalog 
([SPARK-36841](https://issues.apache.org/jira/browse/SPARK-36841))
+* Support ILIKE (ALL | ANY | SOME) - case insensitive LIKE 
([SPARK-36674](https://issues.apache.org/jira/browse/SPARK-36674), 
[SPARK-36736](https://issues.apache.org/jira/browse/SPARK-36736), 
[SPARK-36778](https://issues.apache.org/jira/browse/SPARK-36778))
+* Support query stage show runtime statistics in formatted explain mode 
([SPARK-38322](https://issues.apache.org/jira/browse/SPARK-38322))
+* Add spill size metrics for sort merge join 
([SPARK-37726](https://issues.apache.org/jira/browse/SPARK-37726))
+* Update the SQL syntax of SHOW FUNCTIONS 
([SPARK-37777](https://issues.apache.org/jira/browse/SPARK-37777))
+* Implement support for DEFAULT values for columns in tables 
([SPARK-38334](https://issues.apache.org/jira/browse/SPARK-38334))
+* Storage Partitioned Join 
([SPARK-37375](https://issues.apache.org/jira/browse/SPARK-37375))
+* Support DROP COLUMN [IF EXISTS] syntax 
([SPARK-38939](https://issues.apache.org/jira/browse/SPARK-38939))
+* New built-in functions and their extensions 
([SPARK-38783](https://issues.apache.org/jira/browse/SPARK-38783))
+    * Datetime
+        * Add the <span 
style="text-decoration:underline;">TIMESTAMPADD</span>() function 
([SPARK-38195](https://issues.apache.org/jira/browse/SPARK-38195))
+        * Add the <span 
style="text-decoration:underline;">TIMESTAMPDIFF</span>() function 
([SPARK-38284](https://issues.apache.org/jira/browse/SPARK-38284))
+        * Add the `<span style="text-decoration:underline;">DATEDIFF</span>()` 
alias for `TIMESTAMPDIFF()` 
([SPARK-38389](https://issues.apache.org/jira/browse/SPARK-38389))
+        * Add the `<span style="text-decoration:underline;">DATEADD</span>()` 
alias for `TIMESTAMPADD()` 
([SPARK-38332](https://issues.apache.org/jira/browse/SPARK-38332))
+        * Add the `<span 
style="text-decoration:underline;">convert_timezone</span>()` function 
([SPARK-37552](https://issues.apache.org/jira/browse/SPARK-37552), 
[SPARK-37568](https://issues.apache.org/jira/browse/SPARK-37568))
+        * Expose <span style="text-decoration:underline;">make_date</span> 
expression in functions.scala 
([SPARK-36554](https://issues.apache.org/jira/browse/SPARK-36554))
+    * AES functions 
([SPARK-12567](https://issues.apache.org/jira/browse/SPARK-12567))
+        * Add <span style="text-decoration:underline;">aes_encrypt</span> and 
<span style="text-decoration:underline;">aes_decrypt</span> builtin functions 
([SPARK-12567](https://issues.apache.org/jira/browse/SPARK-12567))
+        * Support the GCM mode by `<span 
style="text-decoration:underline;">aes_encrypt</span>()`/`<span 
style="text-decoration:underline;">aes_decrypt</span>()` 
([SPARK-37591](https://issues.apache.org/jira/browse/SPARK-37591))
+        * Set `GCM` as the default mode in `<span 
style="text-decoration:underline;">aes_encrypt</span>()`/`<span 
style="text-decoration:underline;">aes_decrypt</span>()` 
([SPARK-37666](https://issues.apache.org/jira/browse/SPARK-37666))
+        * Add the `mode` and `padding` args to `<span 
style="text-decoration:underline;">aes_encrypt</span>()`/`<span 
style="text-decoration:underline;">aes_decrypt</span>()` 
([SPARK-37586](https://issues.apache.org/jira/browse/SPARK-37586))
+    * ANSI Aggregation Function 
([SPARK-37671](https://issues.apache.org/jira/browse/SPARK-37671))
+        * Support ANSI Aggregate Function: <span 
style="text-decoration:underline;">regr_count</span> 
([SPARK-37613](https://issues.apache.org/jira/browse/SPARK-37613))
+        * Support ANSI Aggregate Function: <span 
style="text-decoration:underline;">regr_avgx</span> & <span 
style="text-decoration:underline;">regr_avgy</span> 
([SPARK-37614](https://issues.apache.org/jira/browse/SPARK-37614))
+        * Support ANSI Aggregate Function: <span 
style="text-decoration:underline;">regr_count</span> 
([SPARK-37613](https://issues.apache.org/jira/browse/SPARK-37613))
+        * Support ANSI Aggregate Function: regr_r2 
([SPARK-37641](https://issues.apache.org/jira/browse/SPARK-37641))
+        * Support ANSI Aggregate Function: <span 
style="text-decoration:underline;">array_agg</span> 
([SPARK-27974](https://issues.apache.org/jira/browse/SPARK-27974))
+        * Support ANSI Aggregation Function: <span 
style="text-decoration:underline;">percentile_cont</span> 
([SPARK-37676](https://issues.apache.org/jira/browse/SPARK-37676), 
[SPARK-38219](https://issues.apache.org/jira/browse/SPARK-38219))
+        * Support ANSI Aggregation Function: <span 
style="text-decoration:underline;">percentile_disc</span> 
([SPARK-37691](https://issues.apache.org/jira/browse/SPARK-37691))
+        * New SQL function: try_avg 
([SPARK-38589](https://issues.apache.org/jira/browse/SPARK-38589))
+    * Collections
+        * Introduce SQL function <span 
style="text-decoration:underline;">ARRAY_SIZE</span> 
([SPARK-38345](https://issues.apache.org/jira/browse/SPARK-38345))
+        * New SQL function: <span 
style="text-decoration:underline;">map_contains_key</span> 
([SPARK-37584](https://issues.apache.org/jira/browse/SPARK-37584))
+        * New SQL function: <span 
style="text-decoration:underline;">try_element_at</span> 
([SPARK-37533](https://issues.apache.org/jira/browse/SPARK-37533))
+        * New SQL function: <span 
style="text-decoration:underline;">try_sum</span> 
([SPARK-38548](https://issues.apache.org/jira/browse/SPARK-38548))
+    * Format
+        * Add a new SQL function <span 
style="text-decoration:underline;">to_binary</span> 
([SPARK-37507](https://issues.apache.org/jira/browse/SPARK-37507), 
[SPARK-38796](https://issues.apache.org/jira/browse/SPARK-38796))
+        * New SQL function: <span 
style="text-decoration:underline;">try_to_binary</span> 
([SPARK-38590](https://issues.apache.org/jira/browse/SPARK-38590), 
[SPARK-38796](https://issues.apache.org/jira/browse/SPARK-38796))
+        * Data Type Formatting Functions: `<span 
style="text-decoration:underline;">to_number</span>` 
([SPARK-28137](https://issues.apache.org/jira/browse/SPARK-28137))
+    * String/Binary
+        * Add <span style="text-decoration:underline;">CONTAINS</span>() 
string function 
([SPARK-37508](https://issues.apache.org/jira/browse/SPARK-37508))
+        * Add the `<span 
style="text-decoration:underline;">startswith</span>()` and `<span 
style="text-decoration:underline;">endswith</span>()` string functions 
([SPARK-37520](https://issues.apache.org/jira/browse/SPARK-37520))
+        * Add lpad and rpad functions for binary strings 
([SPARK-37047](https://issues.apache.org/jira/browse/SPARK-37047))
+        * Support split_part Function 
([SPARK-38063](https://issues.apache.org/jira/browse/SPARK-38063))
+    * Add scale parameter to <span 
style="text-decoration:underline;">floor</span> and <span 
style="text-decoration:underline;">ceil</span> functions 
([SPARK-37475](https://issues.apache.org/jira/browse/SPARK-37475))
+    * New SQL functions: <span 
style="text-decoration:underline;">try_subtract</span> and <span 
style="text-decoration:underline;">try_multiply</span> 
([SPARK-38164](https://issues.apache.org/jira/browse/SPARK-38164))
+    * Implements <span 
style="text-decoration:underline;">histogram_numeric</span> aggregation 
function which supports partial aggregation 
([SPARK-16280](https://issues.apache.org/jira/browse/SPARK-16280))
+    * Add max_by/min_by to sql.functions 
([SPARK-36963](https://issues.apache.org/jira/browse/SPARK-36963))
+    * Add new built-in SQL functions: SEC and CSC 
([SPARK-36683](https://issues.apache.org/jira/browse/SPARK-36683))
+    * array_intersect handles duplicated Double.NaN and Float.NaN 
([SPARK-36754](https://issues.apache.org/jira/browse/SPARK-36754))
+    * Add cot as Scala and Python functions 
([SPARK-36660](https://issues.apache.org/jira/browse/SPARK-36660))
+
+
+#### Performance enhancements
+
+
+
+* Whole-stage code generation
+    * Add code-gen for sort aggregate without grouping keys 
([SPARK-37564](https://issues.apache.org/jira/browse/SPARK-37564))
+    * Add code-gen for full outer sort merge join 
([SPARK-35352](https://issues.apache.org/jira/browse/SPARK-35352))
+    * Add code-gen for full outer shuffled hash join 
([SPARK-32567](https://issues.apache.org/jira/browse/SPARK-32567))
+    * Add code-gen for existence sort merge join 
([SPARK-37316](https://issues.apache.org/jira/browse/SPARK-37316))
+* Push down (filters)
+    * Push down filters through RebalancePartitions 
([SPARK-37828](https://issues.apache.org/jira/browse/SPARK-37828))
+    * Push down boolean column filter 
([SPARK-36644](https://issues.apache.org/jira/browse/SPARK-36644))
+    * Push down limit 1 for right side of left semi/anti join if join 
condition is empty 
([SPARK-37917](https://issues.apache.org/jira/browse/SPARK-37917))
+    * Translate more standard aggregate functions for pushdown 
([SPARK-37527](https://issues.apache.org/jira/browse/SPARK-37527))
+    * Support propagate empty relation through aggregate/union 
([SPARK-35442](https://issues.apache.org/jira/browse/SPARK-35442))
+    * Row-level Runtime Filtering 
([SPARK-32268](https://issues.apache.org/jira/browse/SPARK-32268))
+    * Support Left Semi join in row level runtime filters 
([SPARK-38565](https://issues.apache.org/jira/browse/SPARK-38565))
+    * Support predicate pushdown and column pruning for de-duped CTEs 
([SPARK-37670](https://issues.apache.org/jira/browse/SPARK-37670))
+* Vectorization
+    * Implement a ConstantColumnVector and improve performance of the hidden 
file metadata ([SPARK-37896](https://issues.apache.org/jira/browse/SPARK-37896))
+    * Enable vectorized read for VectorizedPlainValuesReader.readBooleans 
([SPARK-35867](https://issues.apache.org/jira/browse/SPARK-35867))
+* Combine/remove/replace nodes
+    * Combine unions if there is a project between them 
([SPARK-37915](https://issues.apache.org/jira/browse/SPARK-37915))
+    * Combine to one cast if we can safely up-cast two casts 
([SPARK-37922](https://issues.apache.org/jira/browse/SPARK-37922))
+    * Remove the Sort if it is the child of RepartitionByExpression 
([SPARK-36703](https://issues.apache.org/jira/browse/SPARK-36703))
+    * Removes outer join if it only has DISTINCT on streamed side with alias 
([SPARK-37292](https://issues.apache.org/jira/browse/SPARK-37292))
+    * Replace hash with sort aggregate if child is already sorted 
([SPARK-37455](https://issues.apache.org/jira/browse/SPARK-37455))
+    * Replace object hash with sort aggregate if child is already sorted 
([SPARK-37557](https://issues.apache.org/jira/browse/SPARK-37557))
+    * Only collapse projects if we don't duplicate expensive expressions 
([SPARK-36718](https://issues.apache.org/jira/browse/SPARK-36718))
+    * Remove redundant aliases after RewritePredicateSubquery 
([SPARK-36280](https://issues.apache.org/jira/browse/SPARK-36280))
+    * Merge non-correlated scalar subqueries 
([SPARK-34079](https://issues.apache.org/jira/browse/SPARK-34079))
+* Partitioning
+    * Do not add dynamic partition pruning if there exists static partition 
pruning ([SPARK-38148](https://issues.apache.org/jira/browse/SPARK-38148))
+    * Improve RebalancePartitions in rules of Optimizer 
([SPARK-37904](https://issues.apache.org/jira/browse/SPARK-37904))
+    * Add small partition factor for rebalance partitions 
([SPARK-37357](https://issues.apache.org/jira/browse/SPARK-37357))
+* Join
+    * Fine tune logic to demote Broadcast hash join in DynamicJoinSelection 
([SPARK-37753](https://issues.apache.org/jira/browse/SPARK-37753))
+    * Ignore duplicated join keys when building relation for SEMI/ANTI 
shuffled hash join 
([SPARK-36794](https://issues.apache.org/jira/browse/SPARK-36794))
+    * Support optimize skewed join even if introduce extra shuffle 
([SPARK-33832](https://issues.apache.org/jira/browse/SPARK-33832))
+* AQE
+    * Support eliminate limits in AQE Optimizer 
([SPARK-36424](https://issues.apache.org/jira/browse/SPARK-36424))
+    * Optimize one row plan in normal and AQE Optimizer 
([SPARK-38162](https://issues.apache.org/jira/browse/SPARK-38162))
+* Aggregate.groupOnly support foldable expressions 
([SPARK-38489](https://issues.apache.org/jira/browse/SPARK-38489))
+* ByteArrayMethods arrayEquals should fast skip the check of aligning with 
unaligned platform 
([SPARK-37796](https://issues.apache.org/jira/browse/SPARK-37796))
+* Add tree pattern pruning to CTESubstitution rule 
([SPARK-37379](https://issues.apache.org/jira/browse/SPARK-37379))
+* Add more Not operator simplifications 
([SPARK-36665](https://issues.apache.org/jira/browse/SPARK-36665))
+* Support BooleanType in UnwrapCastInBinaryComparison 
([SPARK-36607](https://issues.apache.org/jira/browse/SPARK-36607))
+* Coalesce drop all expressions after the first non nullable expression 
([SPARK-36359](https://issues.apache.org/jira/browse/SPARK-36359))
+* Add a logical plan visitor to propagate the distinct attributes 
([SPARK-36194](https://issues.apache.org/jira/browse/SPARK-36194))
+
+
+#### Built-in Connector Enhancements
+
+
+
+* General
+    * Lenient serialization of datetime from datasource 
([SPARK-38437](https://issues.apache.org/jira/browse/SPARK-38437))
+    * Treat table location as absolute when the first letter of its path is 
slash in create/alter table 
([SPARK-38236](https://issues.apache.org/jira/browse/SPARK-38236))
+    * Remove leading zeros from empty static number type partition 
([SPARK-35561](https://issues.apache.org/jira/browse/SPARK-35561))
+    * Support `ignoreCorruptFiles` and `ignoreMissingFiles` in Data Source 
options ([SPARK-38767](https://issues.apache.org/jira/browse/SPARK-38767))
+* Parquet
+    * Enable matching schema column names by field ids 
([SPARK-38094](https://issues.apache.org/jira/browse/SPARK-38094))
+    * Remove check field name when reading/writing data in parquet 
([SPARK-27442](https://issues.apache.org/jira/browse/SPARK-27442))
+    * Support vectorized read boolean values use RLE encoding with Parquet 
DataPage V2 ([SPARK-37864](https://issues.apache.org/jira/browse/SPARK-37864))
+    * Support Parquet V2 data page encoding (DELTA_BINARY_PACKED) for the 
vectorized path 
([SPARK-36879](https://issues.apache.org/jira/browse/SPARK-36879))
+    * Rebase timestamps in the session time zone saved in Parquet/Avro 
metadata ([SPARK-37705](https://issues.apache.org/jira/browse/SPARK-37705))
+    * Push down group by partition column for aggregate 
([SPARK-36646](https://issues.apache.org/jira/browse/SPARK-36646))
+    * Aggregate (Min/Max/Count) push down for Parquet 
([SPARK-36645](https://issues.apache.org/jira/browse/SPARK-36645))
+    * Parquet: enable matching schema columns by field id 
([SPARK-38094](https://issues.apache.org/jira/browse/SPARK-38094))

Review Comment:
   Removed the duplicate



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


