This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new ad0dc2fb43 chore: Improve change log generator (#10841)
ad0dc2fb43 is described below
commit ad0dc2fb43fa6a7ea3116075511a5bc9c39851b6
Author: Andy Grove <[email protected]>
AuthorDate: Sun Jun 9 09:59:19 2024 -0600
chore: Improve change log generator (#10841)
* Improve change log generator
* prettier
* prettier
---
dev/changelog/39.0.0.md | 146 +++++++++++++++++++++++---------------
dev/release/README.md | 51 ++++++-------
dev/release/generate-changelog.py | 64 ++++++++++++++---
3 files changed, 164 insertions(+), 97 deletions(-)
diff --git a/dev/changelog/39.0.0.md b/dev/changelog/39.0.0.md
index f94e34592c..ff27b4ba24 100644
--- a/dev/changelog/39.0.0.md
+++ b/dev/changelog/39.0.0.md
@@ -1,23 +1,25 @@
-<!---
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
- http://www.apache.org/licenses/LICENSE-2.0
+ http://www.apache.org/licenses/LICENSE-2.0
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
-->
-## [39.0.0](https://github.com/apache/datafusion/tree/39.0.0) (2024-06-07)
+# Apache DataFusion 39.0.0 Changelog
+
+This release consists of 234 commits from 59 contributors. See credits at the
end of this changelog for more information.
**Breaking changes:**
@@ -72,16 +74,12 @@
- docs: add documents to substrait type variation consts
[#10719](https://github.com/apache/datafusion/pull/10719) (waynexia)
- Minor: (Doc) Enable rt-multi-thread feature for sample code
[#10770](https://github.com/apache/datafusion/pull/10770) (hsiang-c)
-**Merged pull requests:**
+**Other:**
-- Prepare 38.0.0 release candidate 1
[#10407](https://github.com/apache/datafusion/pull/10407) (andygrove)
- Minor: Add more docs and examples for `Expr::unalias`
[#10406](https://github.com/apache/datafusion/pull/10406) (alamb)
- minor: Remove [RUST][datafusion] from release vote email subject line
[#10411](https://github.com/apache/datafusion/pull/10411) (andygrove)
-- Remove ScalarFunctionDefinition
[#10325](https://github.com/apache/datafusion/pull/10325) (lewiszlw)
-- chore(docs): update subquery documentation with more information
[#10361](https://github.com/apache/datafusion/pull/10361) (sanderson)
- fix dml logical plan output schema
[#10394](https://github.com/apache/datafusion/pull/10394) (leoyvens)
- [MINOR]: Move transpose code to under common
[#10409](https://github.com/apache/datafusion/pull/10409) (mustafasrepo)
-- minor: Remove docs archive
[#10416](https://github.com/apache/datafusion/pull/10416) (andygrove)
- Fix incorrect Schema over aggregate function, Remove unnecessary
`exprlist_to_fields_aggregate`
[#10408](https://github.com/apache/datafusion/pull/10408) (jonahgao)
- Enable user defined display_name for ScalarUDF
[#10417](https://github.com/apache/datafusion/pull/10417) (yyy1000)
- Fix and improve `CommonSubexprEliminate` rule
[#10396](https://github.com/apache/datafusion/pull/10396) (peter-toth)
@@ -94,16 +92,10 @@
- Improve flight sql examples
[#10432](https://github.com/apache/datafusion/pull/10432) (lewiszlw)
- Move Covariance (Population) covar_pop to be a User Defined Aggregate
Function [#10418](https://github.com/apache/datafusion/pull/10418) (yyy1000)
- Stop copying LogicalPlan and Exprs in `OptimizeProjections` (2% faster
planning) [#10405](https://github.com/apache/datafusion/pull/10405) (alamb)
-- Minor: format comments in `PushDownFilter` rule
[#10437](https://github.com/apache/datafusion/pull/10437) (alamb)
- chore: Improve release process for next time
[#10447](https://github.com/apache/datafusion/pull/10447) (andygrove)
-- Minor: Add usecase to comments in `LogicalPlan::recompute_schema`
[#10443](https://github.com/apache/datafusion/pull/10443) (alamb)
-- doc: fix old master branch references to main
[#10458](https://github.com/apache/datafusion/pull/10458) (Jefffrey)
- Move bit_and_or_xor unit tests to slt
[#10457](https://github.com/apache/datafusion/pull/10457) (NoeB)
-- Introduce user-defined signature
[#10439](https://github.com/apache/datafusion/pull/10439) (jayzhan211)
-- Remove `AggregateFunctionDefinition::Name`
[#10441](https://github.com/apache/datafusion/pull/10441) (lewiszlw)
- Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning)
[#10430](https://github.com/apache/datafusion/pull/10430) (alamb)
- refactor: Reduce string allocations in Expr::display_name (use write instead
of format!) [#10454](https://github.com/apache/datafusion/pull/10454)
(erratic-pattern)
-- Make `CREATE EXTERNAL TABLE` format options consistent, remove special
syntax for `HEADER ROW`, `DELIMITER` and `COMPRESSION`
[#10404](https://github.com/apache/datafusion/pull/10404) (berkaysynnada)
- Add `simplify` method to aggregate function
[#10354](https://github.com/apache/datafusion/pull/10354) (milenkovicm)
- Add cast array test to sqllogictest
[#10474](https://github.com/apache/datafusion/pull/10474) (viirya)
- Add `Expr::try_as_col`, deprecate `Expr::try_into_col` (speed up optimizer)
[#10448](https://github.com/apache/datafusion/pull/10448) (alamb)
@@ -113,21 +105,13 @@
- Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate`
[#10460](https://github.com/apache/datafusion/pull/10460) (ClSlaid)
- Stop copying LogicalPlan and Exprs in `EliminateCrossJoin` (4% faster
planning) [#10431](https://github.com/apache/datafusion/pull/10431) (alamb)
- Improved ergonomy for `CREATE EXTERNAL TABLE OPTIONS`: Don't require
quotations for simple namespaced keys like `foo.bar`
[#10483](https://github.com/apache/datafusion/pull/10483) (ozankabak)
-- feat: allow `array_slice` to take an optional stride parameter
[#10469](https://github.com/apache/datafusion/pull/10469) (jonahgao)
- Replace `GetFieldAccess` with indexing function in `SqlToRel `
[#10375](https://github.com/apache/datafusion/pull/10375) (jayzhan211)
-- fix: make `columnize_expr` resistant to display_name collisions
[#10459](https://github.com/apache/datafusion/pull/10459) (jonahgao)
- Fix values with different data types caused failure
[#10445](https://github.com/apache/datafusion/pull/10445) (b41sh)
-- fix: avoid compressed json files repartitioning
[#10470](https://github.com/apache/datafusion/pull/10470) (korowa)
-- Minor: Improved document string for `LogicalPlanBuilder`
[#10496](https://github.com/apache/datafusion/pull/10496) (AbrarNitk)
- Fix SortMergeJoin with join filter filtering all rows out
[#10495](https://github.com/apache/datafusion/pull/10495) (viirya)
- chore: use fullpath in macro to avoid declaring in other module
[#10503](https://github.com/apache/datafusion/pull/10503) (jayzhan211)
-- Minor: Extend more style of udaf `expr_fn`, Remove order args
for`covar_samp` and `covar_pop`
[#10492](https://github.com/apache/datafusion/pull/10492) (jayzhan211)
- Minor: remove unused source file `udf.rs`
[#10497](https://github.com/apache/datafusion/pull/10497) (jonahgao)
-- feat: optional args for regexp\_\* UDFs
[#10514](https://github.com/apache/datafusion/pull/10514) (Michael-J-Ward)
- Support UDAF to align Builtin aggregate function
[#10493](https://github.com/apache/datafusion/pull/10493) (jayzhan211)
-- Remove `file_type()` from `FileFormat`
[#10499](https://github.com/apache/datafusion/pull/10499) (Jefffrey)
- Minor: add a test for `current_time` (no args)
[#10509](https://github.com/apache/datafusion/pull/10509) (alamb)
-- fix: parsing timestamp with date format
[#10476](https://github.com/apache/datafusion/pull/10476) (shanretoo)
- [MINOR]: Move pipeline checker rule to the end
[#10502](https://github.com/apache/datafusion/pull/10502) (mustafasrepo)
- Minor: Extract parent/child limit calculation into a function, improve docs
[#10501](https://github.com/apache/datafusion/pull/10501) (alamb)
- Fix window expr deserialization
[#10506](https://github.com/apache/datafusion/pull/10506) (lewiszlw)
@@ -135,7 +119,6 @@
- Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning)
[#10356](https://github.com/apache/datafusion/pull/10356) (alamb)
- Implement unparse `IS_NULL` to String and enhance the tests
[#10529](https://github.com/apache/datafusion/pull/10529) (goldmedal)
- Fix panic in array_agg(distinct) query
[#10526](https://github.com/apache/datafusion/pull/10526) (jayzhan211)
-- UDAF: Extend more args to `state_fields` and `groups_accumulator_supported`
and introduce `ReversedUDAF`
[#10525](https://github.com/apache/datafusion/pull/10525) (jayzhan211)
- Move min_max unit tests to slt
[#10539](https://github.com/apache/datafusion/pull/10539) (xinlifoobar)
- Implement unparse `IsNotFalse` to String
[#10538](https://github.com/apache/datafusion/pull/10538) (goldmedal)
- Implement Unparse TryCast Expr --> String Support
[#10542](https://github.com/apache/datafusion/pull/10542) (xinlifoobar)
@@ -145,38 +128,30 @@
- Stop most copying LogicalPlan and Exprs in `ScalarSubqueryToJoin`
[#10489](https://github.com/apache/datafusion/pull/10489) (alamb)
- Example for simple Expr --> SQL conversion
[#10528](https://github.com/apache/datafusion/pull/10528) (edmondop)
- fix `null_count` on `compute_record_batch_statistics` to report null counts
across partitions [#10468](https://github.com/apache/datafusion/pull/10468)
(samuelcolvin)
-- fix: `array_slice` panics
[#10547](https://github.com/apache/datafusion/pull/10547) (jonahgao)
- Minor: Add `PullUpCorrelatedExpr::new` and improve documentation
[#10500](https://github.com/apache/datafusion/pull/10500) (alamb)
- Stop copying LogicalPlan and Exprs in `PushDownLimit`
[#10508](https://github.com/apache/datafusion/pull/10508) (alamb)
- Break up contributing guide into smaller pages
[#10533](https://github.com/apache/datafusion/pull/10533) (alamb)
- PhysicalExpr Orderings with Range Information
[#10504](https://github.com/apache/datafusion/pull/10504) (berkaysynnada)
- Implement unparse `ScalarVariable` to String
[#10541](https://github.com/apache/datafusion/pull/10541) (reswqa)
-- feat: Expose Parquet Schema Adapter
[#10515](https://github.com/apache/datafusion/pull/10515) (HawaiianSpork)
- Handle dictionary values in ScalarValue serde
[#10563](https://github.com/apache/datafusion/pull/10563) (thinkharderdev)
- Improve signature of `get_field` function
[#10569](https://github.com/apache/datafusion/pull/10569) (lewiszlw)
- Implement Unparse `GroupingSet` Expr --> String Support sql
[#10555](https://github.com/apache/datafusion/pull/10555) (xinlifoobar)
- Minor: Move proxy to datafusion common
[#10561](https://github.com/apache/datafusion/pull/10561) (jayzhan211)
- Update prost-build requirement from =0.12.4 to =0.12.6
[#10578](https://github.com/apache/datafusion/pull/10578) (dependabot[bot])
- Add examples of how to convert logical plan to/from sql strings
[#10558](https://github.com/apache/datafusion/pull/10558) (xinlifoobar)
-- feat: API for collecting statistics/index for metadata of a parquet file +
tests [#10537](https://github.com/apache/datafusion/pull/10537) (NGA-TRAN)
- Fix: Sort Merge Join LeftSemi issues when JoinFilter is set
[#10304](https://github.com/apache/datafusion/pull/10304) (comphead)
-- Remove `Expr::GetIndexedField`, replace `Expr::{field,index,range}` with
`FieldAccessor`, `IndexAccessor`, and `SliceAccessor`
[#10568](https://github.com/apache/datafusion/pull/10568) (jayzhan211)
- Minor: Fix `ArrayFunctionRewriter` name reporting
[#10581](https://github.com/apache/datafusion/pull/10581) (alamb)
- Improve `UserDefinedLogicalNode::from_template` API to return `Result`
[#10575](https://github.com/apache/datafusion/pull/10575) (lewiszlw)
- Migrate testing optimizer rules to use `rewrite` API
[#10576](https://github.com/apache/datafusion/pull/10576) (lewiszlw)
-- Improve ContextProvider
[#10577](https://github.com/apache/datafusion/pull/10577) (lewiszlw)
- test: add more tests for statistics reading
[#10592](https://github.com/apache/datafusion/pull/10592) (NGA-TRAN)
- refactor: reduce allocations in push down filter
[#10567](https://github.com/apache/datafusion/pull/10567) (erratic-pattern)
- Fix compilation of datafusion-cli on 32bit targets
[#10594](https://github.com/apache/datafusion/pull/10594) (nathaniel-daniel)
-- Add to_date function to scalar functions doc
[#10601](https://github.com/apache/datafusion/pull/10601) (Omega359)
- Rename monotonicity as output_ordering in ScalarUDF's
[#10596](https://github.com/apache/datafusion/pull/10596) (berkaysynnada)
- Implement Unparser for `UNION ALL`
[#10603](https://github.com/apache/datafusion/pull/10603) (phillipleblanc)
- Improve `UserDefinedLogicalNodeCore::from_template` API to return Result
[#10597](https://github.com/apache/datafusion/pull/10597) (lewiszlw)
- Minor: Move group accumulator for aggregate function to
physical-expr-common, and add ahash physical-expr-common
[#10574](https://github.com/apache/datafusion/pull/10574) (jayzhan211)
- Minor: Consolidate some integration tests into `core_integration`
[#10588](https://github.com/apache/datafusion/pull/10588) (alamb)
- Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy`
[#10527](https://github.com/apache/datafusion/pull/10527) (appletreeisyellow)
-- feat: Add eliminate group by constant optimizer rule
[#10591](https://github.com/apache/datafusion/pull/10591) (korowa)
-- Docs: Update PR workflow documentation
[#10532](https://github.com/apache/datafusion/pull/10532) (alamb)
- [MINOR]: Update get range implementation for lead lag window functions
[#10614](https://github.com/apache/datafusion/pull/10614) (mustafasrepo)
- Minor: Improve documentation in sql_to_plan example
[#10582](https://github.com/apache/datafusion/pull/10582) (alamb)
- Docs: add examples for `RuntimeEnv::register_object_store`, improve error
messages [#10617](https://github.com/apache/datafusion/pull/10617) (aditanase)
@@ -184,7 +159,6 @@
- Add to_unixtime function to scalar functions doc
[#10620](https://github.com/apache/datafusion/pull/10620) (Omega359)
- Test for reading read statistics from parquet files without statistics and
boolean & struct data type
[#10608](https://github.com/apache/datafusion/pull/10608) (NGA-TRAN)
- adding benchmark for extracting arrow statistics from parquet
[#10610](https://github.com/apache/datafusion/pull/10610) (Lordworms)
-- feat: extend `unnest` to support Struct datatype
[#10429](https://github.com/apache/datafusion/pull/10429) (duongcongtoai)
- Implement a dialect-specific rule for unparsing an identifier with or
without quotes [#10573](https://github.com/apache/datafusion/pull/10573)
(goldmedal)
- add catalog as part of the table path in plan_to_sql
[#10612](https://github.com/apache/datafusion/pull/10612) (y-f-u)
- Refactor parquet row group pruning into a struct (use new statistics API,
part 1) [#10607](https://github.com/apache/datafusion/pull/10607) (alamb)
@@ -205,19 +179,14 @@
- Fix typo in Cargo.toml (unused manifest key: dependencies.regex.worksapce)
[#10662](https://github.com/apache/datafusion/pull/10662) (alamb)
- Add `FileScanConfig::new()` API
[#10623](https://github.com/apache/datafusion/pull/10623) (alamb)
- Minor: Remove `GetFieldAccessSchema`
[#10665](https://github.com/apache/datafusion/pull/10665) (jayzhan211)
-- Minor: Use slice in `ConcreteTreeNode`
[#10666](https://github.com/apache/datafusion/pull/10666) (peter-toth)
- Move Median to `functions-aggregate` and Introduce Numeric signature
[#10644](https://github.com/apache/datafusion/pull/10644) (jayzhan211)
- Fix `Coalesce` casting logic to follows what Postgres and DuckDB do.
Introduce signature that do non-comparison coercion
[#10268](https://github.com/apache/datafusion/pull/10268) (jayzhan211)
-- fix: pass `quote` parameter to CSV writer
[#10671](https://github.com/apache/datafusion/pull/10671) (DDtKey)
- Fix compilation "comparison_binary_numeric_coercion not found"
[#10677](https://github.com/apache/datafusion/pull/10677) (alamb)
- refactor: simplify converting List DataTypes to `ScalarValue`
[#10675](https://github.com/apache/datafusion/pull/10675) (jonahgao)
-- feat: add substrait support for Interval types and literals
[#10646](https://github.com/apache/datafusion/pull/10646) (waynexia)
- Minor: Improve ObjectStoreUrl docs + examples
[#10619](https://github.com/apache/datafusion/pull/10619) (alamb)
-- fix: CI compilation failed on substrait
[#10683](https://github.com/apache/datafusion/pull/10683) (jonahgao)
- Add tests for reading numeric limits in parquet statistics
[#10642](https://github.com/apache/datafusion/pull/10642) (alamb)
- Update nix requirement from 0.28.0 to 0.29.0
[#10684](https://github.com/apache/datafusion/pull/10684) (dependabot[bot])
- refactor: Move SchemaAdapter from parquet module to data source
[#10680](https://github.com/apache/datafusion/pull/10680) (HawaiianSpork)
-- Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()`
and `PhysicalExpr::children()` return references
[#10543](https://github.com/apache/datafusion/pull/10543) (peter-toth)
- Convert first, last aggregate function to UDAF
[#10648](https://github.com/apache/datafusion/pull/10648) (mustafasrepo)
- Minor: CastExpr Ordering Handle
[#10650](https://github.com/apache/datafusion/pull/10650) (berkaysynnada)
- Factor out common datafusion types into another proto file
[#10649](https://github.com/apache/datafusion/pull/10649) (mustafasrepo)
@@ -238,7 +207,6 @@
- Fix incorrect statistics read for unsigned integers columns in parquet
[#10704](https://github.com/apache/datafusion/pull/10704) (xinlifoobar)
- Separate `Partitioning` protobuf serialization code
[#10708](https://github.com/apache/datafusion/pull/10708) (lewiszlw)
- Support consuming Substrait with compound signature function names
[#10653](https://github.com/apache/datafusion/pull/10653) (Blizzara)
-- Minor: Add examples of using TreeNode with `Expr`
[#10686](https://github.com/apache/datafusion/pull/10686) (alamb)
- Minor: Add examples of using TreeNode with `LogicalPlan`
[#10687](https://github.com/apache/datafusion/pull/10687) (alamb)
- Add `ParquetExec::builder()`, deprecate `ParquetExec::new`
[#10636](https://github.com/apache/datafusion/pull/10636) (alamb)
- feature: Add a WindowUDFImpl::simplify() API
[#9906](https://github.com/apache/datafusion/pull/9906) (guojidan)
@@ -249,7 +217,6 @@
- CI: Fix complaints from newer Clippy versions
[#10725](https://github.com/apache/datafusion/pull/10725) (comphead)
- Remove Eager Trait for Joins
[#10721](https://github.com/apache/datafusion/pull/10721) (berkaysynnada)
- Minor: fix signature `fn octect_length()`
[#10726](https://github.com/apache/datafusion/pull/10726) (marvinlanhenke)
-- docs: add documents to substrait type variation consts
[#10719](https://github.com/apache/datafusion/pull/10719) (waynexia)
- Update rstest requirement from 0.19.0 to 0.20.0
[#10734](https://github.com/apache/datafusion/pull/10734) (dependabot[bot])
- Update rstest_reuse requirement from 0.6.0 to 0.7.0
[#10733](https://github.com/apache/datafusion/pull/10733) (dependabot[bot])
- Add example for building an external secondary index for parquet files
[#10549](https://github.com/apache/datafusion/pull/10549) (alamb)
@@ -262,16 +229,12 @@
- Minor: Split physical_plan/parquet/mod.rs into smaller modules
[#10727](https://github.com/apache/datafusion/pull/10727) (alamb)
- minor: consolidate unparser integration tests
[#10736](https://github.com/apache/datafusion/pull/10736) (devinjdangelo)
- Minor: Move aggregate variance to slt
[#10750](https://github.com/apache/datafusion/pull/10750) (marvinlanhenke)
-- fix: fix string repeat for negative numbers
[#10760](https://github.com/apache/datafusion/pull/10760) (tshauck)
-- Introduce Sum UDAF [#10651](https://github.com/apache/datafusion/pull/10651)
(jayzhan211)
- Extract parquet statistics from timestamps with timezones
[#10766](https://github.com/apache/datafusion/pull/10766) (xinlifoobar)
- Minor: Add tests for extracting dictionary parquet statistics
[#10729](https://github.com/apache/datafusion/pull/10729) (alamb)
- Update rstest requirement from 0.20.0 to 0.21.0
[#10774](https://github.com/apache/datafusion/pull/10774) (dependabot[bot])
- Minor: Refactor memory size estimation for HashTable
[#10748](https://github.com/apache/datafusion/pull/10748) (marvinlanhenke)
- Reduce code repetition in `datafusion/functions` mod files
[#10700](https://github.com/apache/datafusion/pull/10700) (MohamedAbdeen21)
-- Minor: (Doc) Enable rt-multi-thread feature for sample code
[#10770](https://github.com/apache/datafusion/pull/10770) (hsiang-c)
- Support negatives in split part
[#10780](https://github.com/apache/datafusion/pull/10780) (tshauck)
-- feat: support unparsing LogicalPlan::Window nodes
[#10767](https://github.com/apache/datafusion/pull/10767) (devinjdangelo)
- Extract parquet statistics from `LargeUtf8` columns and Add tests for `UTF8`
And `LargeUTF8` [#10762](https://github.com/apache/datafusion/pull/10762)
(Weijun-H)
- Cleanup GetIndexedField
[#10769](https://github.com/apache/datafusion/pull/10769) (lewiszlw)
- Extract parquet statistics from f16 columns, add `ScalarValue::Float16`
[#10763](https://github.com/apache/datafusion/pull/10763) (Lordworms)
@@ -284,10 +247,8 @@
- minor: Refactor some unparser methods to improve readability
[#10788](https://github.com/apache/datafusion/pull/10788) (devinjdangelo)
- Convert variance sample to udaf
[#10713](https://github.com/apache/datafusion/pull/10713) (yyin-dev)
- Improve docs and fix a typo
[#10798](https://github.com/apache/datafusion/pull/10798) (lewiszlw)
-- fix: `array_slice` and `array_element` panicked on empty args
[#10804](https://github.com/apache/datafusion/pull/10804) (jonahgao)
- Avoid the usage of intermediate ScalarValue to improve performance of
extracting statistics from parquet files
[#10711](https://github.com/apache/datafusion/pull/10711) (xinlifoobar)
- SMJ: Add more tests and improve comments
[#10784](https://github.com/apache/datafusion/pull/10784) (comphead)
-- feat: Update Parquet row filtering to handle type coercion
[#10716](https://github.com/apache/datafusion/pull/10716) (jeffreyssmith2nd)
- Handle EmptyRelation during SQL unparsing
[#10803](https://github.com/apache/datafusion/pull/10803) (goldmedal)
- Document Committer and PMC process
[#10778](https://github.com/apache/datafusion/pull/10778) (alamb)
- Int64 as default type for make_array function empty or null case
[#10790](https://github.com/apache/datafusion/pull/10790) (jayzhan211)
@@ -301,3 +262,72 @@
- Refactor and simplify the SQL unparser
[#10811](https://github.com/apache/datafusion/pull/10811) (goldmedal)
- Minor: Remove code duplication in `memory_limit` derivation for
datafusion-cli [#10814](https://github.com/apache/datafusion/pull/10814)
(comphead)
- build(deps): update Arrow/Parquet to `52.0`, object-store to `0.10`
[#10765](https://github.com/apache/datafusion/pull/10765) (waynexia)
+- chore: Prepare 39.0.0-rc1
[#10828](https://github.com/apache/datafusion/pull/10828) (andygrove)
+
+## Credits
+
+Thank you to everyone who contributed to this release. Here is a breakdown of
commits (PRs merged) per contributor.
+
+```
+ 44 Andrew Lamb
+ 18 Jay Zhan
+ 14 张林伟
+ 11 Andy Grove
+ 11 Xin Li
+ 10 Jonah Gao
+ 8 Jax Liu
+ 7 Mustafa Akur
+ 7 Oleks V
+ 7 dependabot[bot]
+ 5 Arttu
+ 5 Berkay Şahin
+ 5 Marvin Lanhenke
+ 4 Lordworms
+ 4 Ruihang Xia
+ 3 Bruce Ritchie
+ 3 Devin D'Angelo
+ 3 Duong Cong Toai
+ 3 Eduard Karacharov
+ 3 Junhao Liu
+ 3 Liang-Chi Hsieh
+ 3 Mohamed Abdeen
+ 3 Nga Tran
+ 3 Peter Toth
+ 3 Phillip LeBlanc
+ 2 Abrar Khan
+ 2 Adam Curtis
+ 2 Chunchun Ye
+ 2 Jeffrey Vo
+ 2 Michael Maletich
+ 2 QP Hou
+ 2 Trent Hauck
+ 2 Weijie Guo
+ 2 junxiangMu
+ 2 yfu
+ 1 Adrian Tanase
+ 1 Alex Huang
+ 1 Andrey Koshchiy
+ 1 Artem Medvedev
+ 1 ClSlaid
+ 1 Dan Harris
+ 1 Edmondo Porcu
+ 1 Jeffrey Smith II
+ 1 Kun Liu
+ 1 Leonardo Yvens
+ 1 Marko Milenković
+ 1 Matthew Turner
+ 1 Mehmet Ozan Kabak
+ 1 Michael J Ward
+ 1 NoeB
+ 1 Samuel Colvin
+ 1 Scott Anderson
+ 1 VimT
+ 1 Yue Yin
+ 1 baishen
+ 1 hsiang-c
+ 1 nathaniel-daniel
+ 1 shanretoo
+ 1 tison
+```
+
+Thank you also to everyone who contributed in other ways such as filing
issues, reviewing PRs, and providing feedback on this release.
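Each entry in the regenerated 39.0.0 changelog above has the same layout. There is a bold `**<Section>:**` heading, followed by one `- <PR title> [#<number>](<PR URL>) (<GitHub login>)` line per pull request. The body of `print_pulls` is not shown in this diff, so the sketch below only reproduces that layout as it appears in the output. The `PullSummary` dataclass and `format_section` function are hypothetical stand-ins, not the script's API. They take plain values instead of PyGithub objects so the formatting can be tried offline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PullSummary:
    """Hypothetical stand-in for the PyGithub pull request / commit pair."""
    number: int
    title: str
    author_login: str


def format_section(repo_name: str, heading: str, pulls: List[PullSummary]) -> str:
    """Render one changelog section in the layout seen in 39.0.0.md.

    An empty category renders as an empty string, mirroring the
    `if len(pulls) > 0` guard at the top of print_pulls.
    """
    if not pulls:
        return ""
    lines = [f"**{heading}:**", ""]
    for pull in pulls:
        url = f"https://github.com/{repo_name}/pull/{pull.number}"
        lines.append(f"- {pull.title} [#{pull.number}]({url}) ({pull.author_login})")
    lines.append("")  # blank line after the section
    return "\n".join(lines)


section = format_section(
    "apache/datafusion",
    "Other",
    [PullSummary(10828, "chore: Prepare 39.0.0-rc1", "andygrove")],
)
print(section)
```

Returning a string instead of printing keeps the sketch easy to test. The real script prints straight to stdout and sends its progress messages to stderr, which is why its output can be redirected directly into `dev/changelog/<version>.md`.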
diff --git a/dev/release/README.md b/dev/release/README.md
index 749af8696b..c0ba87ad39 100644
--- a/dev/release/README.md
+++ b/dev/release/README.md
@@ -57,7 +57,7 @@ See instructions at https://infra.apache.org/release-signing.html#generate for g
Committers can add signing keys in Subversion client with their ASF account.
e.g.:
-```bash
+```shell
$ svn co https://dist.apache.org/repos/dist/dev/datafusion
$ cd datafusion
$ editor KEYS
@@ -66,7 +66,7 @@ $ svn ci KEYS
Follow the instructions in the header of the KEYS file to append your key.
Here is an example:
-```bash
+```shell
(gpg --list-sigs "John Doe" && gpg --armor --export "John Doe") >> KEYS
svn commit KEYS -m "Add key for John Doe"
```
@@ -89,35 +89,26 @@ to generate one if you do not already have one.
The changelog is generated using a Python script. There is a dependency on
`PyGitHub`, which can be installed using pip:
-```bash
+```shell
pip3 install PyGitHub
```
-Run the following command to generate the changelog content.
+To generate the changelog, set the `GITHUB_TOKEN` environment variable to a valid token and then run the script,
+providing two commit IDs or tags followed by the version number of the release being created. The following
+example generates a changelog of all changes between the `24.0.0` tag and the current `HEAD` revision.
-```bash
-$ GITHUB_TOKEN=<TOKEN> ./dev/release/generate-changelog.py 24.0.0 HEAD > dev/changelog/25.0.0.md
+```shell
+export GITHUB_TOKEN=<your-token-here>
+./dev/release/generate-changelog.py 24.0.0 HEAD 25.0.0 > dev/changelog/25.0.0.md
```
This script creates a changelog from GitHub PRs based on the labels associated
with them as well as looking for
-titles starting with `feat:`, `fix:`, or `docs:` . The script will produce
output similar to:
-
-```
-Fetching list of commits between 24.0.0 and HEAD
-Fetching pull requests
-Categorizing pull requests
-Generating changelog content
-```
-
-This process is not fully automated, so there are some additional manual steps:
+titles starting with `feat:`, `fix:`, or `docs:`.
-- Add the ASF header to the generated file
-- Add the following content (copy from the previous version's changelog and
update as appropriate:
-
-```
-## [24.0.0](https://github.com/apache/datafusion/tree/24.0.0) (2023-05-06)
+Once the changelog is generated, run `prettier` to format the document:
-[Full Changelog](https://github.com/apache/datafusion/compare/23.0.0...24.0.0)
+```shell
+prettier -w dev/changelog/25.0.0.md
```
## Prepare release commits and PR
@@ -265,7 +256,7 @@ published in the correct order as shown in this diagram.
_To update this diagram, manually edit the dependencies in
[crate-deps.dot](crate-deps.dot) and then run:_
-```bash
+```shell
dot -Tsvg dev/release/crate-deps.dot > dev/release/crate-deps.svg
```
@@ -310,7 +301,7 @@ Please visit https://brew.sh/ to obtain Homebrew. In addition to that please che
Before running the script make sure that you can run the following command in
your bash to make sure
that `brew` has been installed and configured properly:
-```bash
+```shell
brew --version
```
@@ -325,7 +316,7 @@ To create a Github Personal Access Token, please visit https://docs.github.com/e
After all of the above is complete execute the following command:
-```bash
+```shell
dev/release/publish_homebrew.sh <version> <github-user> <github-token> <homebrew-default-branch-name>
```
@@ -368,13 +359,13 @@ Release candidates should be deleted once the release is published.
Get a list of DataFusion release candidates:
-```bash
+```shell
svn ls https://dist.apache.org/repos/dist/dev/datafusion
```
Delete a release candidate:
-```bash
+```shell
svn delete -m "delete old DataFusion RC" https://dist.apache.org/repos/dist/dev/datafusion/apache-datafusion-38.0.0-rc1/
```
@@ -384,13 +375,13 @@ Only the latest release should be available. Delete old releases after publishin
Get a list of DataFusion releases:
-```bash
+```shell
svn ls https://dist.apache.org/repos/dist/release/datafusion
```
Delete a release:
-```bash
+```shell
svn delete -m "delete old DataFusion release" https://dist.apache.org/repos/dist/release/datafusion/datafusion-37.0.0
```
@@ -401,7 +392,7 @@ with a copy of the previous release announcement.
Run the following commands to get the number of commits and number of unique
contributors for inclusion in the blog post.
-```bash
+```shell
git log --pretty=oneline 37.0.0..38.0.0 datafusion datafusion-cli datafusion-examples | wc -l
git shortlog -sn 37.0.0..38.0.0 datafusion datafusion-cli datafusion-examples | wc -l
```
diff --git a/dev/release/generate-changelog.py b/dev/release/generate-changelog.py
index 424baece60..23b5942148 100755
--- a/dev/release/generate-changelog.py
+++ b/dev/release/generate-changelog.py
@@ -20,7 +20,7 @@ import sys
from github import Github
import os
import re
-
+import subprocess
def print_pulls(repo_name, title, pulls):
if len(pulls) > 0:
@@ -32,7 +32,7 @@ def print_pulls(repo_name, title, pulls):
print()
-def generate_changelog(repo, repo_name, tag1, tag2):
+def generate_changelog(repo, repo_name, tag1, tag2, version):
# get a list of commits between two tags
print(f"Fetching list of commits between {tag1} and {tag2}", file=sys.stderr)
@@ -52,12 +52,12 @@ def generate_changelog(repo, repo_name, tag1, tag2):
all_pulls.append((pull, commit))
# we split the pulls into categories
- #TODO: make categories configurable
breaking = []
bugs = []
docs = []
enhancements = []
performance = []
+ other = []
# categorize the pull requests based on GitHub labels
print("Categorizing pull requests", file=sys.stderr)
@@ -75,7 +75,6 @@ def generate_changelog(repo, repo_name, tag1, tag2):
cc_breaking = parts_tuple[2] == '!'
labels = [label.name for label in pull.labels]
- #print(pull.number, labels, parts, file=sys.stderr)
if 'api change' in labels or cc_breaking:
breaking.append((pull, commit))
elif 'bug' in labels or cc_type == 'fix':
@@ -84,18 +83,64 @@ def generate_changelog(repo, repo_name, tag1, tag2):
performance.append((pull, commit))
elif 'enhancement' in labels or cc_type == 'feat':
enhancements.append((pull, commit))
- elif 'documentation' in labels or cc_type == 'docs':
+ elif 'documentation' in labels or cc_type == 'docs' or cc_type == 'doc':
docs.append((pull, commit))
+ else:
+ other.append((pull, commit))
# produce the changelog content
print("Generating changelog content", file=sys.stderr)
+
+ # ASF header
+ print("""<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->\n""")
+
+ print(f"# Apache DataFusion {version} Changelog\n")
+
+ # get the number of commits
+ commit_count = subprocess.check_output(f"git log --pretty=oneline {tag1}..{tag2} | wc -l", shell=True, text=True).strip()
+
+ # get number of contributors
+ contributor_count = subprocess.check_output(f"git shortlog -sn {tag1}..{tag2} | wc -l", shell=True, text=True).strip()
+
+ print(f"This release consists of {commit_count} commits from {contributor_count} contributors. "
+ f"See credits at the end of this changelog for more information.\n")
+
print_pulls(repo_name, "Breaking changes", breaking)
print_pulls(repo_name, "Performance related", performance)
print_pulls(repo_name, "Implemented enhancements", enhancements)
print_pulls(repo_name, "Fixed bugs", bugs)
print_pulls(repo_name, "Documentation updates", docs)
- print_pulls(repo_name, "Merged pull requests", all_pulls)
+ print_pulls(repo_name, "Other", other)
+
+ # show code contributions
+ credits = subprocess.check_output(f"git shortlog -sn {tag1}..{tag2}", shell=True, text=True).rstrip()
+
+ print("## Credits\n")
+ print("Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) "
+ "per contributor.\n")
+ print("```")
+ print(credits)
+ print("```\n")
+ print("Thank you also to everyone who contributed in other ways such as filing issues, reviewing "
+ "PRs, and providing feedback on this release.\n")
def cli(args=None):
"""Process command line arguments."""
@@ -103,8 +148,9 @@ def cli(args=None):
args = sys.argv[1:]
parser = argparse.ArgumentParser()
- parser.add_argument("tag1", help="The previous release tag (e.g. 38.0.0)")
- parser.add_argument("tag2", help="The current release tag (e.g. HEAD)")
+ parser.add_argument("tag1", help="The previous commit or tag (e.g. 0.1.0)")
+ parser.add_argument("tag2", help="The current commit or tag (e.g. HEAD)")
+ parser.add_argument("version", help="The version number to include in the changelog")
args = parser.parse_args()
token = os.getenv("GITHUB_TOKEN")
@@ -112,7 +158,7 @@ def cli(args=None):
g = Github(token)
repo = g.get_repo(project)
- generate_changelog(repo, project, args.tag1, args.tag2)
+ generate_changelog(repo, project, args.tag1, args.tag2, args.version)
if __name__ == "__main__":
cli()
\ No newline at end of file
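The categorization rules in `generate_changelog` can be checked without GitHub access by moving them into a pure function. The following is a minimal sketch under stated assumptions. First, the conventional-commit regex is assumed: the hunks above only show that a three-part match is unpacked and that a trailing `!` marks a breaking change. Second, the `'performance'` label and `perf` type are assumed for the branch whose condition is not shown. Third, `categorize` is a hypothetical helper, not part of the script. The precedence follows the visible `if`/`elif` chain, and the new `else` branch sends everything unmatched to `Other`.

```python
import re
from typing import Iterable

# Assumed conventional-commit prefix: type, optional "(scope)", optional "!", then ":".
CC_PATTERN = re.compile(r"^([a-z]+)(\([a-z]+\))?(!)?:")


def categorize(title: str, labels: Iterable[str]) -> str:
    """Return the changelog section a PR would land in.

    Order of checks: breaking, bug, performance, enhancement,
    documentation, and finally the catch-all "Other".
    """
    label_set = set(labels)
    cc_type = ""
    cc_breaking = False
    parts = CC_PATTERN.findall(title)  # list of (type, scope, bang) tuples
    if len(parts) == 1:
        cc_type, _scope, bang = parts[0]
        cc_breaking = bang == "!"

    if "api change" in label_set or cc_breaking:
        return "Breaking changes"
    if "bug" in label_set or cc_type == "fix":
        return "Fixed bugs"
    if "performance" in label_set or cc_type == "perf":  # assumed condition
        return "Performance related"
    if "enhancement" in label_set or cc_type == "feat":
        return "Implemented enhancements"
    if "documentation" in label_set or cc_type in ("docs", "doc"):
        return "Documentation updates"
    return "Other"


print(categorize("doc: fix old master branch references to main", []))
```

Each pull request now lands in exactly one section. The `Other` section therefore replaces the old `Merged pull requests` list, which repeated every PR a second time. Titles such as `chore(docs): ...` or `build(deps): ...` parse as `chore` and `build`, so they end up in `Other` unless a label places them elsewhere.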