AlenkaF commented on code in PR #768:
URL: https://github.com/apache/arrow-site/pull/768#discussion_r2909417106


##########
_posts/2026-03-10-arrow-2025-highlights.md:
##########
@@ -0,0 +1,282 @@
+---
+layout: post
+title: "Community Highlights 2025"
+date: "2026-03-04 00:00:00"
+author: pmc
+categories: [arrow]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+As you may have read in a previous blog post [^1], the Apache Arrow project
+recently turned 10 years old. We are grateful to everyone who helped us
+achieve this milestone, and we wanted to celebrate the community's
+accomplishments by publishing our community highlights from 2025.
+
+We were inspired by the research of Dr. Cat Hicks et al. [^2], who found
+that concrete evidence of progress and accomplishments is instrumental
+to motivation and collaboration in developer teams. We think the same
+should hold for open source.
+
+---
+
+## New contributors
+
+It has been great to see many new contributors joining the project
+in the past year, with over 300 such individuals observed across the main
+Apache Arrow language implementations.
+
+| Repository/Implementation | Number of new contributors |
+|---|---|
+| arrow | 125 |
+| arrow-rust | 131 |
+| arrow-java | 28 |
+| arrow-go | 35 |
+<br>
+
+Worth highlighting is [alinaliBQ](https://github.com/alinaliBQ), who
+has been very active on the C++ Flight SQL ODBC Driver work together
+with [justing-bq](https://github.com/justing-bq).
+
+[AntoinePrv](https://github.com/AntoinePrv) has done a huge amount of
+work on the C++ Parquet implementation and [andishgar](https://github.com/andishgar)
+in the C++ Statistics area.
+
+[rmnskb](https://github.com/rmnskb) got involved with PyArrow at the
+EuroPython sprints and has contributed multiple PRs since then. At the
+same event, [paddyroddy](https://github.com/paddyroddy) made
+his first contribution and went on to help with Python packaging.
+
+<!-- TODO: Add Rust and Go new contribs content -->
+
+#### Notable New Contributors in apache/arrow for 2025 are:
+
+| Author | # of prs | # of line changes (+ and -) |
+|---|---|---|
+| alinaliBQ | 36 | 15754 |
+| andishgar | 19 | 2926 |
+| AntoinePrv | 8 | 79257 |
+| rmnskb | 7 | 550 |
+| justing-bq | 4 | 12607 |
+
+#### Notable New Contributors in apache/arrow-rs for 2025 are:
+
+| Author | # of prs | # of line changes (+ and -) |
+|---|---|---|
+| scovich | 50 | 21006 |
+| jecsand838 | 38 | 26753 |
+| friendlymatthew | 33 | 7203 |
+| rambleraptor | 4 | 333 |
+| sdf-jkl | 4 | 388 |
+
+#### Notable New Contributors in apache/arrow-go for 2025 are:
+
+| Author | # of prs | # of line changes (+ and -) |
+|---|---|---|
+| Mandukhai-Alimaa | 6 | 1392 |
+| hamilton-earthscope | 5 | 2998 |
+
+
+## Release, Packaging and CI
+
+A lot of work has gone into the Continuous Integration and
+Developer Tools area. Ensuring that a project with the reach of Arrow works properly
+requires validation on a huge matrix of operating systems, architectures, libraries,
+and versions. Needless to say, maintenance work is of tremendous importance for the
+health of the project and a positive contributor experience.
+
+The most active contributors in the main repository are the ones contributing
+heavily in those areas while also providing the most review capacity. Shout out
+to [kou](https://github.com/kou) and [raulcd](https://github.com/raulcd) for
+taking such good care of the project and devoting countless hours so that everything
+runs smoothly.
+
+Notable contributions include enhanced release automation and
+reproducible builds for sources, migrating the remaining AppVeyor and Azure jobs
+to GitHub Actions, and improving the developer experience with more pre-commit checks instead
+of custom-made linting tools.
+
+Moving some implementations out of the main repository (apache/arrow on GitHub)
+has made releases and maintenance easier, both for the main repository and for the
+separate language implementations. The apache/arrow repo now holds the format
+specification and the C++ implementation, together with all the bindings to it (Python, R, Ruby
+and C GLib). Other languages now live in their own apache/ repos, namely
+[apache/arrow-java](https://github.com/apache/arrow-java),
+[apache/arrow-js](https://github.com/apache/arrow-js),
+[apache/arrow-rs](https://github.com/apache/arrow-rs),
+[apache/arrow-go](https://github.com/apache/arrow-go),
+[apache/arrow-nanoarrow](https://github.com/apache/arrow-nanoarrow),
+[apache/arrow-dotnet](https://github.com/apache/arrow-dotnet) and
+[apache/arrow-swift](https://github.com/apache/arrow-swift).
+
+#### Notable Contributors in apache/arrow for 2025 are:
+
+| Author | # of prs | # of line changes (+ and -) |
+|---|---|---|
+| kou | 221 | 141015 |
+| AntoinePrv | 8 | 79257 |
+| raulcd | 110 | 46645 |
+| pitrou | 101 | 36585 |
+| jbonofre | 1 | 20061 |
+
+
+#### Notable Components in apache/arrow for 2025 are:
+
+| Component label | # of merged prs | # of line changes (+ and -) |
+|---|---|---|
+| Parquet | 100 | 103828 |
+| C++ | 387 | 82744 |
+| FlightRPC | 43 | 52659 |
+| CI | 237 | 42249 |
+| Ruby | 74 | 20676 |
+
+
+## Migration of infrastructure from Voltron Data
+
+As Voltron Data wound down its operations in 2025, the Arrow project
+had to migrate its benchmarking infrastructure and nightly reports from
+Voltron-managed services to an Arrow-managed AWS account. This work has been
+driven by [rok](https://github.com/rok).
+
+## Closing of Stale issues
+
+[thisisnic](https://github.com/thisisnic) worked on closing stale
+issues in the apache/arrow repository, which helped surface important
+issues that had been overlooked or forgotten.
+
+## Code contributions
+
+### C++ implementation
+
+Community support for maintenance and development of Acero, the C++
+streaming execution engine, is continuing, with several larger contributions in 2025 from
+[pitrou](https://github.com/pitrou) and [zanmato1984](https://github.com/zanmato1984).
+
+Many kernels have been moved from the integrated compute module into
+a separate, optional package to improve modularity and reduce distribution
+size when optional compute functionality is not used. The work was
+done by [raulcd](https://github.com/raulcd).
+
+### Arrow C++ Parquet implementation
+
+There have been multiple contributions to fix and improve fuzzing
+support for Parquet. Fuzzing work is led by [pitrou](https://github.com/pitrou),
+who is also one of the most active members of the community, guiding other
+developers and supporting us with abundant review capacity.
+
+Support for several newer types was also added in the last year,
+namely VARIANT, UUID, GEOMETRY and GEOGRAPHY, contributed
+by [neilechao](https://github.com/neilechao) and
+[paleolimbot](https://github.com/paleolimbot).
+
+Another important addition is Content-Defined Chunking,
+which improves deduplication of Parquet files with mostly identical
+contents by choosing data page boundaries based on the actual contents
+rather than on a number of values [^3]. This work has been done by
+[kszucs](https://github.com/kszucs).
+
+There have been improvements in the Parquet encryption support for
+most of the releases in the last year. These efforts have been
+driven mostly by [EnricoMi](https://github.com/EnricoMi),
+[pitrou](https://github.com/pitrou), [adamreeve](https://github.com/adamreeve)
+and [kapoisu](https://github.com/kapoisu).
+
+### PyArrow
+
+A lot of work has been put into adding type annotations. It all
+started in July at the EuroPython sprints, and the code is now ready to be
+reviewed and merged. Some more review capacity will be needed to get
+this over the finish line. The work has been championed by
+[rok](https://github.com/rok).
+
+### Rust
+
+The Arrow Rust community invested heavily in the Rust Parquet reader,
+about which they published several blog posts [^4], [^5]. The work has been
+championed by [alamb](https://github.com/alamb) and
+[etseidl](https://github.com/etseidl).
+
+#### Notable Components in apache/arrow-rs for 2025 are:
+
+| Component label | # of merged prs | # of line changes (+ and -) |
+|---|---|---|
+| parquet | 333 | 140958 |
+| arrow | 436 | 76590 |
+| parquet-variant | 125 | 41832 |
+| api-change | 59 | 33938 |
+| arrow-avro | 48 | 29487 |

Review Comment:
   I look through the merged PRs and group them by label (if used) and by title prefix/commit-style type. For Go this was a bit harder, as labels are not used for the components, so I got most of the information from the commit titles; see:
   https://github.com/arrow-maintenance/explorations/blob/main/yearly_highlights/github_repo_report_arrow_go.ipynb
   
   The current output would thus be:
   
   component | merged_prs | line_changes | top_pr_1 | top_pr_2 | top_pr_3
   -- | -- | -- | -- | -- | --
   "parquet" | 34 | 27056 | "refactor(parquet/internal/enco… | "feat(parquet/schema): initial … | "feat(parquet): add variant enc…
   "arrow" | 33 | 14235 | "feat(arrow/extensions): Add Va… | "new(arrow/compute): temporal r… | "refactor(arrow/array): replace…
   "(unlabeled)" | 4 | 2662 | "Implement RLE dictionary decod… | "Batch of small optimizations" | "use xnor for boolean equals fu…
   "format" | 1 | 2044 | "format: regenerate internal/fl… | null | null
   "fix" | 8 | 1981 | "fix: move from atomic.(Add\|Loa… | "fix: correctly initialize Sche… | "Fix: Handle null values in Pla…
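
For readers who want a feel for the grouping described above, here is a minimal sketch in Python. It is not the code from the linked notebook; it only assumes that each merged PR is available as a record with a title, a list of labels, and added/deleted line counts. The helper names (`component_of`, `summarize`), the regular expression, and the sample records are all hypothetical and chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample records with made-up numbers; real data would come
# from the GitHub API or an export of merged PRs.
SAMPLE_PRS = [
    {"title": "feat(parquet/schema): example feature", "labels": [], "additions": 120, "deletions": 30},
    {"title": "refactor(arrow/array): example refactor", "labels": [], "additions": 40, "deletions": 60},
    {"title": "fix: example bug fix", "labels": [], "additions": 5, "deletions": 2},
    {"title": "Batch of small optimizations", "labels": [], "additions": 10, "deletions": 10},
    {"title": "example change with a label", "labels": ["parquet"], "additions": 50, "deletions": 0},
]

# Matches commit-style prefixes such as "fix:", "feat(parquet):" or
# "refactor(arrow/array):" at the start of a PR title.
PREFIX_RE = re.compile(r"^(?P<type>[A-Za-z]+)(?:\((?P<scope>[^)]+)\))?!?:")


def component_of(pr):
    """Pick a component: first label if present, else the title prefix."""
    if pr["labels"]:
        return pr["labels"][0]
    match = PREFIX_RE.match(pr["title"])
    if match is None:
        return "(unlabeled)"
    scope = match.group("scope")
    if scope:
        # "parquet/schema" -> "parquet"
        return scope.split("/")[0]
    # No scope given, so fall back to the commit type ("fix", "format", ...).
    return match.group("type").lower()


def summarize(prs):
    """Count merged PRs and changed lines per component, largest first."""
    stats = defaultdict(lambda: {"merged_prs": 0, "line_changes": 0})
    for pr in prs:
        entry = stats[component_of(pr)]
        entry["merged_prs"] += 1
        entry["line_changes"] += pr["additions"] + pr["deletions"]
    return sorted(stats.items(), key=lambda kv: kv[1]["line_changes"], reverse=True)


if __name__ == "__main__":
    for component, entry in summarize(SAMPLE_PRS):
        print(component, entry["merged_prs"], entry["line_changes"])
```

Falling back from labels to the title prefix is one plausible way to end up with rows such as "fix", "format" and "(unlabeled)" next to real components; the notebook may well resolve these cases differently.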
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
