Here is the final report that was submitted to the board[1]. The full text
is also below.

[1]: https://github.com/apache/datafusion/issues/10282#issuecomment-2221351274

------


2024-07-10 DataFusion ASF Board Report
https://github.com/apache/datafusion/issues/10282

DataFusion PMC Chair Note: Please add any relevant comments / content to
this document. I (Andrew Lamb) will submit to the ASF board on Wed July 10,
2024 (about one week prior to the scheduled board meeting).

New projects submit reports every month for the first three months. This is
the last of those three monthly reports, due on July 10, 2024.

The format of this report and the metrics are from
https://reporter.apache.org/wizard/?datafusion

The rationale and process for this report:
https://www.apache.org/foundation/board/reporting
Past examples: [2024-06-12 DataFusion ASF Board Report](https://docs.google.com/document/d/1h4yjvomQO0XdzxKuE4aBSWGNliFFmn8GADd8DlPuXBw/edit)



## Description:
The mission of Apache DataFusion is the creation and maintenance of software
related to an extensible query engine.

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None


## Membership Data:
Apache DataFusion was founded 2024-04-16 (3 months ago)
There are currently 33 committers and 13 PMC members in this project.
The Committer-to-PMC ratio is roughly 9:4.

Community changes, past month:
- Mehmet Ozan Kabak was added to the PMC on 2024-06-12
- Ruihang Xia was added to the PMC on 2024-06-12
- Lewis Zhang was added as committer on 2024-06-14


## Project Activity:

The project continues to be quite active with many PRs and issues opened and
closed per day.

We started working on a project blog [1] (previously we used the arrow blog)
and hope to have our first blog post as an independent project later this
month.

There was a well-attended face-to-face meetup in San Francisco, CA, USA in
June [2]. We have one planned for Hangzhou, China in July [3]. There appears
to be significant interest in these events, and at least two more are planned
for September: one in New York, NY, USA and one in Belgrade, Serbia.

The community around DataFusion is growing too. For example, Spice AI has
made an initial contribution of TableProviders to datafusion-contrib [4] for
PostgreSQL, MySQL, DuckDB, and SQLite, enabling these data sources to be
easily queried through DataFusion.

[1]: https://datafusion.apache.org/blog/
[2]: https://github.com/apache/datafusion/discussions/10800
[3]: https://github.com/apache/datafusion/discussions/10341#discussioncomment-9738748
[4]: https://github.com/datafusion-contrib/datafusion-table-providers

### DataFusion core
https://github.com/apache/datafusion

We released version 39.0.0, continuing our schedule of monthly releases, and
are on track to release version 40.0.0 in the next day or two.

Recent work includes adding support for more flexible use of Parquet files,
including indexing and extracting statistics. We are also working with the
community to make extending SQL planning [2] easier and to extend file format
support [3], as well as fixing bugs found with a SQL fuzzer [4] and improving
performance with StringView [5].

It has been nice to see several good examples of collaboration across
contributors and companies, such as [6] and [7].

We have also been making external presentations [1].

[1]: https://github.com/apache/datafusion/issues/10969
[2]: https://github.com/apache/datafusion/issues/10534
[3]: https://github.com/apache/datafusion/pull/11060
[4]: https://github.com/apache/datafusion/issues/11030
[5]: https://github.com/apache/datafusion/issues/10918
[6]: https://github.com/apache/datafusion/pull/11203
[7]: https://github.com/apache/datafusion/issues/10534

### Sub project: DataFusion Python

https://github.com/apache/datafusion-python

The DataFusion Python project continues to receive updates as new versions of
the core DataFusion project are released. There have also been some minor
improvements to the user experience.


### Sub project: DataFusion Comet

https://github.com/apache/datafusion-comet

The Comet project is very active and is working towards an initial 0.1.0
source release. Initial benchmark results were published to
https://datafusion.apache.org/comet/contributor-guide/benchmarking.html.


### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista
https://github.com/apache/datafusion-ballista-python

The Ballista subproject is not very actively maintained, but there have been
some contributions recently to upgrade to more recent versions of the core
DataFusion project.

### Recent Releases
* PYTHON-39.0.0 was released on 2024-07-02.
* 39.0.0 was released on 2024-06-10.
* PYTHON-38.0.1 was released on 2024-05-30.
* PYTHON-37.1.0 was released on 2024-05-13.
* 38.0.0 was released on 2024-05-10.


## Community Health:
Community health is good -- we recently hit the 600 total contributors mark
according to GitHub. This number is partially inflated because the project was
initially part of the Arrow monorepo, but the trend is healthy nonetheless.

It is hard to keep track of everything going on these days, which is a good
thing. While it is always a struggle to get enough code review, the
committers keep things going and the community helps each other out with
reviews.

On Tue, Jul 2, 2024 at 1:49 PM Andrew Lamb <andrewlam...@gmail.com> wrote:

> It is time again for our monthly ASF board report.
>
> This month marks the last of the monthly reports required of new top level
> projects, so this activity will happen less frequently after this month.
>
> Please provide your comments on the ticket[1], the Google doc[2], or in a
> reply to this email.
>
> I plan to submit this to the board on July 10.
>
> Thanks,
> Andrew
>
> [1]: https://github.com/apache/datafusion/issues/10282
> [2]: https://docs.google.com/document/d/1lV-cFZGHCSrTiaLW1gyEMDKW-9nf47UW8xK19QCqbVk/edit
>
>
>
> ----
>
>
> ## Description:
> The mission of Apache DataFusion is the creation and maintenance of software
> related to an extensible query engine.
>
> ## Project Status:
> Current project status: New + Ongoing (high activity)
> Issues for the board: None
>
>
> ## Membership Data:
> Apache DataFusion was founded 2024-04-16 (3 months ago)
> There are currently 33 committers and 13 PMC members in this project.
> The Committer-to-PMC ratio is roughly 9:4.
>
> Community changes, past month:
> - Mehmet Ozan Kabak was added to the PMC on 2024-06-12
> - Ruihang Xia was added to the PMC on 2024-06-12
> - Lewis Zhang was added as committer on 2024-06-14
>
>
>
> ## Project Activity:
>
> The project continues to be quite active with many PRs and issues opened and
> closed per day.
>
> We started working on a project blog [1] (previously we used the arrow
> blog) and hope to have our first blog post as an independent project later
> this month.
>
> There was a well-attended face-to-face meetup in San Francisco, CA, USA in
> June [2]. We have one planned for Hangzhou, China in July [3]. There appears
> to be significant interest in these events, and more are planned.
>
> [1]: https://datafusion.apache.org/blog/
> [2]: https://github.com/apache/datafusion/discussions/10800
> [3]: https://github.com/apache/datafusion/discussions/10341#discussioncomment-9738748
>
> ### DataFusion core
> https://github.com/apache/datafusion
>
> We released version 39.0.0, continuing our schedule of monthly releases.
>
> Recent work includes more flexible use of Parquet files, including indexing
> and extracting statistics. We are also working with the community to make
> extending SQL planning [2] easier and to extend file format support [3], as
> well as fixing bugs found with a SQL fuzzer [4] and improving performance
> with StringView [5].
>
> It has been nice to see several good examples of cross contributor/company
> collaboration such as [6] and [7].
>
> We have also been making external presentations[1]
>
> [1]: https://github.com/apache/datafusion/issues/10969
> [2]: https://github.com/apache/datafusion/issues/10534
> [3]: https://github.com/apache/datafusion/pull/11060
> [4]: https://github.com/apache/datafusion/issues/11030
> [5]: https://github.com/apache/datafusion/issues/10918
> [6]: https://github.com/apache/datafusion/pull/11203
> [7]: https://github.com/apache/datafusion/issues/10534
>
> ### Sub project: DataFusion Python
> https://github.com/apache/datafusion-python
>
> TODO
>
>
> ### Sub project: DataFusion Comet
> https://github.com/apache/datafusion-comet
>
> TODO
>
>
> ### Sub project: DataFusion Ballista
> https://github.com/apache/datafusion-ballista
> https://github.com/apache/datafusion-ballista-python
>
> The Ballista subproject is not currently actively maintained.
>
> ### Recent Releases
> * PYTHON-38.0.1 was released on 2024-05-30.
> * PYTHON-37.1.0 was released on 2024-05-13.
> * 38.0.0 was released on 2024-05-10.
>
> ## Community Health:
>
> Most of our communications still happen through GitHub.
>
> We are also discussing [1] adopting a library that makes it easier to
> write UDF code.
>
> [1]: https://github.com/apache/datafusion/discussions/11192
>
> TODO update
>
> * dev@datafusion.apache.org had a big increase in traffic in the past
>   quarter (71 emails compared to 0)
> * git...@datafusion.apache.org had a big increase in traffic in the past
>   quarter (7685 emails compared to 0)
>
>
>
