This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 98079059a821 docs: Update contribution guide with changes due to JIRA to GH issues migration (#13921)
98079059a821 is described below
commit 98079059a8215e4ded399c6b39443a3ca228aa8e
Author: vinoth chandar <[email protected]>
AuthorDate: Wed Sep 17 17:41:38 2025 -0700
docs: Update contribution guide with changes due to JIRA to GH issues migration (#13921)
---
website/community/get-involved.mdx | 11 +-
website/contribute/developer-setup.md | 209 +--------------------
website/contribute/how-to-contribute.md | 312 +++++++++++++++++++++++++++++---
website/docs/performance.md | 2 +-
4 files changed, 297 insertions(+), 237 deletions(-)
diff --git a/website/community/get-involved.mdx b/website/community/get-involved.mdx
index 3eb1bca997c0..ad051a028c06 100644
--- a/website/community/get-involved.mdx
+++ b/website/community/get-involved.mdx
@@ -13,13 +13,12 @@ There are several ways to get in touch with the Hudi community.
| When? | Channel to use |
|-------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| For development discussions | Dev Mailing list ([Subscribe](mailto:[email protected]), [Unsubscribe](mailto:[email protected]), [Archives](https://lists.apache.org/[email protected])). Empty email works for subscribe/unsubscribe. Please use [gists](https://gist.github.com) to share code/stacktraces on the email. |
-| For any general questions, user support | Users Mailing list ([Subscribe](mailto:[email protected]), [Unsubscribe](mailto:[email protected]), [Archives](https://lists.apache.org/[email protected])). Empty email works for subscribe/unsubscribe. Please use [gists](https://gist.github.com) to share code/stacktraces on the email. |
-| For reporting bugs or issues or discover known issues | Use [ASF self-service](https://selfserve.apache.org/jira-account.html) to request access to the [Hudi JIRA project](https://issues.apache.org/jira/projects/HUDI/summary). |
-| For quick pings & 1-1 chats | Join our <SlackCommunity title="slack group" />. In case the link does not work, please leave a comment on this [github issue](https://github.com/apache/hudi/issues/143) or drop an email to [email protected] |
+| For development discussions | [GitHub Discussions](https://github.com/apache/hudi/discussions) or Dev Mailing list ([Subscribe](mailto:[email protected]), [Unsubscribe](mailto:[email protected]), [Archives](https://lists.apache.org/[email protected])). Empty email works for subscribe/unsubscribe. |
+| For any general questions, user support | [GitHub Discussions](https://github.com/apache/hudi/discussions) or Users Mailing list ([Subscribe](mailto:[email protected]), [Unsubscribe](mailto:[email protected]), [Archives](https://lists.apache.org/[email protected])). Empty email works for subscribe/unsubscribe. |
+| For reporting bugs or issues, or discovering known issues | Use [GitHub Issues](https://github.com/apache/hudi/issues); please read the [filing guidelines](/contribute/how-to-contribute#filing-issues) first. |
+| For quick pings & 1-1 chats | Join our <SlackCommunity title="slack group" />. If the link does not work, please start a GitHub discussion, file a community support issue, or drop an email to [email protected] |
| For proposing large features, changes | Start a RFC. Instructions [here](/contribute/rfc-process). |
-| Join weekly sync-up meeting | Follow instructions [here](https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Community+Weekly+Sync). |
-| See [#here](#accounts) for wiki access | |
+| Join sync-up meetings | [Community Sync](/community/syncs) and [Dev Sync](/contribute/developer-sync-call). |
| For stream of commits, pull requests etc | Commits Mailing list ([Subscribe](mailto:[email protected]), [Unsubscribe](mailto:[email protected]), [Archives](https://lists.apache.org/[email protected])) |
If you wish to report a security vulnerability, please contact [[email protected]](mailto:[email protected]).
diff --git a/website/contribute/developer-setup.md b/website/contribute/developer-setup.md
index 5065ac75f99a..215ae54aa4d4 100644
--- a/website/contribute/developer-setup.md
+++ b/website/contribute/developer-setup.md
@@ -13,10 +13,7 @@ To contribute code, you need
- a GitHub account
- a Linux (or) macOS development environment with Java JDK 8, Apache Maven (3.x+) installed
- [Docker](https://www.docker.com/) installed for running demo, integ tests or building website
- - for large contributions, a signed [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf) (ICLA) to the Apache Software Foundation (ASF).
- - (Recommended) Create an account on [JIRA](https://issues.apache.org/jira/projects/HUDI/summary) to open issues/find similar issues.
+ - for large contributions, a signed [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf) (ICLA) to the Apache Software Foundation (ASF).
- (Recommended) Join our dev mailing list & slack channel, listed on [community](/community/get-involved) page.
@@ -75,193 +72,8 @@ so that IntelliJ re-indexes the code.


-## Accounts and Permissions
-
- - [Hudi issue tracker
(JIRA)](https://issues.apache.org/jira/projects/HUDI/issues):
- Anyone can access it and browse issues. Anyone can register an account and
login
- to create issues or add comments. Only contributors can be assigned issues.
If
- you want to be assigned issues, a PMC member can add you to the project
contributor
- group. Email the dev mailing list to ask to be added as a contributor, and
include your ASF Jira username.
-
- - [Hudi Wiki Space](https://cwiki.apache.org/confluence/display/HUDI):
- Anyone has read access. If you wish to contribute changes, please create an
account and
- request edit access on the dev@ mailing list (include your Wiki account
user ID).
-
- - Pull requests can only be merged by a HUDI committer, listed
[here](https://incubator.apache.org/projects/hudi)
-
- - [Voting on a release](https://www.apache.org/foundation/voting): Everyone
can vote.
- Only Hudi PMC members should mark their votes as binding.
-
-## Life of a Contributor
-
-This document details processes and procedures we follow to make contributions
to the project and take it forward.
-If you are looking to ramp up into the project as a contributor, we highly
encourage you to read this guide in full, familiarize yourself with the
workflow
-and more importantly also try to improve the process along the way as well.
-
-### Filing JIRAs
-
- - Hudi uses JIRA to manage issues. First, familiarize yourself with the
various [components](https://issues.apache.org/jira/projects/HUDI/components)
against which issues are filed in Hudi.
- - Make an attempt to find an existing JIRA, that may solve the same issue you
are reporting. When in doubt, you can always email the mailing list so that the
community can provide early feedback,
- point out any similar JIRAs or RFCs.
- - Try to gauge whether this JIRA needs an
[RFC](https://cwiki.apache.org/confluence/display/HUDI/RFC+Process). As always,
email the mailing list if unsure. If you need an RFC since the change is
- large in scope, then please follow the wiki instructions to get the process
rolling along.
- - While raising a new JIRA or updating an existing one, please make sure to
do the following
- - The issue `type` and `components` (when resolving the ticket) are set
correctly
- - If you intend to target the JIRA for a specific release, please fill
in the `fix version(s)` field, with the [release
number](https://issues.apache.org/jira/projects/HUDI/releases).
- - Summary should be descriptive enough to catch the essence of the
problem/ feature
- - Where necessary, capture the version of Hudi/Spark/Hive/Hadoop/Cloud
environments in the ticket
- - Whenever possible, provide steps to reproduce via sample code or on
the [docker setup](https://hudi.apache.org/docker_demo)
- - All newly filed JIRAs are placed in the `NEW` state. If you are sure about
this JIRA representing valid, scoped piece of work, please click `Accept Issue`
to move it `OPEN` state
- - If you are not sure, please wait for a PMC/Committer to confirm/triage the
issue and accept it. This process avoids contributors spending time on JIRAs
with unclear scope.
- - Whenever possible, break down large JIRAs (e.g JIRAs resulting from an
[RFC](https://cwiki.apache.org/confluence/display/HUDI/RFC+Process)) into `sub
tasks` by clicking `More > create sub-task` from the parent JIRA ,
- so that the community can contribute at large and help implement it much
quickly. We recommend prefixing such JIRA titles with `[UMBRELLA]`
-
-### Claiming JIRAs
-
- - Finding a JIRA to work on
- - If you are new to the project, you can ramp up by picking up any
issues tagged with the
[newbie](https://issues.apache.org/jira/issues/?jql=project+%3D+HUDI+AND+component+%3D+newbie)
component.
- - If you want to work on some higher priority issue, then scout for Open
issues against the next release on the JIRA, engage on unassigned/inactive
JIRAs and offer help.
- - Issues tagged with `Usability` , `Code Cleanup`, `Testing` components
often present excellent opportunities to make a great impact.
- - If you don't have perms to self-assign JIRAs, please email the dev mailing
list with your JIRA id and a small intro for yourself. We'd be happy to add you
as a contributor.
- - As courtesy, if you are unable to continue working on a JIRA, please move
it back to "OPEN" state and un-assign yourself.
- - If a JIRA or its corresponding pull request has been inactive for a
week, awaiting feedback from you, PMC/Committers could choose to re-assign them
to another contributor.
- - Such re-assignment process would be communicated over JIRA/GitHub
comments, checking with the original contributor on his/her intent to continue
working on the issue.
- - You can also contribute by helping others contribute. So, if you don't
have cycles to work on a JIRA and another contributor offers help, take it!
-
-### Contributing Code
-
- - Once you finalize on a project/task, please open a new JIRA or assign an
existing one to yourself.
- - Almost all PRs should be linked to a JIRA. It's always good to have a
JIRA upfront to avoid duplicating efforts.
- - If the changes are minor, then `[MINOR]` prefix can be added to Pull
Request title without a JIRA. Below are some tips to judge **MINOR** Pull
Request :
- - trivial fixes (for example, a typo, a broken link, intellisense or
an obvious error)
- - the change does not alter functionality or performance in any way
- - changed lines less than 100
- - obviously judge that the PR would pass without waiting for CI / CD
verification
- - But, you may be asked to file a JIRA, if reviewer deems it necessary
- - Before you begin work,
- - Claim the JIRA using the process above and assign the JIRA to yourself.
- - Click "Start Progress" on the JIRA, which tells everyone that you are
working on the issue actively.
- - [Optional] Familiarize yourself with internals of Hudi using content on
this page, as well as [wiki](https://cwiki.apache.org/confluence/display/HUDI)
- - Make your code change
- - Get existing tests to pass using `mvn clean install -DskipITs`
- - Add adequate tests for your new functionality
- - For involved changes, it's best to test the changes in real production
environments and report the results in the PR.
- - For website changes, please build the site locally & test navigation,
formatting & links thoroughly
- - If your code change changes some aspect of documentation (e.g new config,
default value change),
- please ensure there is another PR to [update the
docs](https://github.com/apache/hudi/tree/asf-site/README.md) as well.
- - Sending a Pull Request
- - Format commit and the pull request title like `[HUDI-XXX] Fixes bug in
Spark Datasource`,
- where you replace `HUDI-XXX` with the appropriate JIRA issue.
- - Pull request titles must have either `[HUDI-XXX]` or `[MINOR]` in their
title. Note the brackets and capitalization.
- - Please ensure your commit message body is descriptive of the change.
Bulleted summary would be appreciated.
- - You must follow the instructions in the template and fill out all fields
to pass our compliance checks.
- - Do not remove or modify any headings in the template.
- - Push your commit to your own fork/branch & create a pull request (PR)
against the Hudi repo.
- - If you don't hear back within 3 days on the PR, please send an email to
the dev @ mailing list.
- - Address code review comments & keep pushing changes to your fork/branch,
which automatically updates the PR
- - Before your change can be merged, it should be squashed into a single
commit for cleaner commit history.
- - Finally, once your pull request is merged, make sure to `Close` the JIRA.
-
-### Coding guidelines
-
-Our code can benefit from contributors speaking the same "language" when
authoring code. After all, it gets read a lot more than it gets
-written. So optimizing for "reads" is a good goal. The list below is a set of
guidelines, that contributors strive to upkeep and reflective
-of how we want to evolve our code in the future.
-
-#### Style
-
- - **Formatting** We should rely on checkstyle and spotless to auto fix
formatting; automate this completely. Where we cannot,
- we will err on the side of not taxing contributors with manual effort.
- - **Refactoring**
- - Refactor with purpose; any refactor suggested should be attributable to
functionality that now becomes easy to implement.
- - A class is asking to be refactored, when it has several overloaded
responsibilities/have sets of fields/methods which are used more cohesively
than others.
- - Try to name tests using the given-when-then model, that cleans separates
preconditions (given), an action (when), and assertions (then).
- - **Naming things**
- - Let's name uniformly; using the same word to denote the same concept.
e.g: bootstrap vs external vs source, when referring to bootstrapped tables.
- Maybe they all mean the same, but having one word makes the code lot more
easily readable.
- - Let's name consistently with Hudi terminology. e.g dataset vs table, base
file vs data file.
- - Class names preferably are nouns (e.g Runner) which reflect their
responsibility and methods are verbs (e.g run()).
- - Avoid filler words, that don't add value e.g xxxInfo, xxxData, etc.
- - We name classes in code starting with `Hoodie` and not `Hudi` and we want
to keep it that way for consistency/historical reasons.
- - **Methods**
- - Individual methods should short (~20-30 lines) and have a single purpose;
If you feel like it has a secondary purpose, then maybe it needs
- to be broken down more.
- - Lesser the number of arguments, the better;
- - Place caller methods on top of callee methods, whenever possible.
- - Avoid "output" arguments e.g passing in a list and filling its values
within the method.
- - Try to limit individual if/else blocks to few lines to aid readability.
- - Separate logical blocks of code with a newline in between e.g read a file
into memory, loop over the lines.
- - **Classes**
- - Like method, each Class should have a single purpose/responsibility.
- - Try to keep class files to about 200 lines of length, nothing beyond 500.
- - Avoid stating the obvious in comments; e.g each line does not deserve a
comment; Document corner-cases/special perf considerations etc clearly.
- - Try creating factory methods/builders and interfaces wherever you feel a
specific implementation may be changed down the line.
-
-#### Substance
-
-- Try to avoid large PRs; if unavoidable (many times they are) please separate
refactoring with the actual implementation of functionality.
- e.g renaming/breaking up a file and then changing code changes, makes the
diff very hard to review.
-- **Licensing**
- - Every source file needs to include the Apache license header. Every new
dependency needs to have
- an open source license
[compatible](https://www.apache.org/legal/resolved#criteria) with Apache.
- - If you are re-using code from another apache/open-source project,
licensing needs to be compatible and attribution added to `LICENSE` file
- - Please DO NOT copy paste any code from StackOverflow or other online
sources, since their license attribution would be unclear. Author them yourself!
-- **Code Organization**
- - Anything in `hudi-common` cannot depend on a specific engine runtime
like Spark.
- - Any changes to bundles under `packaging`, will be reviewed with
additional scrutiny to avoid breakages across versions.
-- **Code reuse**
- - Whenever you can, please use/enhance use existing utils classes in code
(`CollectionUtils`, `ParquetUtils`, `HoodieAvroUtils`). Search for classes
ending in `Utils`.
- - As a complex project, that must integrate with multiple systems, we tend
to avoid dependencies like `guava`, `apache commons` for the sake of easy
integration.
- Please start a discussion on the mailing list, before attempting to
reintroduce them
- - As a data system, that takes performance seriously, we also write pieces
of infrastructure (e.g `ExternalSpillableMap`) natively, that are optimized
specifically for our scenarios.
- Please start with them first, when solving problems.
- - **Breaking changes**
- - Any version changes for dependencies, needs to be ideally vetted across
different user environments in the community, to get enough confidence before
merging.
- - Any changes to methods annotated with `PublicAPIMethod` or classes
annotated with `PublicAPIClass` require upfront discussion and potentially an
RFC.
- - Any non-backwards compatible changes similarly need upfront discussion
and the functionality needs to implement an upgrade-downgrade path.
-
-#### Tests
-
-- **Categories**
- - unit - testing basic functionality at the class level, potentially using
mocks. Expected to finish quicker
- - functional - brings up the services needed and runs test without mocking
- - integration - runs subset of functional tests, on a full fledged
enviroment with dockerized services
-- **Prepare Test Data**
- - Many unit and functional test cases require a Hudi dataset to be
prepared beforehand. `HoodieTestTable` and `HoodieWriteableTestTable` are
dedicated test utility classes for this purpose. Use them whenever appropriate,
and add new APIs to them when needed.
- - When add new APIs in the test utility classes, overload APIs with
variety of arguments to do more heavy-liftings for callers.
- - In most scenarios, you won't need to use `FileCreateUtils` directly.
- - If test cases require interaction with actual `HoodieRecord`s, use
`HoodieWriteableTestTable` (and `HoodieTestDataGenerator` probably). Otherwise,
`HoodieTestTable` that manipulates empty files shall serve the purpose.
-- **Strive for Readability**
- - Avoid writing flow controls for different assertion cases. Split to a
new test case when appropriate.
- - Use plain for-loop to avoid try-catch in lambda block. Declare
exceptions is okay.
- - Use static import for constants and static helper methods to avoid
lengthy code.
- - Avoid reusing local variable names. Create new variables generously.
- - Keep helper methods local to the test class until it becomes obviously
generic and re-useable. When that happens, move the helper method to the right
utility class. For example, `Assertions` contains common assert helpers, and
`SchemaTestUtil` is for schema related helpers.
- - Avoid putting new helpers in `HoodieTestUtils` and
`HoodieClientTestUtils`, which are named too generic. Eventually, all test
helpers shall be categorized properly.
-
-### Reviewing Code/RFCs
-
- - All pull requests would be subject to code reviews, from one or more of the
PMC/Committers.
- - Typically, each PR will get an "Assignee" based on their area of expertise,
who will work with you to land the PR.
- - Code reviews are vital, but also often time-consuming for everyone
involved. Below are some principles which could help align us better.
- - Reviewers need to provide actionable, concrete feedback that states what
needs to be done to get the PR closer to landing.
- - Reviewers need to make it explicit, which of the requested changes would
block the PR vs good-to-dos.
- - Both contributors/reviewers need to keep an open mind and ground
themselves to making the most technically sound argument.
- - If progress is hard, please involve another PMC member/Committer to share
another perspective.
- - Staying humble and eager to learn, goes a long way in ensuring these
reviews are smooth.
- - Reviewers are expected to uphold the code quality, standards outlined above.
- - When merging PRs, always make sure you are squashing the commits using the
"Squash and Merge" feature in Github
- - When necessary/appropriate, reviewers could make changes themselves to PR
branches, with the intent to get the PR landed sooner. (see
[how-to](https://cwiki.apache.org/confluence/display/HUDI/Resources#Resources-PushingChangesToPRs))
- Reviewers should seek explicit approval from author, before making large
changes to the original PR.
-
-### Suggest Changes
-
-We welcome new ideas and suggestions to improve the project, along any
dimensions - management, processes, technical vision/direction. To kick start a
discussion on the mailing thread
-to effect change and source feedback, start a new email thread with the
`[DISCUSS]` prefix and share your thoughts. If your proposal leads to a larger
change, then it may be followed up
-by a [vote](https://www.apache.org/foundation/voting) by a PMC member or
others (depending on the specific scenario).
-For technical suggestions, you can also leverage [our RFC
Process](https://cwiki.apache.org/confluence/display/HUDI/RFC+Process) to
outline your ideas in greater detail.
-
-### Useful Maven commands for developers.
+
+## Useful Maven commands for developers.
Listing out some of the maven commands that could be useful for developers.
- Compile/build entire project
@@ -326,17 +138,6 @@ You can use `alt def` command to define different docker-compose versions. Refer
Use `alt use` to use v1 version of docker-compose while running integration test locally.
-## Releases
-
- - Apache Hudi community plans to do minor version releases every 6 weeks or
so.
- - If your contribution merged onto the `master` branch after the last
release, it will become part of the next release.
- - Website changes are regenerated on-demand basis (until automation in place
to reflect immediately)
-
-## Communication
-
-All communication is expected to align with the [Code of
Conduct](https://www.apache.org/foundation/policies/conduct).
-Discussion about contributing code to Hudi happens on the [dev@ mailing
list](/community/get-involved). Introduce yourself!
-
## Code & Project Structure
* `docker` : Docker containers used by demo and integration tests. Brings up a mini data ecosystem locally
@@ -357,7 +158,9 @@ This Quick Video will give a code walkthrough to start with [watch](https://www.
## Running unit tests and local debugger via Intellij IDE
-#### IMPORTANT REMINDER FOR BELOW STEPS: When submitting a PR please make sure
to NOT commit the changes mentioned in these steps, instead once testing is
done make sure to revert the changes and then submit a pr.
+:::note Important reminder
+When submitting a PR, do NOT commit the changes made in these steps. Once testing is done, revert those changes and then submit the PR.
+:::
0. Build the project with the intended profiles via the `mvn` cli, for example for spark 3.2 use `mvn clean package -Dspark3.2 -Dscala-2.12 -DskipTests`.
1. Install the "Maven Helper" plugin from the Intellij IDE.
diff --git a/website/contribute/how-to-contribute.md b/website/contribute/how-to-contribute.md
index a93e48b13d35..50d01b7b280d 100644
--- a/website/contribute/how-to-contribute.md
+++ b/website/contribute/how-to-contribute.md
@@ -7,38 +7,296 @@ last_modified_at: 2020-09-01T15:59:57-04:00
Apache Hudi community welcomes contributions from anyone!
-Here are few ways, you can get involved.
+## Ways to become a contributor
- - Ask (and/or) answer questions on our support channels listed above.
- - Review code or RFCs
- - Help improve documentation
- - Author blogs on our wiki
- - Testing; Improving out-of-box experience by reporting bugs
- - Share new ideas/directions to pursue or propose a new RFC
- - Contributing code to the project: check out [newbie
JIRAs](https://issues.apache.org/jira/issues/?filter=12350891).
+A GitHub account is needed to file issues, start discussions, and send pull requests to Hudi. Here are a few ways you can get involved.
-## Become a Committer
+ - Engage with the community on [GitHub Discussions](https://github.com/apache/hudi/discussions) or Slack
+ - Help improve the docs and contribute blogs to hudi.apache.org via the [asf-site branch](https://github.com/apache/hudi/tree/asf-site)
+ - Share [new feature](https://github.com/apache/hudi/issues/new?template=hudi_feature.yml) requests or propose a [new RFC](/contribute/rfc-process)
+ - Contribute code to the project by raising [pull requests (PR)](https://github.com/apache/hudi/pulls) that adhere to the [contribution guide](/contribute/developer-setup). Here are some good [first issues](https://github.com/apache/hudi/issues?q=state%3Aopen%20label%3Agood-first-issues).
+ - Report [bugs](https://github.com/apache/hudi/issues/new?template=hudi_bug.yml) or suggest [improvements](https://github.com/apache/hudi/issues/new?template=hudi_improvement.yml) to the user experience; review code or RFCs on GitHub
+ - Share your success story on [Hudi LinkedIn](https://www.linkedin.com/company/apache-hudi/) or at Community Syncs.
+ - Pull requests can only be merged by a Hudi committer, listed [here](/community/team), but anyone is free to review.
+ - [Voting on a release](https://www.apache.org/foundation/voting): Everyone can vote on the dev mailing list. Only Hudi PMC members should mark their votes as binding.
-We are always looking for strong contributors, who can become
[committers](https://www.apache.org/dev/committers) on the project.
-Committers are chosen by a majority vote of the Apache Hudi
[PMC](https://www.apache.org/foundation/how-it-works#pmc-members), after a
discussion on their candidacy based on the following criteria (not
exclusive/comprehensive).
+All communication is expected to align with the [Code of Conduct](https://www.apache.org/foundation/policies/conduct).
+
+## Contributing on GitHub
+
+:::note Developer setup
+ If you are planning to contribute code, please refer to [developer setup](/contribute/developer-setup) for instructions and information that will help you get going.
+:::
+
+This document details the processes and procedures we follow to make contributions to the project.
+If you are looking to ramp up as a contributor to the project, we highly encourage you to read this guide in full and familiarize yourself with the workflow.
+
+## Filing Issues
+
+Hudi manages development tasks and project/release management using GitHub Issues, following the process and protocol below.
+
+There are five types of GitHub Issues.
+
+| Issue type | Purpose | Who can file | Label |
+|:------------------|:--------------------------------------------------------|----------------------------------------|------------------------|
+| Epic | Roadmap tracking across multiple releases | Only created by maintainers/committers | type:epic |
+| Feature | New feature development stories. Can have sub-issues | Anyone | type:feature |
+| Improvements | Regular dev tasks and improvements. Can have sub-issues | Anyone | type:devtask |
+| Bug | Bugs found in Hudi that need fixing | Anyone | type:bug |
+| Community Support | Problems reported that may still need triaging | Anyone | type:community-support |
+
+When filing issues, please follow the issue templates closely to ensure smooth project management for everyone involved.
+
+Some things to keep in mind and strive for:
+
+- Before filing, search for an existing issue that may already cover the problem you are reporting.
+- Carefully gauge whether a new feature needs an [RFC](/contribute/rfc-process).
+- If you intend to target an issue for a specific release, please mark the release using the `milestone` field on the GitHub issue.
+- If you are not sure whether an issue is valid and well-scoped, please wait for a PMC member/committer to confirm/triage the issue and accept it. This also avoids contributors spending time on issues with unclear scope.
+- Whenever possible, break down large issues into sub-issues such that each sub-issue can be fixed by a PR of reasonable size/complexity.
+- You can also contribute by helping others contribute. So, if you don't have cycles to work on an issue and another contributor offers help, take it!
+
+When in doubt, you can always start a GitHub discussion so that the community can provide early feedback and point out any similar issues, PRs, or RFCs.
+
+## Opening Pull Requests
+
+This project follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) specification for PR titles to ensure consistency and enable automated tooling.
+
+:::important
+All pull requests must either reference a GitHub Issue or clearly describe the issue inline within the pull request.
+Refer to an issue using `closes #<issue_number>` if you intend to auto-close the issue when the pull request is merged,
+or simply `issue: #<issue_number>` if auto-close is not desirable.
+
+For larger features, maintainers may insist on filing GitHub Issues, appropriately linked to other related GitHub issues.
+:::
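The difference between the two reference styles can be sketched as a small Python check. This is a simplified approximation, not GitHub's own parser: it assumes GitHub's closing keywords (`close`, `closes`, `closed`, `fix`, `fixes`, `fixed`, `resolve`, `resolves`, `resolved`) immediately followed by `#<number>`, plus the `issue: #<number>` form described above, and it ignores cross-repository references and other variations. The function `find_issue_refs` is hypothetical and not part of Hudi's tooling.

```python
import re
from typing import List, NamedTuple

# GitHub closing keywords, matched case-insensitively. "<keyword> #<number>"
# in a PR description closes the issue once the PR is merged into the
# default branch.
CLOSING_KEYWORDS = ("close", "closes", "closed", "fix", "fixes", "fixed",
                    "resolve", "resolves", "resolved")

CLOSING_RE = re.compile(
    r"\b(?:" + "|".join(CLOSING_KEYWORDS) + r")\s+#(\d+)\b", re.IGNORECASE)

# The non-closing convention from this guide: "issue: #<number>".
REFERENCE_RE = re.compile(r"\bissue:\s*#(\d+)\b", re.IGNORECASE)


class IssueRefs(NamedTuple):
    auto_close: List[int]      # issues GitHub would close when the PR merges
    reference_only: List[int]  # issues linked without auto-close


def find_issue_refs(pr_body: str) -> IssueRefs:
    """Classify issue references in a PR description (simplified sketch)."""
    return IssueRefs(
        auto_close=[int(n) for n in CLOSING_RE.findall(pr_body)],
        reference_only=[int(n) for n in REFERENCE_RE.findall(pr_body)],
    )


if __name__ == "__main__":
    body = "Fixes the writer retry logic.\n\ncloses #123\nissue: #456"
    print(find_issue_refs(body))
```

Note that a keyword not followed by `#<number>` (as in "Fixes the writer retry logic") is not treated as a reference, which is why an inline issue description alone never triggers auto-close.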
+
+### PR Title Format
+
+```
+<type>(optional scope): <description>
+```
+
+For breaking changes that require attention, use
+
+```
+<type>(optional scope)!: <description>
+```
+
+The following types are allowed.
+
+| type | Purpose |
+|:---------|:---------------------------------------------------------------------------------------|
+| feat | New feature addition |
+| fix | Bug fix |
+| docs | Documentation-only changes, in code, HTML, or .md files |
+| style | Code style, formatting, or lint-only changes with no logic changes |
+| refactor | Code changes that neither fix bugs nor add features, e.g. cleanup or reworking abstractions |
+| perf | Performance improvements or tooling |
+| test | Adding or fixing tests and test infrastructure |
+| chore | Tooling, build system, CI/CD, or maintenance tasks |
+
+#### Scopes
+Optionally, any of the scopes below can be added to a PR title. Scopes provide additional context and can be used to specify which part of the codebase is affected.
+This helps us track where development activity is directed and whether bugs in a component are being resolved in a timely fashion. Tooling should auto-apply the right label to pull requests and issues.
+right label to pull requests and issues.
+
+| Scope | Purpose | Label |
+|:---------------|:--------------------------------------------------------------------------------|---------------------|
+| common | Common code or abstractions shared across the entire project | area:common |
+| core | Changes affecting transaction management, concurrency, and core read/write flows | area:core |
+| api | Any changes affecting public APIs or interfaces | area:api |
+| config | Any changes affecting public configs | area:config |
+| storage-format | Any changes to bits on storage: timeline, index, data, and metadata | area:storage-format |
+| metadata-table | Changes around the metadata table | area:metadata-table |
+| table-services | Cleaning, Clustering, Compaction, Log Compaction, Indexing, TTL, ... | area:table-services |
+| tools | Any tools, like the CLI | area:tools |
+| ingest | Spark and Flink streamer tools to ELT data into Hudi; Kafka sink | area:ingest |
+| spark | Spark SQL, Streaming, Structured Streaming, Data source | engine:spark |
+| flink | DataStream writing/reading, SQL, Dynamic Tables | engine:flink |
+| trino | Trino Hudi connector maintained in Hudi repo | engine:trino |
+
+
+For example:
+
+```
+feat(flink): add bucket index implementation on Flink
+fix(index): fix index update logic
+```
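The scope-to-label mapping above is meant to be applied by tooling. The following minimal Java sketch is an illustration only. The class `ScopeLabels` and its `labelFor` method are hypothetical and are not Hudi's actual labeling automation. The sketch also assumes that a scope missing from the table gets no label. Examples of such scopes are `index` or a JIRA number like `HUDI-1234`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps the optional scope of a PR title to the label from
// the scope table. This is NOT Hudi's real labeling tooling.
final class ScopeLabels {
  // Scope -> label, copied from the scope table.
  private static final Map<String, String> LABELS = Map.ofEntries(
      Map.entry("common", "area:common"),
      Map.entry("core", "area:core"),
      Map.entry("api", "area:api"),
      Map.entry("config", "area:config"),
      Map.entry("storage-format", "area:storage-format"),
      Map.entry("metadata-table", "area:metadata-table"),
      Map.entry("table-services", "area:table-services"),
      Map.entry("tools", "area:tools"),
      Map.entry("ingest", "area:ingest"),
      Map.entry("spark", "engine:spark"),
      Map.entry("flink", "engine:flink"),
      Map.entry("trino", "engine:trino"));

  // Captures the text inside the parentheses of "type(scope): ..." or "type(scope)!: ...".
  private static final Pattern SCOPE = Pattern.compile("^[a-z]+\\(([^)]+)\\)!?:");

  private ScopeLabels() {}

  /** Returns the label for the title's scope, or empty if there is no scope or it is not in the table. */
  static Optional<String> labelFor(String prTitle) {
    Matcher m = SCOPE.matcher(prTitle);
    if (!m.find()) {
      return Optional.empty();
    }
    return Optional.ofNullable(LABELS.get(m.group(1)));
  }

  public static void main(String[] args) {
    System.out.println(labelFor("feat(flink): add bucket index implementation on Flink")); // Optional[engine:flink]
    System.out.println(labelFor("fix(index): fix index update logic"));                     // Optional.empty
  }
}
```

Returning `Optional.empty()` instead of a default label leaves unscoped or unknown-scope PRs unlabeled. This matches the advice to omit the scope when unsure.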
- - Embodies the ASF model code of [conduct](https://www.apache.org/foundation/policies/conduct)
- - Has made significant technical contributions such as submitting PRs, filing bugs, testing, benchmarking, authoring RFCs, providing feedback/code reviews (+ more).
- - Has helped the community over a few months, by answering questions on support channels above and triaging issues/jiras.
- - Demonstrates clear code/design ownership of a component or code area (eg: Delta Streamer, Hive/Presto Integration etc).
- - Brought thought leadership and new ideas into the project and evangelized them with the community via conference talks, blog posts.
- - Great citizenship in helping with all peripheral (but very critical) work like site maintenance, wiki/jira cleanups and so on.
- - Proven commitment to the project by way of upholding all agreed upon processes, conventions and principles of the community.
+When broken down adequately, most issues and pull requests should address just one primary area or scope, respectively.
+There may be some special situations, though:
-## Code Contributions
+1. If your PR makes any API, core, or storage-format changes, this must be called out explicitly.
+2. If you are unsure which scope to use, either because the PR or issue spans several components or because it falls outside the list above, omit the scope in the PR title or the label on the issue.
-Useful resources for contributing can be found under the "Quick Links" left menu.
-Specifically, please refer to the detailed [contribution guide](/contribute/developer-setup).
+If your PR addresses an old JIRA issue (filed before Hudi migrated to GitHub Issues), put the JIRA number in the scope:
-## Accounts
+```
+feat(HUDI-1234): add a new feature
+```
+
+### Examples
+
+#### Good PR Titles ✅
+
+```
+feat: add a new index in the metadata table
+fix: resolve type handling in data processing
+docs: update installation instructions
+style: format code according to style guide
+refactor: extract common utility functions
+perf: optimize Spark query performance
+test: add unit tests for common storage layout
+chore: resolve checkstyle warnings
+improvement: enhance error handling in type cast
+blocker: fix class loading failure on Java 17
+security: update dependencies to latest versions
+```
+
+#### Bad PR Titles ❌
+
+```
+Add new feature # Missing type
+FIX: bug in login # Type should be lowercase
+feat add authentication # Missing colon
+feature: new login system # Invalid type (should be 'feat')
+fix # Missing description
+```
+
+#### Breaking Changes
+
+For breaking changes, add an exclamation mark after the type/scope:
+
+```
+feat!: change merger API to account for better delete handling
+feat(index)!: change secondary index layout
+```
+PRs with breaking changes will be subject to broader reviews and opinions before they are merged.
+
+#### Validation
+
+PR titles are automatically validated by a semantic PR title check that runs in GitHub Actions.
+
+If your PR title doesn't follow these guidelines, the validation check will fail and you'll need to update it before merging.
+
+In rare cases, you can skip validation by adding one of these labels to your PR:
+- `bot`
+- `ignore-semantic-pull-request`
+
+Or include `[skip ci]` in your PR title for CI-related changes.
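You can approximate the format rules above with a regular expression to check a title locally before pushing. The following minimal Java sketch rests on stated assumptions and is not the configuration of the GitHub Action that actually validates PRs. The class name `PrTitleCheck` is hypothetical. The accepted types are assumed to be the ones in the type table plus `improvement`, `blocker`, and `security`, which appear in the good examples. The skip labels and `[skip ci]` are not modeled.

```java
import java.util.regex.Pattern;

// Hypothetical local pre-check for PR titles; the authoritative check is the
// GitHub Actions semantic validation, whose exact rules may differ.
final class PrTitleCheck {
  // ASSUMPTION: the eight types from the table plus the extra types used in the good examples.
  private static final String TYPES =
      "feat|fix|docs|style|refactor|perf|test|chore|improvement|blocker|security";

  // <type>(optional scope)!: <description>
  //   - type is lowercase and one of TYPES
  //   - scope is optional and contains no parentheses or whitespace
  //   - "!" marks a breaking change
  //   - ": " is required, followed by a non-blank description
  private static final Pattern TITLE =
      Pattern.compile("^(?:" + TYPES + ")(?:\\([^()\\s]+\\))?!?: \\S.*$");

  private PrTitleCheck() {}

  static boolean isValid(String title) {
    return TITLE.matcher(title).matches();
  }

  public static void main(String[] args) {
    String[] titles = {
      "feat(flink): add bucket index implementation on Flink",
      "feat(index)!: change secondary index layout",
      "FIX: bug in login",
      "feature: new login system",
    };
    for (String t : titles) {
      System.out.println((isValid(t) ? "ok   " : "FAIL ") + t);
    }
  }
}
```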
+
+## Coding guidelines
+
+Our code can benefit from contributors speaking the same "language" when authoring code. After all, it gets read a lot more than it gets
+written, so optimizing for "reads" is a good goal. The list below is a set of guidelines that contributors strive to uphold and is reflective
+of how we want to evolve our code in the future.
+
+### Style
+
+- **Formatting**: We should rely on checkstyle and spotless to auto-fix formatting and automate this completely. Where we cannot,
+  we will err on the side of not taxing contributors with manual effort.
+- **Refactoring**
+  - Refactor with purpose; any suggested refactor should be attributable to functionality that now becomes easy to implement.
+  - A class is asking to be refactored when it has several overloaded responsibilities or has sets of fields/methods that are used more cohesively than others.
+  - Try to name tests using the given-when-then model, which cleanly separates preconditions (given), an action (when), and assertions (then).
+- **Naming things**
+  - Let's name uniformly by using the same word to denote the same concept. For example: bootstrap vs. external vs. source, when referring to bootstrapped tables.
+    They may all mean the same thing, but settling on one word makes the code much easier to read.
+  - Let's name consistently with Hudi terminology. For example: dataset vs. table, base file vs. data file.
+  - Class names should preferably be nouns (e.g., Runner) that reflect their responsibility, and method names verbs (e.g., run()).
+  - Avoid filler words that don't add value, for example: xxxInfo, xxxData, etc.
+  - We name classes in code starting with `Hoodie` and not `Hudi`, and we want to keep it that way for consistency/historical reasons.
+- **Methods**
+  - Individual methods should be short (~20-30 lines) and have a single purpose. If a method seems to have a secondary purpose, then maybe it needs
+    to be broken down further.
+  - The fewer the number of arguments, the better.
+  - Place caller methods on top of callee methods, whenever possible.
+  - Avoid "output" arguments, for example: passing in a list and filling its values within the method.
+  - Try to limit individual if/else blocks to a few lines to aid readability.
+  - Separate logical blocks of code with a blank line, for example: between reading a file into memory and looping over its lines.
+- **Classes**
+  - Like methods, each class should have a single purpose/responsibility.
+  - Try to keep class files to about 200 lines, and no more than 500.
+  - Avoid stating the obvious in comments; for example, not every line deserves a comment. Clearly document corner cases, special performance considerations, etc.
+  - Try creating factory methods/builders and interfaces wherever you feel a specific implementation may be changed down the line.
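Several of the method and naming guidelines are easier to see side by side. The hypothetical `ParquetFileFilter` below contrasts an output-argument method with a single-purpose method that returns its result. `ParquetFileFilterCheck` names and structures a check in the given-when-then style. The sketch is plain Java with no test framework so that it runs on its own. An actual Hudi test would use the project's test framework (JUnit) with `@Test` and assertion helpers instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not part of Hudi.
final class ParquetFileFilter {
  private ParquetFileFilter() {}

  // Avoid: "out" is an output argument that the method fills as a side effect.
  static void collectParquetFiles(List<String> fileNames, List<String> out) {
    for (String name : fileNames) {
      if (name.endsWith(".parquet")) {
        out.add(name);
      }
    }
  }

  // Prefer: a short, single-purpose method named with a verb that returns its result.
  static List<String> filterParquetFiles(List<String> fileNames) {
    List<String> result = new ArrayList<>();
    for (String name : fileNames) {
      if (name.endsWith(".parquet")) {
        result.add(name);
      }
    }
    return result;
  }
}

// A check named and laid out with the given-when-then model (no test framework).
final class ParquetFileFilterCheck {
  private ParquetFileFilterCheck() {}

  static void givenMixedFileNames_whenFiltering_thenOnlyParquetFilesRemain() {
    // given: input containing parquet and non-parquet names
    List<String> fileNames = List.of("a.parquet", "b.log", "c.parquet");

    // when: the single action under test
    List<String> filtered = ParquetFileFilter.filterParquetFiles(fileNames);

    // then: assertions only
    if (!filtered.equals(List.of("a.parquet", "c.parquet"))) {
      throw new AssertionError("unexpected result: " + filtered);
    }
  }

  public static void main(String[] args) {
    givenMixedFileNames_whenFiltering_thenOnlyParquetFilesRemain();
    System.out.println("ok");
  }
}
```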
+
+### Substance
+
+Try to avoid large PRs; if unavoidable (many times they are), please separate refactoring from the actual implementation of functionality.
+For example, renaming/breaking up a file and then making code changes in the same PR makes the diff very hard to review.
+- **Licensing**
+  - Every source file needs to include the Apache license header. Every new dependency needs to have
+    an open source license [compatible](https://www.apache.org/legal/resolved#criteria) with Apache.
+  - If you are reusing code from another Apache/open-source project, its license needs to be compatible, and attribution must be added to the `LICENSE` file.
+  - Please DO NOT copy-paste any code from StackOverflow or other online sources, since its license attribution would be unclear. Author it yourself!
+- **Code Organization**
+  - Anything in `hudi-common` cannot depend on a specific engine runtime like Spark.
+  - Any changes to bundles under `packaging` will be reviewed with additional scrutiny to avoid breakages across versions.
+- **Code reuse**
+  - Whenever you can, please use/enhance existing utils classes in code (`CollectionUtils`, `ParquetUtils`, `HoodieAvroUtils`). Search for classes ending in `Utils`.
+  - As a complex project that must integrate with multiple systems, we tend to avoid dependencies like `guava` and `apache commons` for the sake of easy integration.
+    Please start a discussion on the mailing list before attempting to reintroduce them.
+  - As a data system that takes performance seriously, we also natively write pieces of infrastructure (e.g., `ExternalSpillableMap`) that are optimized specifically for our scenarios.
+    Please start with them first when solving problems.
+- **Breaking changes**
+  - Any version changes for dependencies should ideally be vetted across different user environments in the community to gain enough confidence before merging.
+  - Any changes to methods annotated with `PublicAPIMethod` or classes annotated with `PublicAPIClass` require upfront discussion and potentially an RFC.
+  - Any non-backwards-compatible changes similarly need upfront discussion, and the functionality needs to implement an upgrade-downgrade path.
+- **Documentation**
+  - Where necessary, please ensure there is another PR to [update the docs](https://github.com/apache/hudi/tree/asf-site/README.md) as well.
+  - Keep RFCs up to date as implementation evolves.
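For reference, the file below starts with the standard source header from the ASF source header policy, written as a Java block comment. This is a sketch. The class is a hypothetical placeholder, and the exact indentation in Hudi's files may differ slightly. Copying the header from an existing source file in the repository is the safest option.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

// Hypothetical placeholder class: only the license header above matters here.
final class LicensedExample {
  private LicensedExample() {}

  static String describe() {
    return "every source file starts with the ASF license header";
  }
}
```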
+
+### Testing
+Add adequate tests for your new functionality. For involved changes, it's best to test the changes in real production environments and report the results in the PR.
+For website changes, please build the site locally and test navigation, formatting, and links thoroughly.
+
+- **Categories**
+  - unit - tests basic functionality at the class level, potentially using mocks. Expected to finish quickly.
+  - functional - brings up the services needed and runs tests without mocking.
+  - integration - runs a subset of functional tests on a full-fledged environment with dockerized services.
+- **Prepare Test Data**
+  - Many unit and functional test cases require a Hudi dataset to be prepared beforehand. `HoodieTestTable` and `HoodieWriteableTestTable` are dedicated test utility classes for this purpose. Use them whenever appropriate, and add new APIs to them when needed.
+  - When adding new APIs in the test utility classes, overload APIs with a variety of arguments so they do more of the heavy lifting for callers.
+  - In most scenarios, you won't need to use `FileCreateUtils` directly.
+  - If test cases require interaction with actual `HoodieRecord`s, use `HoodieWriteableTestTable` (probably together with `HoodieTestDataGenerator`). Otherwise, `HoodieTestTable`, which manipulates empty files, should serve the purpose.
+- **Strive for Readability**
+  - Avoid writing control flow for different assertion cases. Split into a new test case when appropriate.
+  - Use a plain for-loop to avoid try-catch in a lambda block. Declaring exceptions is okay.
+  - Use static imports for constants and static helper methods to avoid lengthy code.
+  - Avoid reusing local variable names. Create new variables generously.
+  - Keep helper methods local to the test class until they become obviously generic and reusable. When that happens, move the helper method to the right utility class. For example, `Assertions` contains common assert helpers, and `SchemaTestUtil` is for schema-related helpers.
+  - Avoid putting new helpers in `HoodieTestUtils` and `HoodieClientTestUtils`, whose names are too generic. Eventually, all test helpers shall be categorized properly.
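The for-loop guideline exists because `java.util.function.Consumer`, which `forEach` takes, cannot throw checked exceptions. Calling a throwing helper from a lambda therefore forces a try-catch. The minimal sketch below illustrates this. `validatePartitionPath` is a hypothetical helper standing in for any test utility that throws `IOException`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical example class; not part of Hudi's test utilities.
final class LoopVsLambda {
  private LoopVsLambda() {}

  // Stand-in for a test helper that throws a checked exception.
  static void validatePartitionPath(String partitionPath) throws IOException {
    if (partitionPath.isEmpty()) {
      throw new IOException("empty partition path");
    }
  }

  // Avoid: Consumer.accept cannot throw IOException, so the lambda needs a try-catch.
  static void validateAllWithLambda(List<String> partitionPaths) {
    partitionPaths.forEach(path -> {
      try {
        validatePartitionPath(path);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    });
  }

  // Prefer: a plain for-loop that simply declares the checked exception.
  static void validateAll(List<String> partitionPaths) throws IOException {
    for (String path : partitionPaths) {
      validatePartitionPath(path);
    }
  }

  public static void main(String[] args) throws IOException {
    validateAll(List.of("2025/09/17", "2025/09/18"));
    System.out.println("ok");
  }
}
```

With the loop version, a failure surfaces as the original `IOException` in the test's stack trace, not wrapped in an `UncheckedIOException`.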
+
+## Reviewing Pull Requests/RFCs
+
+- All pull requests are subject to code reviews from one or more of the PMC members/Committers.
+- Typically, each PR gets an "Assignee", chosen based on their area of expertise, who will work with you to land the PR.
+- Code reviews are vital, but also often time-consuming for everyone involved. Below are some principles that could help align us better.
+  - Reviewers need to provide actionable, concrete feedback that states what needs to be done to get the PR closer to landing.
+  - Reviewers need to make it explicit which of the requested changes would block the PR and which are good-to-dos.
+  - Both contributors and reviewers need to keep an open mind and ground themselves in making the most technically sound argument.
+  - If progress is hard, please involve another PMC member/Committer to share another perspective.
+  - Staying humble and eager to learn goes a long way in ensuring these reviews are smooth.
+- Reviewers are expected to uphold code quality and the standards outlined above.
+- When merging PRs, always make sure you are squashing the commits using the "Squash and Merge" feature in GitHub.
+- When necessary/appropriate, reviewers could make changes themselves to PR branches, with the intent to get the PR landed sooner.
+  Reviewers should seek explicit approval from the author before making large changes to the original PR.
+
+### Proposing Changes
+We welcome new ideas and suggestions to improve the project along any dimension: management, processes, technical vision/direction. To kick-start a discussion on the mailing list
+to effect change and source feedback, start a new email thread with the `[DISCUSS]` prefix and share your thoughts. If your proposal leads to a larger change, then it may be followed up
+by a [vote](https://www.apache.org/foundation/voting) by a PMC member or others (depending on the specific scenario). For technical suggestions, you can also leverage [our RFC Process](/contribute/rfc-process) to outline your ideas in greater detail.
+
+## Becoming a Committer
+
+We are always looking for strong contributors who can become [committers](https://www.apache.org/dev/committers) on the project.
+Committers are chosen by a majority vote of the Apache Hudi [PMC](https://www.apache.org/foundation/how-it-works#pmc-members), after a discussion on their candidacy based on the following criteria (not exclusive/comprehensive):
-It's useful to obtain few accounts to be able to effectively contribute to Hudi.
-
- - Github account is needed to send pull requests to Hudi
- - Sign-up/in to the Apache [JIRA](https://issues.apache.org/jira). Then please email the dev mailing list with your username, asking to be added as a contributor to the project. This enables you to assign/be-assigned tickets and comment on them.
- - Sign-up/in to the Apache [cWiki](https://cwiki.apache.org/confluence/signup.action), to be able to contribute to the wiki pages/RFCs.
+- Embodies the ASF model code of [conduct](https://www.apache.org/foundation/policies/conduct)
+- Has made significant technical contributions such as submitting PRs, filing bugs, testing, benchmarking, authoring RFCs, providing feedback/code reviews (+ more).
+- Has helped the community over a few months by answering questions on support channels above and triaging issues.
+- Demonstrates clear code/design ownership of a component or code area (e.g., Delta Streamer, Hive/Presto Integration, etc.).
+- Brought thought leadership and new ideas into the project and evangelized them with the community via conference talks and blog posts.
+- Great citizenship in helping with all peripheral (but very critical) work like site maintenance, wiki cleanups, and so on.
+- Proven commitment to the project by way of upholding all agreed-upon processes, conventions, and principles of the community.
\ No newline at end of file
diff --git a/website/docs/performance.md b/website/docs/performance.md
index 89bcd3ee75b0..669213137d63 100644
--- a/website/docs/performance.md
+++ b/website/docs/performance.md
@@ -76,7 +76,7 @@ significant savings on the overall compute cost.
</figure>
Hudi upserts have been stress tested upto 4TB in a single commit across the t1 table.
-See [here](https://cwiki.apache.org/confluence/display/HUDI/Tuning+Guide) for some tuning tips.
+See [here](/docs/tuning-guide) for some tuning tips.
#### Indexing