This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 7699052  [DOCS] Adding coding guidelines (#2061)
7699052 is described below

commit 769905286cb55a52278c685b4311525541ae4f4c
Author: vinoth chandar <[email protected]>
AuthorDate: Thu Sep 3 13:03:49 2020 -0700

    [DOCS] Adding coding guidelines (#2061)
---
 docs/_pages/contributing.md | 69 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 64 insertions(+), 5 deletions(-)

diff --git a/docs/_pages/contributing.md b/docs/_pages/contributing.md
index 00301ae..bb30a84 100644
--- a/docs/_pages/contributing.md
+++ b/docs/_pages/contributing.md
@@ -110,13 +110,9 @@ and more importantly also try to improve the process along 
the way as well.
       - Click "Start Progress" on the JIRA, which tells everyone that you are 
working on the issue actively.
  - [Optional] Familiarize yourself with internals of Hudi using content on 
this page, as well as [wiki](https://cwiki.apache.org/confluence/display/HUDI)
  - Make your code change
-   - Every source file needs to include the Apache license header. Every new 
dependency needs to have an 
-     open source license 
[compatible](https://www.apache.org/legal/resolved.html#criteria) with Apache.
-   - If you are re-using code from another apache/open-source project, 
licensing needs to be compatible and attribution added to `LICENSE` file
-   - Please DO NOT copy paste any code from StackOverflow or other online 
sources, since their license attribution would be unclear. Author them yourself!
    - Get existing tests to pass using `mvn clean install -DskipITs`
    - Add adequate tests for your new functionality
-   - For involved changes, it's best to also run the entire integration test 
suite using `mvn clean install`
+   - For involved changes, it's best to test the changes in real production 
environments and report the results in the PR. 
    - For website changes, please build the site locally & test navigation, 
formatting & links thoroughly
    - If your code change changes some aspect of documentation (e.g new config, 
default value change), 
      please ensure there is another PR to [update the 
docs](https://github.com/apache/hudi/tree/asf-site/README.md) as well.
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read a lot more than it gets
+written. So optimizing for "reads" is a good goal. The list below is a set of 
guidelines, that contributors strive to upkeep and reflective 
+of how we want to evolve our code in the future.
+
+#### Style 
+
+ - **Formatting** We should rely on checkstyle and spotless to auto fix 
formatting; automate this completely. Where we cannot,
+    we will err on the side of not taxing contributors with manual effort.
+ - **Refactoring**
+   - Refactor with purpose; any refactor suggested should be attributable to 
functionality that now becomes easy to implement.
+   - A class is asking to be refactored, when it has several overloaded 
responsibilities/have sets of fields/methods which are used more cohesively 
than others. 
+   - Try to name tests using the given-when-then model, that cleans separates 
preconditions (given), an action (when), and assertions (then).
+ - **Naming things**
+   - Let's name uniformly; using the same word to denote the same concept. 
e.g: bootstrap vs external vs source, when referring to bootstrapped tables. 
+     Maybe they all mean the same, but having one word makes the code lot more 
easily readable. 
+   - Let's name consistently with Hudi terminology. e.g dataset vs table, base 
file vs data file.
+   - Class names preferably are nouns (e.g Runner) which reflect their 
responsibility and methods are verbs (e.g run()).
+   - Avoid filler words, that don't add value e.g xxxInfo, xxxData, etc.
+   - We name classes in code starting with `Hoodie` and not `Hudi` and we want 
to keep it that way for consistency/historical reasons. 
+ - **Methods**
+   - Individual methods should short (~20-30 lines) and have a single purpose; 
If you feel like it has a secondary purpose, then maybe it needs
+     to be broken down more.
+   - Lesser the number of arguments, the better; 
+   - Place caller methods on top of callee methods, whenever possible.
+   - Avoid "output" arguments e.g passing in a list and filling its values 
within the method.
+   - Try to limit individual if/else blocks to few lines to aid readability.
+   - Separate logical blocks of code with a newline in between e.g read a file 
into memory, loop over the lines.
+ - **Classes**
+   - Like method, each Class should have a single purpose/responsibility.
+   - Try to keep class files to about 200 lines of length, nothing beyond 500.
+   - Avoid stating the obvious in comments; e.g each line does not deserve a 
comment; Document corner-cases/special perf considerations etc clearly.
+   - Try creating factory methods/builders and interfaces wherever you feel a 
specific implementation may be changed down the line.
+
+#### Substance
+
+- Try to avoid large PRs; if unavoidable (many times they are) please separate 
refactoring with the actual implementation of functionality. 
+  e.g renaming/breaking up a file and then changing code changes, makes the 
diff very hard to review.
+- **Licensing**
+    - Every source file needs to include the Apache license header. Every new 
dependency needs to have 
+      an open source license 
[compatible](https://www.apache.org/legal/resolved.html#criteria) with Apache.
+    - If you are re-using code from another apache/open-source project, 
licensing needs to be compatible and attribution added to `LICENSE` file
+    - Please DO NOT copy paste any code from StackOverflow or other online 
sources, since their license attribution would be unclear. Author them yourself!
+- **Code Organization** 
+    - Anything in `hudi-common` cannot depend on a specific engine runtime 
like Spark. 
+    - Any changes to bundles under `packaging`, will be reviewed with 
additional scrutiny to avoid breakages across versions.
+- **Code reuse**
+  - Whenever you can, please use/enhance use existing utils classes in code 
(`CollectionUtils`, `ParquetUtils`, `HoodieAvroUtils`). Search for classes 
ending in `Utils`.
+  - As a complex project, that must integrate with multiple systems, we tend 
to avoid dependencies like `guava`, `apache commons` for the sake of easy 
integration. 
+     Please start a discussion on the mailing list, before attempting to 
reintroduce them
+  - As a data system, that takes performance seriously, we also write pieces 
of infrastructure (e.g `ExternalSpillableMap`) natively, that are optimized 
specifically for our scenarios.
+     Please start with them first, when solving problems.
+ - **Breaking changes**
+   - Any version changes for dependencies, needs to be ideally vetted across 
different user environments in the community, to get enough confidence before 
merging.
+   - Any changes to methods annotated with `PublicAPIMethod` or classes 
annotated with `PublicAPIClass` require upfront discussion and potentially an 
RFC.
+   - Any non-backwards compatible changes similarly need upfront discussion 
and the functionality needs to implement an upgrade-downgrade path.
+
+
 ### Reviewing Code/RFCs
 
  - All pull requests would be subject to code reviews, from one or more of the 
PMC/Committers. 
@@ -140,6 +195,10 @@ and more importantly also try to improve the process along 
the way as well.
    - Both contributors/reviewers need to keep an open mind and ground 
themselves to making the most technically sound argument.
    - If progress is hard, please involve another PMC member/Committer to share 
another perspective.
    - Staying humble and eager to learn, goes a long way in ensuring these 
reviews are smooth.
+ - Reviewers are expected to uphold the code quality, standards outlined above.
+ - When merging PRs, always make sure you are squashing the commits using the 
"Squash and Merge" feature in Github
+ - When necessary/appropriate, reviewers could make changes themselves to PR 
branches, with the intent to get the PR landed sooner. (see 
[how-to](https://cwiki.apache.org/confluence/display/HUDI/Resources#Resources-PushingChangesToPRs))
+   Reviewers should seek explicit approval from author, before making large 
changes to the original PR.
 
 ### Suggest Changes
 

Reply via email to