yanghua commented on a change in pull request #2061:
URL: https://github.com/apache/hudi/pull/2061#discussion_r481809731



##########
File path: docs/_pages/contributing.md
##########
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read lot more than it gets

Review comment:
       `lot` -> `a lot`

##########
File path: docs/_pages/contributing.md
##########
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read lot more than it gets
+written. So optimizing for "reads" is a good goal. The list below is a set of 
guidelines, that contributors strive to upkeep and reflective 
+of how we want to evolve our code in the future.
+
+#### Style 
+
+ - **Formatting** We should rely on checkstyle and spotless to auto fix 
formatting; automate this completely. Where we cannot,
+    we will err on the side of not taxing contributors with manual effort.
+ - **Refactoring**
+   - Refactor with purpose; any refactor suggested should be attributable to 
functionality that now becomes easy to implement.
+   - A class is asking to be refactored, when it has several overloaded 
responsibilities/have sets of fields/methods which are used more cohesively 
than others. 
+   - Try to name tests using the given-when-then model, that cleans separates 
preconditions (given), an action (when) and assertions (then).

Review comment:
       `an action (when) and ` -> `an action (when), and `?
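
       For readers of this thread, the given-when-then guideline in the hunk above could be illustrated roughly as follows. This is only a sketch: the class name, method name, and `check` helper are hypothetical and not taken from the Hudi code base, and it uses plain Java rather than a test framework so that it runs standalone. The camel-case joining of the three parts is also an assumption; the guideline does not prescribe a separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of given-when-then test naming; not Hudi code.
public class GivenWhenThenNamingExample {

  // The name states the precondition, the action, and the expected outcome.
  static void givenEmptyListWhenElementAddedThenSizeIsOne() {
    // given: an empty list of records
    List<String> records = new ArrayList<>();

    // when: a single record is added
    records.add("record-1");

    // then: the list holds exactly one record
    check(records.size() == 1, "expected exactly one record");
  }

  // Minimal stand-in for a test framework assertion (assumption).
  static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  public static void main(String[] args) {
    givenEmptyListWhenElementAddedThenSizeIsOne();
    System.out.println("all checks passed");
  }
}
```

       The body mirrors the name: one block per phase, separated by a blank line, which also matches the "separate logical blocks" guideline further down in the same diff.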

##########
File path: docs/_pages/contributing.md
##########
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read lot more than it gets
+written. So optimizing for "reads" is a good goal. The list below is a set of 
guidelines, that contributors strive to upkeep and reflective 
+of how we want to evolve our code in the future.

Review comment:
       `reflective of` -> `reflect on`?

##########
File path: docs/_pages/contributing.md
##########
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read lot more than it gets
+written. So optimizing for "reads" is a good goal. The list below is a set of 
guidelines, that contributors strive to upkeep and reflective 
+of how we want to evolve our code in the future.
+
+#### Style 
+
+ - **Formatting** We should rely on checkstyle and spotless to auto fix 
formatting; automate this completely. Where we cannot,
+    we will err on the side of not taxing contributors with manual effort.
+ - **Refactoring**
+   - Refactor with purpose; any refactor suggested should be attributable to 
functionality that now becomes easy to implement.
+   - A class is asking to be refactored, when it has several overloaded 
responsibilities/have sets of fields/methods which are used more cohesively 
than others. 
+   - Try to name tests using the given-when-then model, that cleans separates 
preconditions (given), an action (when) and assertions (then).
+ - **Naming things**
+   - Let's name uniformly; using the same word to denote the same concept. 
e.g: bootstrap vs external vs source, when referring to bootstrapped tables. 
+     May be they all mean the same, but having one word makes the code lot 
more easily readable. 

Review comment:
       `May be` -> `Maybe`?

##########
File path: docs/_pages/contributing.md
##########
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read lot more than it gets
+written. So optimizing for "reads" is a good goal. The list below is a set of 
guidelines, that contributors strive to upkeep and reflective 
+of how we want to evolve our code in the future.
+
+#### Style 
+
+ - **Formatting** We should rely on checkstyle and spotless to auto fix 
formatting; automate this completely. Where we cannot,
+    we will err on the side of not taxing contributors with manual effort.
+ - **Refactoring**
+   - Refactor with purpose; any refactor suggested should be attributable to 
functionality that now becomes easy to implement.
+   - A class is asking to be refactored, when it has several overloaded 
responsibilities/have sets of fields/methods which are used more cohesively 
than others. 
+   - Try to name tests using the given-when-then model, that cleans separates 
preconditions (given), an action (when) and assertions (then).
+ - **Naming things**
+   - Let's name uniformly; using the same word to denote the same concept. 
e.g: bootstrap vs external vs source, when referring to bootstrapped tables. 
+     May be they all mean the same, but having one word makes the code lot 
more easily readable. 
+   - Let's name consistently with Hudi terminology. e.g dataset vs table, base 
file vs data file..
+   - Class names preferably are nouns (e.g Runner) which reflect their 
responsibility and methods are verbs (e.g run()).
+   - Avoid filler words, that don't add value e.g xxxInfo, xxxData etc.

Review comment:
       `xxxData etc` -> `xxxData, etc`?

##########
File path: docs/_pages/contributing.md
##########
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read lot more than it gets
+written. So optimizing for "reads" is a good goal. The list below is a set of 
guidelines, that contributors strive to upkeep and reflective 
+of how we want to evolve our code in the future.
+
+#### Style 
+
+ - **Formatting** We should rely on checkstyle and spotless to auto fix 
formatting; automate this completely. Where we cannot,
+    we will err on the side of not taxing contributors with manual effort.
+ - **Refactoring**
+   - Refactor with purpose; any refactor suggested should be attributable to 
functionality that now becomes easy to implement.
+   - A class is asking to be refactored, when it has several overloaded 
responsibilities/have sets of fields/methods which are used more cohesively 
than others. 
+   - Try to name tests using the given-when-then model, that cleans separates 
preconditions (given), an action (when) and assertions (then).
+ - **Naming things**
+   - Let's name uniformly; using the same word to denote the same concept. 
e.g: bootstrap vs external vs source, when referring to bootstrapped tables. 
+     May be they all mean the same, but having one word makes the code lot 
more easily readable. 
+   - Let's name consistently with Hudi terminology. e.g dataset vs table, base 
file vs data file..

Review comment:
       `file..` -> `file.`

##########
File path: docs/_pages/contributing.md
##########
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read lot more than it gets
+written. So optimizing for "reads" is a good goal. The list below is a set of 
guidelines, that contributors strive to upkeep and reflective 
+of how we want to evolve our code in the future.
+
+#### Style 
+
+ - **Formatting** We should rely on checkstyle and spotless to auto fix 
formatting; automate this completely. Where we cannot,
+    we will err on the side of not taxing contributors with manual effort.
+ - **Refactoring**
+   - Refactor with purpose; any refactor suggested should be attributable to 
functionality that now becomes easy to implement.
+   - A class is asking to be refactored, when it has several overloaded 
responsibilities/have sets of fields/methods which are used more cohesively 
than others. 
+   - Try to name tests using the given-when-then model, that cleans separates 
preconditions (given), an action (when) and assertions (then).
+ - **Naming things**
+   - Let's name uniformly; using the same word to denote the same concept. 
e.g: bootstrap vs external vs source, when referring to bootstrapped tables. 
+     May be they all mean the same, but having one word makes the code lot 
more easily readable. 
+   - Let's name consistently with Hudi terminology. e.g dataset vs table, base 
file vs data file..
+   - Class names preferably are nouns (e.g Runner) which reflect their 
responsibility and methods are verbs (e.g run()).
+   - Avoid filler words, that don't add value e.g xxxInfo, xxxData etc.
+   - We name classes in code starting with `Hoodie` and not `Hudi` and we want 
to keep it that way for consistency/historical reasons. 
+ - **Methods**
+   - Individual methods should short (~20-30 lines) and have a single purpose; 
If you feel like it has a secondary purpose, then may be it needs
+     to be broken down more.
+   - Lesser the number of arguments, the better; 
+   - Place caller methods on top of callee methods, whenever possible.
+   - Avoid "output" arguments e.g passing in a list and filling its values 
within the method.
+   - Try to limit individual if/else blocks to few lines to aid readability.
+   - Separate logical blocks of code with a newline in between e.g read a file 
into memory, loop over the lines.
+ - **Classes**
+   - Like method, each Class should have a single purpose/responsibility.
+   - Try to keep class files to about 200 lines of length, nothing beyond 500.
+   - Avoid stating the obvious in comments; e.g each line does not deserve a 
comment; Document corner-cases/special perf considerations etc clearly.
+   - Try creating factory methods/builders and interfaces wherever you feel a 
specific implementation may be changed down the line.
+
+#### Substance
+
+- Try to avoid large PRs; if unavoidable (many times they are) please separate 
refactoring with actual implementation of functionality. 

Review comment:
       `actual` -> `the actual`?
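
       As a side note on the "Avoid output arguments" bullet in the hunk above, a small sketch may help show what the guideline is asking for. The class and method names below are hypothetical and chosen only for illustration; they do not exist in Hudi.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the "avoid output arguments" guideline; not Hudi code.
public class OutputArgumentExample {

  // Discouraged: the caller passes in a list and the method fills it.
  static void collectEvenNumbers(List<Integer> input, List<Integer> output) {
    for (int value : input) {
      if (value % 2 == 0) {
        output.add(value);
      }
    }
  }

  // Preferred: the method builds and returns its own result.
  static List<Integer> evenNumbers(List<Integer> input) {
    List<Integer> result = new ArrayList<>();
    for (int value : input) {
      if (value % 2 == 0) {
        result.add(value);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> input = Arrays.asList(1, 2, 3, 4);

    // Output-argument style needs the caller to allocate the target list.
    List<Integer> filled = new ArrayList<>();
    collectEvenNumbers(input, filled);

    // Return-value style keeps the data flow visible at the call site.
    List<Integer> returned = evenNumbers(input);

    System.out.println(filled.equals(returned));
  }
}
```

       Both variants compute the same list; the second makes it obvious at the call site where the result comes from, which is the readability point the guideline seems to be making.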

##########
File path: docs/_pages/contributing.md
##########
@@ -140,6 +195,9 @@ and more importantly also try to improve the process along 
the way as well.
    - Both contributors/reviewers need to keep an open mind and ground 
themselves to making the most technically sound argument.
    - If progress is hard, please involve another PMC member/Committer to share 
another perspective.
    - Staying humble and eager to learn, goes a long way in ensuring these 
reviews are smooth.
+ - Reviewers are expected to uphold the code quality, standards outlined above.
+ - When merging PRs, always make sure you are squashing the commits using the 
"Squash and Merge" feature in Github
+ - When necessary/appropriate, reviewers could make changes themselves to PR 
branches, with intent to get the PR landed sooner. (see 
[how-to](https://cwiki.apache.org/confluence/display/HUDI/Resources#Resources-PushingChangesToPRs))

Review comment:
       `with intent` -> `with the intent`?

##########
File path: docs/_pages/contributing.md
##########
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read lot more than it gets
+written. So optimizing for "reads" is a good goal. The list below is a set of 
guidelines, that contributors strive to upkeep and reflective 
+of how we want to evolve our code in the future.
+
+#### Style 
+
+ - **Formatting** We should rely on checkstyle and spotless to auto fix 
formatting; automate this completely. Where we cannot,
+    we will err on the side of not taxing contributors with manual effort.
+ - **Refactoring**
+   - Refactor with purpose; any refactor suggested should be attributable to 
functionality that now becomes easy to implement.
+   - A class is asking to be refactored, when it has several overloaded 
responsibilities/have sets of fields/methods which are used more cohesively 
than others. 
+   - Try to name tests using the given-when-then model, that cleans separates 
preconditions (given), an action (when) and assertions (then).
+ - **Naming things**
+   - Let's name uniformly; using the same word to denote the same concept. 
e.g: bootstrap vs external vs source, when referring to bootstrapped tables. 
+     May be they all mean the same, but having one word makes the code lot 
more easily readable. 
+   - Let's name consistently with Hudi terminology. e.g dataset vs table, base 
file vs data file..
+   - Class names preferably are nouns (e.g Runner) which reflect their 
responsibility and methods are verbs (e.g run()).
+   - Avoid filler words, that don't add value e.g xxxInfo, xxxData etc.
+   - We name classes in code starting with `Hoodie` and not `Hudi` and we want 
to keep it that way for consistency/historical reasons. 
+ - **Methods**
+   - Individual methods should short (~20-30 lines) and have a single purpose; 
If you feel like it has a secondary purpose, then may be it needs
+     to be broken down more.
+   - Lesser the number of arguments, the better; 
+   - Place caller methods on top of callee methods, whenever possible.
+   - Avoid "output" arguments e.g passing in a list and filling its values 
within the method.
+   - Try to limit individual if/else blocks to few lines to aid readability.
+   - Separate logical blocks of code with a newline in between e.g read a file 
into memory, loop over the lines.
+ - **Classes**
+   - Like method, each Class should have a single purpose/responsibility.
+   - Try to keep class files to about 200 lines of length, nothing beyond 500.
+   - Avoid stating the obvious in comments; e.g each line does not deserve a 
comment; Document corner-cases/special perf considerations etc clearly.
+   - Try creating factory methods/builders and interfaces wherever you feel a 
specific implementation may be changed down the line.
+
+#### Substance
+
+- Try to avoid large PRs; if unavoidable (many times they are) please separate 
refactoring with actual implementation of functionality. 
+  e.g renaming/breaking up a file and then changing code changes, makes the 
diff very hard to review.
+- **Licensing**
+    - Every source file needs to include the Apache license header. Every new 
dependency needs to have 
+      an open source license 
[compatible](https://www.apache.org/legal/resolved.html#criteria) with Apache.
+    - If you are re-using code from another apache/open-source project, 
licensing needs to be compatible and attribution added to `LICENSE` file
+    - Please DO NOT copy paste any code from StackOverflow or other online 
sources, since their license attribution would be unclear. Author them yourself!
+- **Code Organization** 
+    - Anything in `hudi-common` cannot depend on a specific engine runtime 
like Spark. 
+    - Any changes to bundles under `packaging`, will be reviewed with 
additional scrutiny to avoid breakages across versions.
+- **Code reuse**
+  - Whenever you can, please use/enhance use existing utils classes in code 
(`CollectionUtils`, `ParquetUtils`, `HoodieAvroUtils`). Search for classes 
ending in `Utils`.
+  - As a complex project, that must integrate with multiple systems, we tend 
to avoid dependencies like `guava`, `apache commons` for sake of easy 
integration. 
+     Please start a discussion on the mailing list, before attempting to 
reintroduce them
+  - As a data system, that takes performance seriously, we also write piece of 
infrastructure (e.g `ExternalSpillableMap`) natively, that are optimized 
specifically for our scenarios.
+     Please start with them first, when solving problems.
+ - **Breaking changes**
+   - Any version changes for dependencies, needs to be ideally vetted across 
different user environments in the community, to get enough confidence before 
merging.
+   - Any changes to methods annotated with `PublicAPIMethod` or classes 
annotated with `PublicAPIClass` require upfront discussion and potentially a 
RFC.
+   - Any non-backwards compatible changes similarly need upfront discussion 
and the functionality needs to implement an upgrade-downgrade path.

Review comment:
       `non-backwards` -> `non-backward`?

##########
File path: docs/_pages/contributing.md
##########
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read lot more than it gets
+written. So optimizing for "reads" is a good goal. The list below is a set of 
guidelines, that contributors strive to upkeep and reflective 
+of how we want to evolve our code in the future.
+
+#### Style 
+
+ - **Formatting** We should rely on checkstyle and spotless to auto fix 
formatting; automate this completely. Where we cannot,
+    we will err on the side of not taxing contributors with manual effort.
+ - **Refactoring**
+   - Refactor with purpose; any refactor suggested should be attributable to 
functionality that now becomes easy to implement.
+   - A class is asking to be refactored, when it has several overloaded 
responsibilities/have sets of fields/methods which are used more cohesively 
than others. 
+   - Try to name tests using the given-when-then model, that cleans separates 
preconditions (given), an action (when) and assertions (then).
+ - **Naming things**
+   - Let's name uniformly; using the same word to denote the same concept. 
e.g: bootstrap vs external vs source, when referring to bootstrapped tables. 
+     May be they all mean the same, but having one word makes the code lot 
more easily readable. 
+   - Let's name consistently with Hudi terminology. e.g dataset vs table, base 
file vs data file..
+   - Class names preferably are nouns (e.g Runner) which reflect their 
responsibility and methods are verbs (e.g run()).
+   - Avoid filler words, that don't add value e.g xxxInfo, xxxData etc.
+   - We name classes in code starting with `Hoodie` and not `Hudi` and we want 
to keep it that way for consistency/historical reasons. 
+ - **Methods**
+   - Individual methods should short (~20-30 lines) and have a single purpose; 
If you feel like it has a secondary purpose, then may be it needs
+     to be broken down more.
+   - Lesser the number of arguments, the better; 
+   - Place caller methods on top of callee methods, whenever possible.
+   - Avoid "output" arguments e.g passing in a list and filling its values 
within the method.
+   - Try to limit individual if/else blocks to few lines to aid readability.
+   - Separate logical blocks of code with a newline in between e.g read a file 
into memory, loop over the lines.
+ - **Classes**
+   - Like method, each Class should have a single purpose/responsibility.
+   - Try to keep class files to about 200 lines of length, nothing beyond 500.
+   - Avoid stating the obvious in comments; e.g each line does not deserve a 
comment; Document corner-cases/special perf considerations etc clearly.
+   - Try creating factory methods/builders and interfaces wherever you feel a 
specific implementation may be changed down the line.
+
+#### Substance
+
+- Try to avoid large PRs; if unavoidable (many times they are) please separate 
refactoring with actual implementation of functionality. 
+  e.g renaming/breaking up a file and then changing code changes, makes the 
diff very hard to review.
+- **Licensing**
+    - Every source file needs to include the Apache license header. Every new 
dependency needs to have 
+      an open source license 
[compatible](https://www.apache.org/legal/resolved.html#criteria) with Apache.
+    - If you are re-using code from another apache/open-source project, 
licensing needs to be compatible and attribution added to `LICENSE` file
+    - Please DO NOT copy paste any code from StackOverflow or other online 
sources, since their license attribution would be unclear. Author them yourself!
+- **Code Organization** 
+    - Anything in `hudi-common` cannot depend on a specific engine runtime 
like Spark. 
+    - Any changes to bundles under `packaging`, will be reviewed with 
additional scrutiny to avoid breakages across versions.
+- **Code reuse**
+  - Whenever you can, please use/enhance use existing utils classes in code 
(`CollectionUtils`, `ParquetUtils`, `HoodieAvroUtils`). Search for classes 
ending in `Utils`.
+  - As a complex project, that must integrate with multiple systems, we tend 
to avoid dependencies like `guava`, `apache commons` for sake of easy 
integration. 
+     Please start a discussion on the mailing list, before attempting to 
reintroduce them
+  - As a data system, that takes performance seriously, we also write piece of 
infrastructure (e.g `ExternalSpillableMap`) natively, that are optimized 
specifically for our scenarios.
+     Please start with them first, when solving problems.
+ - **Breaking changes**
+   - Any version changes for dependencies, needs to be ideally vetted across 
different user environments in the community, to get enough confidence before 
merging.
+   - Any changes to methods annotated with `PublicAPIMethod` or classes 
annotated with `PublicAPIClass` require upfront discussion and potentially a 
RFC.

Review comment:
       `a RFC` -> `an RFC`

##########
File path: docs/_pages/contributing.md
##########
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read lot more than it gets
+written. So optimizing for "reads" is a good goal. The list below is a set of 
guidelines, that contributors strive to upkeep and reflective 
+of how we want to evolve our code in the future.
+
+#### Style 
+
+ - **Formatting** We should rely on checkstyle and spotless to auto fix 
formatting; automate this completely. Where we cannot,
+    we will err on the side of not taxing contributors with manual effort.
+ - **Refactoring**
+   - Refactor with purpose; any refactor suggested should be attributable to 
functionality that now becomes easy to implement.
+   - A class is asking to be refactored, when it has several overloaded 
responsibilities/have sets of fields/methods which are used more cohesively 
than others. 
+   - Try to name tests using the given-when-then model, that cleans separates 
preconditions (given), an action (when) and assertions (then).
+ - **Naming things**
+   - Let's name uniformly; using the same word to denote the same concept. 
e.g: bootstrap vs external vs source, when referring to bootstrapped tables. 
+     May be they all mean the same, but having one word makes the code lot 
more easily readable. 
+   - Let's name consistently with Hudi terminology. e.g dataset vs table, base 
file vs data file..
+   - Class names preferably are nouns (e.g Runner) which reflect their 
responsibility and methods are verbs (e.g run()).
+   - Avoid filler words, that don't add value e.g xxxInfo, xxxData etc.
+   - We name classes in code starting with `Hoodie` and not `Hudi` and we want 
to keep it that way for consistency/historical reasons. 
+ - **Methods**
+   - Individual methods should short (~20-30 lines) and have a single purpose; 
If you feel like it has a secondary purpose, then may be it needs

Review comment:
       `may be` -> `maybe`?

##########
File path: docs/_pages/contributing.md
##########
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read lot more than it gets
+written. So optimizing for "reads" is a good goal. The list below is a set of 
guidelines, that contributors strive to upkeep and reflective 
+of how we want to evolve our code in the future.
+
+#### Style 
+
+ - **Formatting** We should rely on checkstyle and spotless to auto fix 
formatting; automate this completely. Where we cannot,
+    we will err on the side of not taxing contributors with manual effort.
+ - **Refactoring**
+   - Refactor with purpose; any refactor suggested should be attributable to 
functionality that now becomes easy to implement.
+   - A class is asking to be refactored, when it has several overloaded 
responsibilities/have sets of fields/methods which are used more cohesively 
than others. 
+   - Try to name tests using the given-when-then model, that cleans separates 
preconditions (given), an action (when) and assertions (then).
+ - **Naming things**
+   - Let's name uniformly; using the same word to denote the same concept. 
e.g: bootstrap vs external vs source, when referring to bootstrapped tables. 
+     May be they all mean the same, but having one word makes the code lot 
more easily readable. 
+   - Let's name consistently with Hudi terminology. e.g dataset vs table, base 
file vs data file..
+   - Class names preferably are nouns (e.g Runner) which reflect their 
responsibility and methods are verbs (e.g run()).
+   - Avoid filler words, that don't add value e.g xxxInfo, xxxData etc.
+   - We name classes in code starting with `Hoodie` and not `Hudi` and we want 
to keep it that way for consistency/historical reasons. 
+ - **Methods**
+   - Individual methods should short (~20-30 lines) and have a single purpose; 
If you feel like it has a secondary purpose, then may be it needs
+     to be broken down more.
+   - Lesser the number of arguments, the better; 
+   - Place caller methods on top of callee methods, whenever possible.
+   - Avoid "output" arguments e.g passing in a list and filling its values 
within the method.
+   - Try to limit individual if/else blocks to few lines to aid readability.
+   - Separate logical blocks of code with a newline in between e.g read a file 
into memory, loop over the lines.
+ - **Classes**
+   - Like method, each Class should have a single purpose/responsibility.
+   - Try to keep class files to about 200 lines of length, nothing beyond 500.
+   - Avoid stating the obvious in comments; e.g each line does not deserve a 
comment; Document corner-cases/special perf considerations etc clearly.
+   - Try creating factory methods/builders and interfaces wherever you feel a 
specific implementation may be changed down the line.
+
+#### Substance
+
+- Try to avoid large PRs; if unavoidable (many times they are) please separate 
refactoring with actual implementation of functionality. 
+  e.g renaming/breaking up a file and then changing code changes, makes the 
diff very hard to review.
+- **Licensing**
+    - Every source file needs to include the Apache license header. Every new 
dependency needs to have 
+      an open source license 
[compatible](https://www.apache.org/legal/resolved.html#criteria) with Apache.
+    - If you are re-using code from another apache/open-source project, 
licensing needs to be compatible and attribution added to `LICENSE` file
+    - Please DO NOT copy paste any code from StackOverflow or other online 
sources, since their license attribution would be unclear. Author them yourself!
+- **Code Organization** 
+    - Anything in `hudi-common` cannot depend on a specific engine runtime 
like Spark. 
+    - Any changes to bundles under `packaging`, will be reviewed with 
additional scrutiny to avoid breakages across versions.
+- **Code reuse**
+  - Whenever you can, please use/enhance use existing utils classes in code 
(`CollectionUtils`, `ParquetUtils`, `HoodieAvroUtils`). Search for classes 
ending in `Utils`.
+  - As a complex project, that must integrate with multiple systems, we tend 
to avoid dependencies like `guava`, `apache commons` for sake of easy 
integration. 

Review comment:
       `for sake` -> `for the sake`?

##########
File path: docs/_pages/contributing.md
##########
@@ -130,6 +126,65 @@ and more importantly also try to improve the process along 
the way as well.
    - Before your change can be merged, it should be squashed into a single 
commit for cleaner commit history.
  - Finally, once your pull request is merged, make sure to `Close` the JIRA.
 
+### Coding guidelines 
+
+Our code can benefit from contributors speaking the same "language" when 
authoring code. After all, it gets read lot more than it gets
+written. So optimizing for "reads" is a good goal. The list below is a set of 
guidelines, that contributors strive to upkeep and reflective 
+of how we want to evolve our code in the future.
+
+#### Style 
+
+ - **Formatting** We should rely on checkstyle and spotless to auto fix 
formatting; automate this completely. Where we cannot,
+    we will err on the side of not taxing contributors with manual effort.
+ - **Refactoring**
+   - Refactor with purpose; any refactor suggested should be attributable to 
functionality that now becomes easy to implement.
+   - A class is asking to be refactored, when it has several overloaded 
responsibilities/have sets of fields/methods which are used more cohesively 
than others. 
+   - Try to name tests using the given-when-then model, that cleans separates 
preconditions (given), an action (when) and assertions (then).
+ - **Naming things**
+   - Let's name uniformly; using the same word to denote the same concept. 
e.g: bootstrap vs external vs source, when referring to bootstrapped tables. 
+     May be they all mean the same, but having one word makes the code lot 
more easily readable. 
+   - Let's name consistently with Hudi terminology. e.g dataset vs table, base 
file vs data file..
+   - Class names preferably are nouns (e.g Runner) which reflect their 
responsibility and methods are verbs (e.g run()).
+   - Avoid filler words, that don't add value e.g xxxInfo, xxxData etc.
+   - We name classes in code starting with `Hoodie` and not `Hudi` and we want 
to keep it that way for consistency/historical reasons. 
+ - **Methods**
+   - Individual methods should short (~20-30 lines) and have a single purpose; 
If you feel like it has a secondary purpose, then may be it needs
+     to be broken down more.
+   - Lesser the number of arguments, the better; 
+   - Place caller methods on top of callee methods, whenever possible.
+   - Avoid "output" arguments e.g passing in a list and filling its values 
within the method.
+   - Try to limit individual if/else blocks to few lines to aid readability.
+   - Separate logical blocks of code with a newline in between e.g read a file 
into memory, loop over the lines.
+ - **Classes**
+   - Like method, each Class should have a single purpose/responsibility.
+   - Try to keep class files to about 200 lines of length, nothing beyond 500.
+   - Avoid stating the obvious in comments; e.g each line does not deserve a 
comment; Document corner-cases/special perf considerations etc clearly.
+   - Try creating factory methods/builders and interfaces wherever you feel a 
specific implementation may be changed down the line.
+
+#### Substance
+
+- Try to avoid large PRs; if unavoidable (many times they are) please separate 
refactoring with actual implementation of functionality. 
+  e.g renaming/breaking up a file and then changing code changes, makes the 
diff very hard to review.
+- **Licensing**
+    - Every source file needs to include the Apache license header. Every new 
dependency needs to have 
+      an open source license 
[compatible](https://www.apache.org/legal/resolved.html#criteria) with Apache.
+    - If you are re-using code from another apache/open-source project, 
licensing needs to be compatible and attribution added to `LICENSE` file
+    - Please DO NOT copy paste any code from StackOverflow or other online 
sources, since their license attribution would be unclear. Author them yourself!
+- **Code Organization** 
+    - Anything in `hudi-common` cannot depend on a specific engine runtime 
like Spark. 
+    - Any changes to bundles under `packaging`, will be reviewed with 
additional scrutiny to avoid breakages across versions.
+- **Code reuse**
+  - Whenever you can, please use/enhance use existing utils classes in code 
(`CollectionUtils`, `ParquetUtils`, `HoodieAvroUtils`). Search for classes 
ending in `Utils`.
+  - As a complex project, that must integrate with multiple systems, we tend 
to avoid dependencies like `guava`, `apache commons` for sake of easy 
integration. 
+     Please start a discussion on the mailing list, before attempting to 
reintroduce them
+  - As a data system, that takes performance seriously, we also write piece of 
infrastructure (e.g `ExternalSpillableMap`) natively, that are optimized 
specifically for our scenarios.

Review comment:
       `write piece of` -> `write a piece of`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

