lhotari commented on code in PR #21747:
URL: https://github.com/apache/pulsar/pull/21747#discussion_r1431446616


##########
pip/pip-326.md:
##########
@@ -0,0 +1,228 @@
+<!--
+RULES
+* Never place a link to an external site like Google Doc. The proposal should 
be in this issue entirely.
+* Use a spelling and grammar checker tools if available for you (there are 
plenty of free ones).
+
+PROPOSAL HEALTH CHECK
+I can read the design document and understand the problem statement and what 
you plan to change *without* resorting to a couple of hours of code reading 
just to start having a high level understanding of the change.
+
+IMAGES
+If you need diagrams, avoid attaching large files. You can use 
[MermaidJS]([url](https://mermaid.js.org/)) as a simple language to describe 
many types of diagrams.
+
+THIS COMMENTS
+Please remove them when done.
+-->
+
+# PIP-326: Create a BOM to ease dependency management
+
+# Background knowledge
+
+A `Bill of Materials` (BOM) is a special kind of POM that is used to control 
the versions of a project’s dependencies and provide a central place to define 
and update those versions.
+A BOM dependency ensure that all dependencies (both direct and transitive) are 
at the same version specified in the BOM.
+
+To illustrate, consider the [Spring Data 
BOM](https://github.com/spring-projects/spring-data-bom/blob/main/bom/pom.xml) 
which declares the version for each of the published Spring Data modules.
+Without a BOM, consuming applications must specify the version on each of the 
imported Spring Data module dependencies.
+However, when using a BOM the version numbers can be omitted.
+
+# Motivation
+
+The BOM provides the following benefits for consuming applications:
+1. Reduce burden by not having to specify the version in multiple locations
+2. Reduce chance of version mismatch (and therefore errors)
+
+The **burden** and **chance** of version mismatch is **directly** proportional 
to the number of modules published by a project.
+Pulsar publishes **many (29)** modules and therefore consuming applications 
are likely to run into the above issues.
+
+A concrete example of the above symptoms can be found in the [Spring Boot 
BOM](https://github.com/spring-projects/spring-boot/blob/92a4a1194d7a599cb57b5e5169ee5bbbfce637d8/spring-boot-project/spring-boot-dependencies/build.gradle#L1140-L1215)
 which provides a section for the list of Pulsar module dependencies as follows:
+
+```groovy
+library("Pulsar", "3.1.1") {
+    group("org.apache.pulsar") {
+        modules = [
+            "bouncy-castle-bc",
+            "bouncy-castle-bcfips",
+            "pulsar-client-1x-base",
+            "pulsar-client-1x",
+            "pulsar-client-2x-shaded",
+            "pulsar-client-admin-api",
+            "pulsar-client-admin-original",
+            "pulsar-client-admin",
+            "pulsar-client-all",
+            "pulsar-client-api",
+            "pulsar-client-auth-athenz",
+            "pulsar-client-auth-sasl",
+            "pulsar-client-messagecrypto-bc",
+            "pulsar-client-original",
+            "pulsar-client-tools-api",
+            "pulsar-client-tools",
+            "pulsar-client",
+            "pulsar-common",
+            "pulsar-config-validation",
+            "pulsar-functions-api",
+            "pulsar-functions-proto",
+            "pulsar-functions-utils",
+            "pulsar-io-aerospike",
+            "pulsar-io-alluxio",
+            "pulsar-io-aws",
+            "pulsar-io-batch-data-generator",
+            "pulsar-io-batch-discovery-triggerers",
+            "pulsar-io-canal",
+            "pulsar-io-cassandra",
+            "pulsar-io-common",
+            "pulsar-io-core",
+            "pulsar-io-data-generator",
+            "pulsar-io-debezium-core",
+            "pulsar-io-debezium-mongodb",
+            "pulsar-io-debezium-mssql",
+            "pulsar-io-debezium-mysql",
+            "pulsar-io-debezium-oracle",
+            "pulsar-io-debezium-postgres",
+            "pulsar-io-debezium",
+            "pulsar-io-dynamodb",
+            "pulsar-io-elastic-search",
+            "pulsar-io-file",
+            "pulsar-io-flume",
+            "pulsar-io-hbase",
+            "pulsar-io-hdfs2",
+            "pulsar-io-hdfs3",
+            "pulsar-io-http",
+            "pulsar-io-influxdb",
+            "pulsar-io-jdbc-clickhouse",
+            "pulsar-io-jdbc-core",
+            "pulsar-io-jdbc-mariadb",
+            "pulsar-io-jdbc-openmldb",
+            "pulsar-io-jdbc-postgres",
+            "pulsar-io-jdbc-sqlite",
+            "pulsar-io-jdbc",
+            "pulsar-io-kafka-connect-adaptor-nar",
+            "pulsar-io-kafka-connect-adaptor",
+            "pulsar-io-kafka",
+            "pulsar-io-kinesis",
+            "pulsar-io-mongo",
+            "pulsar-io-netty",
+            "pulsar-io-nsq",
+            "pulsar-io-rabbitmq",
+            "pulsar-io-redis",
+            "pulsar-io-solr",
+            "pulsar-io-twitter",
+            "pulsar-io",
+            "pulsar-metadata",
+            "pulsar-presto-connector-original",
+            "pulsar-presto-connector",
+            "pulsar-sql",
+            "pulsar-transaction-common",
+            "pulsar-websocket"
+        ]
+    }
+}
+```
+The problem with this hardcoded approach is that the Spring Boot team is not 
the expert of Pulsar and this list of modules could become stale and/or invalid 
rather easily.
+A better suitor for this specification is the Pulsar team, the 
subject-matter-experts who know exactly what is going on with Pulsar (which 
modules are available and what those version(s) should be).
+
+If there were a Pulsar BOM, the above Spring Boot dependency section would 
shrink down to the following:
+```groovy
+library("Pulsar", "3.1.1") {
+    group("org.apache.pulsar") {
+        imports = [
+            "pulsar-bom"
+        ]
+    }
+}
+```
+
+It is worth noting that This is an industry best practice and more often than 
not, a library provides a BOM. A handful of examples can be found in the 
"Links" section at the bottom of this document.
+
+# Goals
+Provide a Pulsar BOM in order to solve the issues listed in the motivation 
section.
+
+## In Scope
+The intention is to create a single BOM for all published Pulsar modules.
+The benefit goes to consumers of the project (our users) as described in the 
motivation.
+
+## Out of Scope
+This proposal is not attempting to create various BOMs that are tailored to 
specific usecases.
+
+
+# High Level Design
+1. From a build target, generate a list of published Pulsar modules
+2. From the list of modules, generate a BOM Maven POM file
+3. Publish the BOM artifact as any other Pulsar module is published
+
+# Detailed Design
+
+## Design & Implementation Details
+
+* Create new module named `pulsar-bom`
+* Create a `pom.xml.template` that has a well-known placeholder token in the 
`<dependencies>` section that will be replaced w/ the list of published module 
dependencies.
+* Create script that does the following:
+  * determines the list of published Pulsar modules
+  * iterates the list of published modules and converts each one into a Maven 
`<dependency>`
+  * concatenate all `<dependency>` entries
+  * replace the token in the template file with the concatenated dependencies
+
+
+### Determining list of published modules
+The current POC is using  [Maven Central 
Search](https://github.com/mthmulders/mcs) to generate the dependencies as 
follows:
+
+```java
+brew install mthmulders/tap/mcs
+mcs search org.apache.pulsar -l 200 | grep ":3.1.1" | sed 's/3.1.1.*/3.1.1/' | 
sort > dependencies.txt
+```
+One downside of this approach, this would not discover newly added modules 
(until they were published on Maven Central). 
+
+It would be nice if the list of modules could easily be determined from within 
the build since the build does know at some point which modules will be 
published to Maven Central.

Review Comment:
   I think it's fine to have a manual process where a new module would be added 
to the pulsar-bom's pom.xml by simply adding a new entry to the file.
   
   It's possible to run the [mvn deploy commands listed in the release 
process](https://pulsar.apache.org/contribute/release-process/#stage-maven-modules)
 by deploying to a local directory to see what's the result, 
https://maven.apache.org/plugins/maven-deploy-plugin/examples/deploy-network-issues.html#deploying-to-a-local-directory
   These commands would address that:
   ```
   mvn deploy 
-DaltDeploymentRepository=local::default::file:./target/staging-deploy 
-DskipTests
   mvn deploy 
-DaltDeploymentRepository=local::default::file:./target/staging-deploy 
-DskipTests -f tests/pom.xml -pl 
org.apache.pulsar.tests:tests-parent,org.apache.pulsar.tests:integration
   ```
   
   All default profile modules will get published unless they have 
`<skip>true</skip>` for maven-deploy-plugin in the module pom.xml's 
build/plugins
   ```
         <plugin>
           <groupId>org.apache.maven.plugins</groupId>
           <artifactId>maven-deploy-plugin</artifactId>
           <configuration>
             <skip>true</skip>
           </configuration>
         </plugin>
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to