xushiyan commented on code in PR #6902:
URL: https://github.com/apache/hudi/pull/6902#discussion_r1024330726


##########
rfc/rfc-63/rfc-63.md:
##########
@@ -0,0 +1,139 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+# RFC-63: Hudi bundle standards
+
+## Proposers
+
+- @xushiyan
+
+## Approvers
+
+- @vinoth
+- @prasanna
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-3529
+
+## Abstract
+
+Hudi bundle jars are user-facing artifacts that deserve special attention and careful review whenever bundle names,
+included jars, or intended usage change. This RFC presents a set of standards for bundle building,
+dependency governance, and release, to serve as a reference for developers and users.
+
+## Building
+
+Bundles are built under the [packaging/](../../packaging/) directory.
+
+| Bundle                      | Category  | Usage                                                                      |
+|-----------------------------|-----------|----------------------------------------------------------------------------|
+| hudi-spark-bundle           | Engine    | Integrates with Spark                                                      |
+| hudi-flink-bundle           | Engine    | Integrates with Flink                                                      |
+| hudi-hadoop-mr-bundle       | Engine    | Integrates with Hive                                                       |
+| hudi-presto-bundle          | Engine    | Integrates with Presto                                                     |
+| hudi-trino-bundle           | Engine    | Integrates with Trino                                                      |
+| hudi-kafka-connect-bundle   | Engine    | Integrates with Kafka Connect                                              |
+| hudi-aws-bundle             | Cloud     | Supports AWS-specific integration and features                             |
+| hudi-gcp-bundle             | Cloud     | Supports GCP-specific integration and features                             |
+| hudi-hive-sync-bundle       | Catalog   | Supports meta sync with Hive metastore                                     |
+| hudi-datahub-sync-bundle    | Catalog   | Supports meta sync with DataHub                                            |
+| hudi-timeline-server-bundle | Server    | Runs standalone timeline server                                            |
+| hudi-utilities-slim-bundle  | Utilities | Provides Hudi utilities to run with different engines                      |
+| hudi-utilities-bundle       | Utilities | Similar to "utilities-slim"; legacy bundle that contains Spark integration |
+| hudi-integ-test-bundle      | Test      | Runs with docker demo for integration tests                                |
+
+### Standard: engine integration
+
+An engine bundle should clearly declare which engine version it supports. If it is Scala-dependent, it should also
+declare the Scala version in the jar name. The format is `hudi-{engine name}X.Y-bundle{_scala version}`. Examples:
+
+- `hudi-spark3.2-bundle_2.12`
+- `hudi-flink1.15-bundle`
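
As an illustration only, the naming format above can be checked with a small parser. This is a minimal sketch and not part of the RFC or of Hudi: the regular expression, `EngineBundleName`, and `parse_engine_bundle_name` are hypothetical names, and the sketch assumes that engine and Scala versions always take the `X.Y` form.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for hudi-{engine name}X.Y-bundle{_scala version}.
# Assumes versions are always "major.minor"; the RFC does not spell this out.
ENGINE_BUNDLE_RE = re.compile(
    r"^hudi-(?P<engine>[a-z]+(?:-[a-z]+)*)"
    r"(?P<engine_version>\d+\.\d+)-bundle"
    r"(?:_(?P<scala_version>\d+\.\d+))?$"
)


class EngineBundleName(NamedTuple):
    engine: str
    engine_version: str
    scala_version: Optional[str]  # None when the bundle is not Scala-dependent


def parse_engine_bundle_name(name: str) -> Optional[EngineBundleName]:
    """Return the parsed parts, or None if the name does not follow the format."""
    match = ENGINE_BUNDLE_RE.match(name)
    if match is None:
        return None
    return EngineBundleName(
        match.group("engine"),
        match.group("engine_version"),
        match.group("scala_version"),
    )
```

Under these assumptions, `parse_engine_bundle_name("hudi-spark3.2-bundle_2.12")` yields engine `spark`, engine version `3.2`, and Scala version `2.12`, while a name without an engine version such as `hudi-spark-bundle` is rejected.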
+
+### Standard: cloud integration
+
+A cloud bundle should be the only artifact for the target cloud provider that includes all the required dependencies
+and supports that cloud's specific features. Therefore, if a Hudi module requires classes from cloud-specific
+dependencies, it should include the needed Hudi cloud module as a dependency. For example, the `hudi-utilities` module
+includes `hudi-aws` and `hudi-gcp` to provide `S3EventSource` and `GcsEventSource` respectively.
+
+### Standard: runnable bundle
+
+A bundle that can run as a standalone application should contain a bash script under the bundle directory that shows
+users how to run the application. The script should state clearly which dependencies the user environment must
+provide. An example is hudi-timeline-server-bundle, which provides `run_server.sh`.
+
+### Standard: bundle combinations
+
+An engine-bundle alone should be able to run with the desired engine. Cloud, catalog, and utilities bundles are
+engine-independent add-on bundles that can be added to the engine-bundle's classpath to provide additional
+support.
+
+- Example 1: users can run hudi-spark3.1-bundle_2.12 alone with Spark 3.1. Users can also add hudi-aws-bundle to work
+  with the DynamoDB lock provider and hudi-datahub-sync-bundle to sync with the DataHub catalog.
+- Example 2: users running the Spark DataSource writer with hudi-spark3.3-bundle_2.12 alone on Spark 3.3 can add
+  hudi-utilities-slim-bundle and migrate to running `HoodieDeltaStreamer` as the main ingestion application.
+
+A server-bundle is supposed to run alone as a server application without other bundles' support.
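
The combination rules above can be sketched as a simple check. This is illustrative only and not part of the RFC: `BUNDLE_CATEGORY` copies the categories from the table in the Building section, while `base_bundle_name` and `is_supported_combination` are hypothetical helpers. The sketch also assumes that a combination uses exactly one engine bundle, which the text implies but does not state, and it treats the test bundle as out of scope.

```python
import re

# Categories copied from the bundle table in the RFC.
BUNDLE_CATEGORY = {
    "hudi-spark-bundle": "Engine",
    "hudi-flink-bundle": "Engine",
    "hudi-hadoop-mr-bundle": "Engine",
    "hudi-presto-bundle": "Engine",
    "hudi-trino-bundle": "Engine",
    "hudi-kafka-connect-bundle": "Engine",
    "hudi-aws-bundle": "Cloud",
    "hudi-gcp-bundle": "Cloud",
    "hudi-hive-sync-bundle": "Catalog",
    "hudi-datahub-sync-bundle": "Catalog",
    "hudi-timeline-server-bundle": "Server",
    "hudi-utilities-slim-bundle": "Utilities",
    "hudi-utilities-bundle": "Utilities",
    "hudi-integ-test-bundle": "Test",
}
ADD_ON_CATEGORIES = {"Cloud", "Catalog", "Utilities"}


def base_bundle_name(artifact: str) -> str:
    """Strip engine version and Scala suffix: hudi-spark3.1-bundle_2.12 -> hudi-spark-bundle."""
    return re.sub(r"(?:\d+\.\d+)?-bundle(?:_\d+\.\d+)?$", "-bundle", artifact)


def is_supported_combination(artifacts) -> bool:
    """Check a set of bundles against the combination standard (hypothetical helper)."""
    categories = [BUNDLE_CATEGORY.get(base_bundle_name(a)) for a in artifacts]
    if not categories or None in categories:
        return False  # empty input or an unknown bundle name
    if "Server" in categories:
        return len(categories) == 1  # a server bundle runs alone
    # Assumption: exactly one engine bundle, plus any number of add-on bundles.
    engines = categories.count("Engine")
    others_are_add_ons = all(c in ADD_ON_CATEGORIES for c in categories if c != "Engine")
    return engines == 1 and others_are_add_ons
```

With this sketch, the combinations in Example 1 and Example 2 pass the check, while pairing hudi-timeline-server-bundle with any other bundle, or using an add-on bundle without an engine bundle, does not.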
+
+### Standard: bundle name changes
+
+Bundle names may change in rare cases, given strong reasons. Bundles with the old names should still be published at
+each release to keep usage non-breaking. If there is clear evidence of negligible usage, the obsolete
+bundle names may be dropped and skipped during releases. The evidence should be presented in the PR that makes the change.
+
+## Dependency Governance
+
+### Standard: consistent shading
+
+If a dependency is deemed common and is to be shaded (a.k.a. relocated), its shading should be done consistently

Review Comment:
   need some guidelines for when to shade and when not to


