jonvex commented on code in PR #9338:
URL: https://github.com/apache/hudi/pull/9338#discussion_r1282347510
##########
website/docs/migration_guide.md:
##########
@@ -56,11 +64,13 @@ spark-submit --master local \
--hoodie-conf hoodie.bootstrap.keygen.class=org.apache.hudi.keygen.SimpleKeyGenerator \
--hoodie-conf hoodie.bootstrap.full.input.provider=org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider \
Review Comment:
I don't think we need `--hoodie-conf hoodie.bootstrap.full.input.provider` in
the example.
##########
website/docs/migration_guide.md:
##########
@@ -69,12 +79,28 @@ for partition in [list of partitions in source table] {
}
```
-**Option 3**
+**Option 3 using Spark SQL CALL Procedure**
+
+Refer to [Bootstrap procedure](https://hudi.apache.org/docs/next/procedures#bootstrap) for more details.
+
+**Option 4 using Hudi CLI**
+
Write your own custom logic of how to load an existing table into a Hudi managed one. Please read about the RDD API [here](/docs/quick-start-guide). Using the bootstrap run CLI. Once hudi has been built via `mvn clean install -DskipTests`, the shell can be fired by via `cd hudi-cli && ./hudi-cli.sh`.
```java
hudi->bootstrap run --srcPath /tmp/source_table --targetPath /tmp/hoodie/bootstrap_table --tableName bootstrap_table --tableType COPY_ON_WRITE --rowKeyField ${KEY_FIELD} --partitionPathField ${PARTITION_FIELD} --sparkMaster local --hoodieConfigs hoodie.datasource.write.hive_style_partitioning=true --selectorClass org.apache.hudi.client.bootstrap.selector.FullRecordBootstrapModeSelector
```
-Unlike deltaStream, FULL_RECORD or METADATA_ONLY is set with --selectorClass, see detalis with help "bootstrap run".
+Unlike Hudi Streamer, FULL_RECORD or METADATA_ONLY is set with --selectorClass, see details with help "bootstrap run".
+
+
+## Configs
+
+Here are the basic configs that control bootstrapping.
+
+| Config Name | Default | Description |
+| ---------------------------------------------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------- |
+| hoodie.bootstrap.base.path | N/A **(Required)** | Base path of the dataset that needs to be bootstrapped as a Hudi table<br /><br />`Config Param: BASE_PATH`<br />`Since Version: 0.6.0` |
+
+By default, with only `hoodie.bootstrap.base.path` being provided METADATA_ONLY mode is selected. For other options, please refer [bootstrap configs](https://hudi.apache.org/docs/next/configurations#Bootstrap-Configs) for more details.
Review Comment:
I think adding `hoodie.bootstrap.mode.selector.regex.mode`,
`hoodie.bootstrap.mode.selector`, and `hoodie.bootstrap.mode.selector.regex` to
the basic configs table would be helpful. At a minimum,
`hoodie.bootstrap.mode.selector` should be added.
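
To illustrate why these three belong together, here is a rough sketch of how they might be combined when assembling `--hoodie-conf` arguments. The helper `hoodie_conf_args`, the source path, and the partition regex are hypothetical and only for illustration; the `BootstrapRegexModeSelector` class name is from my memory of the Hudi codebase and should be double-checked against the configurations page before anything like this goes into the docs.

```python
# Hypothetical helper, not part of Hudi: renders a dict of settings
# as a flat list of "--hoodie-conf key=value" arguments.
def hoodie_conf_args(confs):
    args = []
    for key, value in confs.items():
        args += ["--hoodie-conf", f"{key}={value}"]
    return args


bootstrap_confs = {
    # Required: location of the existing (non-Hudi) dataset. Path is a placeholder.
    "hoodie.bootstrap.base.path": "/tmp/source_table",
    # Selector class that picks a bootstrap mode per partition (assumed class name).
    "hoodie.bootstrap.mode.selector":
        "org.apache.hudi.client.bootstrap.selector.BootstrapRegexModeSelector",
    # Hypothetical regex for the partitions the mode below should apply to.
    "hoodie.bootstrap.mode.selector.regex": "2020/08/2[0-9]",
    # Mode applied to partitions whose path matches the regex.
    "hoodie.bootstrap.mode.selector.regex.mode": "FULL_RECORD",
}

# Print the arguments in a form that could be pasted into a spark-submit command.
print(" ".join(hoodie_conf_args(bootstrap_confs)))
```

Without `hoodie.bootstrap.mode.selector` in the table, a reader has no hint that the other two settings only take effect with a regex-based selector.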