This is an automated email from the ASF dual-hosted git repository.
sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new c5ad4cf40dc [HUDI-5652] Add hudi-cli-bundle docs (#7819)
c5ad4cf40dc is described below
commit c5ad4cf40dc2bfea514070ec035b8c1aa7e14deb
Author: Rahil C <[email protected]>
AuthorDate: Wed Mar 15 17:39:32 2023 -0400
[HUDI-5652] Add hudi-cli-bundle docs (#7819)
- Add Hudi cli bundle docs
---------
Co-authored-by: Rahil Chertara <[email protected]>
---
website/docs/cli.md | 38 +++++++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/website/docs/cli.md b/website/docs/cli.md
index dab0a7bae79..4242609a68c 100644
--- a/website/docs/cli.md
+++ b/website/docs/cli.md
@@ -5,10 +5,46 @@ last_modified_at: 2021-08-18T15:59:57-04:00
---
### Local set up
-Once hudi has been built, the shell can be fired by via `cd hudi-cli && ./hudi-cli.sh`. A hudi table resides on DFS, in a location referred to as the `basePath` and
+Once hudi has been built, the shell can be fired up via `cd hudi-cli && ./hudi-cli.sh`.
+
+### Hudi CLI Bundle setup
+In release `0.13.0` we added another way of launching the `hudi cli`: the `hudi-cli-bundle`. (Note this is only supported for Spark 3;
+for Spark 2, please see the Local set up section above.)
+
+There are a couple of requirements when using this approach, such as having `spark` installed locally on your machine.
+It is required to use a spark distribution with hadoop dependencies packaged, such as `spark-3.3.1-bin-hadoop2.tgz` from https://archive.apache.org/dist/spark/.
+We also recommend setting the env variable `$SPARK_HOME` to the path where spark is installed on your machine.
+One important thing to note is that the `hudi-spark-bundle` should also be present when using the `hudi-cli-bundle`.
+To provide the locations of these bundle jars, you can set them in your shell like so:
+`export CLI_BUNDLE_JAR=<path-to-cli-bundle-jar-to-use>`, `export SPARK_BUNDLE_JAR=<path-to-spark-bundle-jar-to-use>`.
+
+If you are not compiling the project and are downloading the jars instead, follow the steps below:
+
+1. Create an empty folder as a new directory
+2. Copy the hudi-cli-bundle jars and hudi-spark*-bundle jars to this directory
+3. Copy the following script and folder to this directory
+```
+packaging/hudi-cli-bundle/hudi-cli-with-bundle.sh
+packaging/hudi-cli-bundle/conf   # the `conf` folder should be in this directory
+```
+
+4. Start Hudi CLI shell with environment variables set
+```
+export SPARK_HOME=<spark-home-folder>
+export CLI_BUNDLE_JAR=<cli-bundle-jar-to-use>
+export SPARK_BUNDLE_JAR=<spark-bundle-jar-to-use>
+
+./hudi-cli-with-bundle.sh
+
+```
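
The four steps above can be sketched end to end as a shell session. This is an illustrative sketch only: the working directory, jar names, versions, and the Spark install path below are placeholders, not canonical values.

```shell
# Sketch of steps 1-4; all paths and jar names are placeholders.
set -e
CLI_DIR=$(mktemp -d)               # step 1: create an empty directory
cd "$CLI_DIR"

# step 2: copy the hudi-cli-bundle and hudi-spark*-bundle jars here, e.g.:
# cp /path/to/hudi-cli-bundle_2.12-0.13.0.jar /path/to/hudi-spark3.3-bundle_2.12-0.13.0.jar .

# step 3: copy the launch script and the conf folder from the hudi source tree, e.g.:
# cp    <hudi-repo>/packaging/hudi-cli-bundle/hudi-cli-with-bundle.sh .
# cp -r <hudi-repo>/packaging/hudi-cli-bundle/conf .

# step 4: point the environment at Spark and the bundle jars, then launch
export SPARK_HOME=/opt/spark-3.3.1-bin-hadoop2
export CLI_BUNDLE_JAR="$CLI_DIR/hudi-cli-bundle_2.12-0.13.0.jar"
export SPARK_BUNDLE_JAR="$CLI_DIR/hudi-spark3.3-bundle_2.12-0.13.0.jar"
# ./hudi-cli-with-bundle.sh

echo "CLI dir prepared at $CLI_DIR"
```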
+
+### Base path
+A hudi table resides on DFS, in a location referred to as the `basePath`, and we need this location in order to connect to a Hudi table. The Hudi library effectively manages this table internally, using a `.hoodie` subfolder to track all metadata.
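
Inside the shell, the `basePath` is what you pass to the `connect` command; for example (the table path below is a placeholder):

```
hudi->connect --path /tmp/hudi_trips_cow
```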
+
+
### Using Hudi-cli in S3
If you are using hudi that comes packaged with AWS EMR, you can find instructions to use hudi-cli [here](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-cli.html).
If you are not using EMR, or would like to use the latest hudi-cli from master, you can follow the below steps to access an S3 dataset in your local environment (laptop).