nsivabalan commented on code in PR #7819:
URL: https://github.com/apache/hudi/pull/7819#discussion_r1096615481
##########
website/docs/cli.md:
##########
@@ -5,10 +5,22 @@ last_modified_at: 2021-08-18T15:59:57-04:00
 ---
 ### Local set up
-Once hudi has been built, the shell can be fired by via `cd hudi-cli && ./hudi-cli.sh`. A hudi table resides on DFS, in a location referred to as the `basePath` and
+Once hudi has been built, the shell can be fired by via `cd hudi-cli && ./hudi-cli.sh`.
+
+Optionally in release `0.13.0` we have now added another way of launching the `hudi cli`, which is using the `hudi-cli-bundle`.
+There are a couple of requirements when using this approach such as having `spark` installed locally on your machine.
+It is required to use a spark distribution with hadoop dependencies packaged such as `spark-3.3.1-bin-hadoop2.tgz` from https://archive.apache.org/dist/spark/.
+We also recommend you set an env variable `$SPARK_HOME` to the path of where spark is installed on your machine.
+One important thing to note is that the `hudi-spark-bundle` should also be present when using the `hudi-cli-bundle`.

Review Comment:
Yes, can we call that out? The main purpose of this bundle is to make the CLI easy to use on EMR, Dataproc, or anywhere else without having to clone the full hudi repo. So we should clarify which env variables such users need to set.
