[GitHub] [hudi] nsivabalan commented on a diff in pull request #7819: [HUDI-5652] Add hudi-cli-bundle docs

via GitHub Sat, 04 Feb 2023 19:07:49 -0800


nsivabalan commented on code in PR #7819:
URL: https://github.com/apache/hudi/pull/7819#discussion_r1096615481



##########
website/docs/cli.md:
##########
@@ -5,10 +5,22 @@ last_modified_at: 2021-08-18T15:59:57-04:00
 ---
 
 ### Local set up
-Once hudi has been built, the shell can be fired by via  `cd hudi-cli && 
./hudi-cli.sh`. A hudi table resides on DFS, in a location referred to as the 
`basePath` and
+Once hudi has been built, the shell can be fired by via  `cd hudi-cli && 
./hudi-cli.sh`.
+
+Optionally in release `0.13.0` we have now added another way of launching the 
`hudi cli`, which is using the `hudi-cli-bundle`.
+There are a couple of requirements when using this approach such as having 
`spark` installed locally on your machine. 
+It is required to use a spark distribution with hadoop dependencies packaged 
such as `spark-3.3.1-bin-hadoop2.tgz` from 
https://archive.apache.org/dist/spark/.
+We also recommend you set an env variable `$SPARK_HOME` to the path of where 
spark is installed on your machine. 
+One important thing to note is that the `hudi-spark-bundle` should also be 
present when using the `hudi-cli-bundle`.  

Review Comment:
   Yes, can we call that out? The main purpose of this bundle is to make the 
CLI easy to use on EMR, Dataproc, or anywhere else without needing to clone the 
full hudi repo. So we should clarify which env variables such users need to set. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

