PhantomHunt commented on issue #8676: URL: https://github.com/apache/hudi/issues/8676#issuecomment-1551409076
Hi, this issue is now resolved. Here are the steps we followed to fix it:

1. Downgraded Java from version 11 to version 8 (OpenJDK 1.8), since the Hudi CLI supports only JDK 1.8.
2. Built the packages with the following command (we are on Spark 3.3):
   ```
   mvn -T 2C clean package -DskipTests -Dspark3.2 -Dscala-2.12
   ```
3. Downloaded these client JARs (the `CLIENT_JAR` paths in step 4 expect them in the `hudi-cli` directory):
   ```
   wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.2/hadoop-aws-3.2.2.jar
   wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.180/aws-java-sdk-bundle-1.12.180.jar
   ```
4. Set these environment variables:
   ```
   export SPARK_HOME=/usr/local/lib/python3.8/dist-packages/pyspark
   export CLIENT_JAR=/home/ubuntu/hudi-release-0.13.0/hudi-cli/aws-java-sdk-bundle-1.12.180.jar:/home/ubuntu/hudi-release-0.13.0/hudi-cli/hadoop-aws-3.2.2.jar
   export HOODIE_ENV_fs_DOT_s3a_DOT_impl=org.apache.hadoop.fs.s3a.S3AFileSystem
   export HOODIE_ENV_fs_DOT_s3a_DOT_aws_DOT_credentials_DOT_provider=com.amazonaws.auth.InstanceProfileCredentialsProvider,com.amazonaws.auth.DefaultAWSCredentialsProviderChain
   export HOODIE_ENV_fs_DOT_AbstractFileSystem_DOT_s3a_DOT_impl=org.apache.hadoop.fs.s3a.S3A
   ```

**Suggestions:**

1. @ad1happy2go and team, we suggest enhancing the Hudi CLI documentation to make the installation process clearer for future users. Also, there is a version mismatch [here](https://hudi.apache.org/docs/cli/#using-hudi-cli-in-s3) that can cause errors.
2. Please add CLI installation steps for Spark 3.3 to [this](https://www.onehouse.ai/blog/getting-started-manage-your-hudi-tables-with-the-admin-hudi-cli-tool) OneHouse guide.
3. The environment variables we had to set to connect the CLI to S3/S3A were not in any document. Please add those as well.

Thanks @ad1happy2go for all your help and time. We really appreciate the dedication you showed in helping us resolve it.
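Step 4 relies on a naming convention that is easy to get wrong. Our understanding is that each `HOODIE_ENV_*` name becomes a Hadoop configuration key by dropping the `HOODIE_ENV_` prefix and replacing every `_DOT_` with `.`. For example, `HOODIE_ENV_fs_DOT_s3a_DOT_impl` would set `fs.s3a.impl`. Treat this as an assumption about how Hudi reads these variables, not a documented guarantee.

The minimal POSIX `sh` sketch below assumes that convention. It prints the derived keys and checks that every `CLIENT_JAR` entry exists. The function names `hoodie_env_to_hadoop_key`, `show_hoodie_env`, and `check_client_jars` are hypothetical and not part of Hudi.

```sh
#!/bin/sh
# Sketch only: these helper functions are hypothetical and not part of Hudi.
# Assumption: a HOODIE_ENV_* variable name maps to a Hadoop config key by
# stripping the HOODIE_ENV_ prefix and replacing every _DOT_ with ".".

# Print the Hadoop configuration key that a HOODIE_ENV_* name is assumed to set.
hoodie_env_to_hadoop_key() {
  printf '%s\n' "${1#HOODIE_ENV_}" | sed 's/_DOT_/./g'
}

# Print every exported HOODIE_ENV_* variable as "<hadoop key> = <value>".
show_hoodie_env() {
  env | grep '^HOODIE_ENV_' | while IFS='=' read -r name value; do
    printf '%s = %s\n' "$(hoodie_env_to_hadoop_key "$name")" "$value"
  done
}

# Return 1 (and name each offending entry on stderr) if any colon-separated
# CLIENT_JAR entry is not an existing regular file; return 0 otherwise.
check_client_jars() {
  missing=0
  saved_ifs=$IFS
  IFS=':'
  set -f                      # disable globbing while splitting the path list
  for jar in ${CLIENT_JAR:-}; do
    if [ ! -f "$jar" ]; then
      echo "missing jar: $jar" >&2
      missing=1
    fi
  done
  set +f
  IFS=$saved_ifs
  return "$missing"
}

# Example: the three HOODIE_ENV_* variables from step 4 map to these keys.
hoodie_env_to_hadoop_key HOODIE_ENV_fs_DOT_s3a_DOT_impl
# → fs.s3a.impl
hoodie_env_to_hadoop_key HOODIE_ENV_fs_DOT_s3a_DOT_aws_DOT_credentials_DOT_provider
# → fs.s3a.aws.credentials.provider
hoodie_env_to_hadoop_key HOODIE_ENV_fs_DOT_AbstractFileSystem_DOT_s3a_DOT_impl
# → fs.AbstractFileSystem.s3a.impl
```

Run `show_hoodie_env` after exporting the variables from step 4 to see which keys the CLI should pick up. Run `check_client_jars` before starting the CLI to catch a mistyped `CLIENT_JAR` path.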
