rangareddy commented on issue #14872:
URL: https://github.com/apache/hudi/issues/14872#issuecomment-3665936773

   Tested with the latest Hudi Docker image; with minimal configuration changes, I successfully synced the Hudi table to the Hive metastore.
   
   ```sh
   docker exec -it adhoc-1 bash
   ```
   
   ```sh
   ln -s $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-3.3.4.jar $SPARK_HOME/jars/hadoop-aws-3.3.4.jar
   ln -s $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-bundle-1.12.262.jar $SPARK_HOME/jars/aws-java-sdk-bundle-1.12.262.jar

   ln -s $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-3.3.4.jar $HADOOP_HOME/share/hadoop/common/lib/
   ln -s $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-bundle-1.12.262.jar $HADOOP_HOME/share/hadoop/common/lib/
   ```
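   A dangling symlink (for example, after a Hadoop upgrade changes the jar version in the name) typically only surfaces later as a `ClassNotFoundException`, so it is worth confirming the links resolve before launching Spark. A self-contained sketch of the check, using throwaway paths rather than the real `$HADOOP_HOME`:

   ```sh
   # Sketch: recreate the link pattern in a throwaway directory and confirm
   # the link resolves; [ -e ... ] follows the symlink, so it fails on a
   # dangling link.
   tmp=$(mktemp -d)
   mkdir -p "$tmp/hadoop/tools/lib" "$tmp/spark/jars"
   touch "$tmp/hadoop/tools/lib/hadoop-aws-3.3.4.jar"

   ln -s "$tmp/hadoop/tools/lib/hadoop-aws-3.3.4.jar" "$tmp/spark/jars/hadoop-aws-3.3.4.jar"

   if [ -e "$tmp/spark/jars/hadoop-aws-3.3.4.jar" ]; then link_status=ok; else link_status=broken; fi
   echo "link $link_status"
   ```

   The same `[ -e ... ]` test pointed at the real `$SPARK_HOME/jars` entries verifies the links created above.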
   
   ```sh
   vi $HADOOP_HOME/etc/hadoop/core-site.xml
   ```
   
   ```xml
   <property>
     <name>fs.s3a.access.key</name>
     <value>minio</value>
   </property>
   <property>
     <name>fs.s3a.secret.key</name>
     <value>minio123</value>
   </property>
   <property>
     <name>fs.s3a.endpoint</name>
     <value>http://minio:9090</value>
   </property>
   <property>
     <name>fs.s3a.path.style.access</name>
     <value>true</value>
   </property>
   <property>
     <name>fs.s3a.impl</name>
     <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
   </property>
   ```
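   Note these `<property>` entries must land inside the file's existing `<configuration>` element, and a typo in a `<name>` is silently ignored by Hadoop, so a quick key check helps. A hedged sketch that writes the snippet to a scratch file and greps for every required key (point `CONF` at the real core-site.xml to check the actual file):

   ```sh
   # Sketch: materialize the s3a snippet in a scratch file, then confirm
   # every required property name is present.
   CONF=$(mktemp)
   printf '%s\n' \
     '<property><name>fs.s3a.access.key</name><value>minio</value></property>' \
     '<property><name>fs.s3a.secret.key</name><value>minio123</value></property>' \
     '<property><name>fs.s3a.endpoint</name><value>http://minio:9090</value></property>' \
     '<property><name>fs.s3a.path.style.access</name><value>true</value></property>' \
     '<property><name>fs.s3a.impl</name><value>org.apache.hadoop.fs.s3a.S3AFileSystem</value></property>' \
     > "$CONF"

   missing=0
   for key in fs.s3a.access.key fs.s3a.secret.key fs.s3a.endpoint fs.s3a.path.style.access fs.s3a.impl; do
     grep -q "<name>$key</name>" "$CONF" || missing=$((missing + 1))
   done
   echo "missing=$missing"
   ```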
   
   ```sh
   ln -s $HADOOP_HOME/etc/hadoop/core-site.xml $SPARK_INSTALL/conf/
   ```
   
   ```sh
   vi $SPARK_INSTALL/conf/spark-defaults.conf
   ```
   
   ```sh
   spark.hadoop.fs.s3a.endpoint=http://minio:9090
   spark.hadoop.fs.s3a.access.key=minio
   spark.hadoop.fs.s3a.secret.key=minio123
   spark.hadoop.fs.s3a.path.style.access=true
   spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
   ```
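   Instead of hand-editing, the five settings can be appended in one step. A sketch writing to a scratch file; in the container, the redirect target would be `$SPARK_INSTALL/conf/spark-defaults.conf`:

   ```sh
   # Sketch: append the s3a settings in one shot, then count them as a
   # sanity check. DEFAULTS points at a scratch file for illustration.
   DEFAULTS=$(mktemp)
   printf '%s\n' \
     'spark.hadoop.fs.s3a.endpoint=http://minio:9090' \
     'spark.hadoop.fs.s3a.access.key=minio' \
     'spark.hadoop.fs.s3a.secret.key=minio123' \
     'spark.hadoop.fs.s3a.path.style.access=true' \
     'spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem' \
     >> "$DEFAULTS"
   n=$(grep -c '^spark\.hadoop\.fs\.s3a\.' "$DEFAULTS")
   echo "appended $n settings"
   ```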
   
   ```sh
   $SPARK_INSTALL/bin/spark-shell \
     --jars $HUDI_SPARK_BUNDLE \
     --master local[2] \
     --driver-class-path $HADOOP_CONF_DIR \
     --conf spark.sql.hive.convertMetastoreParquet=false \
     --deploy-mode client \
     --driver-memory 1G \
     --executor-memory 3G \
     --num-executors 1 
   ```
   
   ```scala
   val tableName = "trips_table"
   val basePath = "s3a://warehouse/trips_table"
   
   val columns = Seq("ts","uuid","rider","driver","fare","city")
   
   val data = Seq(
     (1695159649087L,"334e26e9-8355-45cc-97c6-c31daf0df330","rider-A","driver-K",19.10,"san_francisco"),
     (1695091554788L,"e96c4396-3fad-413a-a942-4cb36106d721","rider-C","driver-M",27.70,"san_francisco"),
     (1695046462179L,"9909a8b1-2d15-4d3d-8ec9-efc48c536a00","rider-D","driver-L",33.90,"san_francisco"),
     (1695516137016L,"e3cf430c-889d-4015-bc98-59bdce1e530c","rider-F","driver-P",34.15,"sao_paulo"),
     (1695115999911L,"c8abbe79-8d89-47ea-b4ce-4d224bae5bfa","rider-J","driver-T",17.85,"chennai")
   )
   
   var inserts = spark.createDataFrame(data).toDF(columns:_*)
   
   
   inserts.write.format("hudi").option("hoodie.datasource.write.partitionpath.field", "city").option("hoodie.table.name", tableName).mode("overwrite").save(basePath)
   ```
   
   ```sh
   docker exec -it hivemetastore bash
   ```
   
   ```sh
   ln -s /opt/hadoop-3.3.4/share/hadoop/tools/lib/hadoop-aws-3.3.4.jar /opt/hive/lib/
   ln -s /opt/hadoop-3.3.4/share/hadoop/tools/lib/aws-java-sdk-bundle-1.12.262.jar /opt/hive/lib/
   ```
   
   ```sh
   vi $HIVE_HOME/conf/hive-site.xml
   ```
   
   ```xml
   <property>
     <name>fs.s3a.access.key</name>
     <value>minio</value>
   </property>
   <property>
     <name>fs.s3a.secret.key</name>
     <value>minio123</value>
   </property>
   <property>
     <name>fs.s3a.endpoint</name>
     <value>http://minio:9090</value>
   </property>
   <property>
     <name>fs.s3a.path.style.access</name>
     <value>true</value>
   </property>
   <property>
     <name>fs.s3a.impl</name>
     <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
   </property>
   ```
   
   After updating the above details, restart the hivemetastore container.
   
   ```sh
   docker exec -it hiveserver bash
   ```
   
   ```sh
   ln -s /opt/hadoop-3.3.4/share/hadoop/tools/lib/hadoop-aws-3.3.4.jar /opt/hive/lib/
   ln -s /opt/hadoop-3.3.4/share/hadoop/tools/lib/aws-java-sdk-bundle-1.12.262.jar /opt/hive/lib/
   ```
   
   ```sh
   vi $HIVE_HOME/conf/hive-site.xml
   ```
   
   ```xml
   <property>
     <name>fs.s3a.access.key</name>
     <value>minio</value>
   </property>
   <property>
     <name>fs.s3a.secret.key</name>
     <value>minio123</value>
   </property>
   <property>
     <name>fs.s3a.endpoint</name>
     <value>http://minio:9090</value>
   </property>
   <property>
     <name>fs.s3a.path.style.access</name>
     <value>true</value>
   </property>
   <property>
     <name>fs.s3a.impl</name>
     <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
   </property>
   ```
   
   After updating the above details, restart the hiveserver container.
   
   ```sh
   docker exec -it adhoc-1 bash
   ```
   
   ```sh
   /var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh \
     --jdbc-url jdbc:hive2://hiveserver:10000 \
     --user hive \
     --pass hive \
     --partitioned-by city \
     --base-path s3a://warehouse/trips_table \
     --database default \
     --table trips_table \
     --partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor
   ```
   
   ```sh
   2025-12-17 12:56:32,020 INFO  [main] metadata.BaseTableMetadata (BaseTableMetadata.java:fetchAllPartitionPaths(279)) - Listed partitions from metadata: #partitions=3
   2025-12-17 12:56:32,473 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(280)) - Sync complete for trips_table
   ```
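   Once the tool reports `Sync complete`, the synced metadata can be checked from the Hive side. A hedged sketch querying through HiveServer2 with beeline (assuming beeline is on the PATH in the container, as in the Hudi Docker demo); the three city partitions from the write above should be listed:

   ```sh
   # Verify the synced partitions via HiveServer2 (same JDBC URL and
   # credentials as the sync tool invocation above).
   beeline -u jdbc:hive2://hiveserver:10000 -n hive -p hive \
     -e "SHOW PARTITIONS default.trips_table;"
   ```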

