rangareddy commented on issue #14872:
URL: https://github.com/apache/hudi/issues/14872#issuecomment-3665936773
I tested this with the latest Hudi Docker image: with a few minimal configuration changes, the Hudi table synced to the Hive Metastore successfully. The full steps follow.
```sh
docker exec -it adhoc-1 bash
```
```sh
ln -s $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-3.3.4.jar $SPARK_HOME/jars/hadoop-aws-3.3.4.jar
ln -s $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-bundle-1.12.262.jar $SPARK_HOME/jars/aws-java-sdk-bundle-1.12.262.jar
ln -s $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-3.3.4.jar $HADOOP_HOME/share/hadoop/common/lib/
ln -s $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-bundle-1.12.262.jar $HADOOP_HOME/share/hadoop/common/lib/
```
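To confirm the links resolved, a quick status check can help. The loop below only reports what it finds and is safe to re-run; `$SPARK_HOME` and the `3.3.4`/`1.12.262` versions are assumptions carried over from the commands above:

```shell
# Sanity check: report which of the S3A jars are now visible to Spark.
ok=0
for j in "$SPARK_HOME/jars/hadoop-aws-3.3.4.jar" \
         "$SPARK_HOME/jars/aws-java-sdk-bundle-1.12.262.jar"; do
  if [ -e "$j" ]; then
    ok=$((ok + 1))
    echo "present: $j"
  else
    echo "missing: $j"
  fi
done
echo "$ok of 2 jars resolved"
```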
```sh
vi $HADOOP_HOME/etc/hadoop/core-site.xml
```
```xml
<property>
<name>fs.s3a.access.key</name>
<value>minio</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>minio123</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>http://minio:9090</value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
```
```sh
ln -s $HADOOP_HOME/etc/hadoop/core-site.xml $SPARK_INSTALL/conf/
```
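Symlinking `core-site.xml` into Spark's conf directory is what lets `spark-shell` pick up the S3A endpoint and credentials without extra flags. A minimal check that the link is in place (`$SPARK_INSTALL` assumed set as above; this only prints status):

```shell
# Check that Spark's conf dir now resolves core-site.xml.
conf="$SPARK_INSTALL/conf/core-site.xml"
if [ -e "$conf" ]; then
  echo "found: $conf"
else
  echo "not found: $conf (symlink may be missing)"
fi
```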
```sh
vi $SPARK_INSTALL/conf/spark-defaults.conf
```
```properties
spark.hadoop.fs.s3a.endpoint=http://minio:9090
spark.hadoop.fs.s3a.access.key=minio
spark.hadoop.fs.s3a.secret.key=minio123
spark.hadoop.fs.s3a.path.style.access=true
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
```
```sh
$SPARK_INSTALL/bin/spark-shell \
--jars $HUDI_SPARK_BUNDLE \
--master local[2] \
--driver-class-path $HADOOP_CONF_DIR \
--conf spark.sql.hive.convertMetastoreParquet=false \
--deploy-mode client \
--driver-memory 1G \
--executor-memory 3G \
--num-executors 1
```
```scala
val tableName = "trips_table"
val basePath = "s3a://warehouse/trips_table"

val columns = Seq("ts", "uuid", "rider", "driver", "fare", "city")
val data = Seq(
  (1695159649087L, "334e26e9-8355-45cc-97c6-c31daf0df330", "rider-A", "driver-K", 19.10, "san_francisco"),
  (1695091554788L, "e96c4396-3fad-413a-a942-4cb36106d721", "rider-C", "driver-M", 27.70, "san_francisco"),
  (1695046462179L, "9909a8b1-2d15-4d3d-8ec9-efc48c536a00", "rider-D", "driver-L", 33.90, "san_francisco"),
  (1695516137016L, "e3cf430c-889d-4015-bc98-59bdce1e530c", "rider-F", "driver-P", 34.15, "sao_paulo"),
  (1695115999911L, "c8abbe79-8d89-47ea-b4ce-4d224bae5bfa", "rider-J", "driver-T", 17.85, "chennai"))

val inserts = spark.createDataFrame(data).toDF(columns: _*)

inserts.write.format("hudi").
  option("hoodie.datasource.write.partitionpath.field", "city").
  option("hoodie.table.name", tableName).
  mode("overwrite").
  save(basePath)
```
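Before moving on to Hive sync, it can be worth confirming the write actually landed on MinIO. One way is to list the bucket with the AWS CLI; the CLI is not part of the image by default, so the sketch below guards for that (endpoint and credentials taken from the config above):

```shell
# List the Hudi table files on MinIO, if the AWS CLI happens to be available.
if command -v aws >/dev/null 2>&1; then
  AWS_ACCESS_KEY_ID=minio AWS_SECRET_ACCESS_KEY=minio123 \
    aws --endpoint-url http://minio:9090 s3 ls "s3://warehouse/trips_table/" --recursive \
    | head || echo "listing failed (is MinIO up?)"
  listed=yes
else
  echo "aws CLI not available; skipping bucket listing"
  listed=no
fi
```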
```sh
docker exec -it hivemetastore bash
```
```sh
ln -s /opt/hadoop-3.3.4/share/hadoop/tools/lib/hadoop-aws-3.3.4.jar /opt/hive/lib/
ln -s /opt/hadoop-3.3.4/share/hadoop/tools/lib/aws-java-sdk-bundle-1.12.262.jar /opt/hive/lib/
```
`vi $HIVE_HOME/conf/hive-site.xml`
```xml
<property>
<name>fs.s3a.access.key</name>
<value>minio</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>minio123</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>http://minio:9090</value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
```
After updating the above details, restart the hivemetastore container.
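The restart is done from the Docker host, not from inside a container; the container name `hivemetastore` here is taken from the demo compose setup, so adjust it if your stack names differ. Guarded so it degrades to a hint when docker is not on the PATH:

```shell
# Restart the metastore container so it reloads hive-site.xml (run on the host).
if command -v docker >/dev/null 2>&1; then
  docker restart hivemetastore || echo "could not restart; is the demo stack running?"
  attempted=yes
else
  echo "docker not on PATH here; run 'docker restart hivemetastore' on the host"
  attempted=no
fi
```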
```sh
docker exec -it hiveserver bash
```
```sh
ln -s /opt/hadoop-3.3.4/share/hadoop/tools/lib/hadoop-aws-3.3.4.jar /opt/hive/lib/
ln -s /opt/hadoop-3.3.4/share/hadoop/tools/lib/aws-java-sdk-bundle-1.12.262.jar /opt/hive/lib/
```
`vi $HIVE_HOME/conf/hive-site.xml`
```xml
<property>
<name>fs.s3a.access.key</name>
<value>minio</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>minio123</value>
</property>
<property>
<name>fs.s3a.endpoint</name>
<value>http://minio:9090</value>
</property>
<property>
<name>fs.s3a.path.style.access</name>
<value>true</value>
</property>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
```
After updating the above details, restart the hiveserver container.
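As with the metastore, this restart runs on the Docker host; the container name `hiveserver` is the one used by the demo compose setup (guarded in case docker is unavailable in the current shell):

```shell
# Restart the HiveServer2 container so it reloads hive-site.xml (run on the host).
if command -v docker >/dev/null 2>&1; then
  docker restart hiveserver || echo "could not restart; is the demo stack running?"
  attempted=yes
else
  echo "docker not on PATH here; run 'docker restart hiveserver' on the host"
  attempted=no
fi
```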
```sh
docker exec -it adhoc-1 bash
```
```sh
/var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh \
--jdbc-url jdbc:hive2://hiveserver:10000 \
--user hive \
--pass hive \
--partitioned-by city \
--base-path s3a://warehouse/trips_table \
--database default \
--table trips_table \
--partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor
```
The sync tool's output confirms the partitions were listed and the sync completed:
```sh
2025-12-17 12:56:32,020 INFO [main] metadata.BaseTableMetadata
(BaseTableMetadata.java:fetchAllPartitionPaths(279)) - Listed partitions from
metadata: #partitions=3
2025-12-17 12:56:32,473 INFO [main] hive.HiveSyncTool
(HiveSyncTool.java:syncHoodieTable(280)) - Sync complete for trips_table
```
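Once `run_sync_tool.sh` reports success, the synced table can be checked through HiveServer2 as well. A sketch, assuming `beeline` is on the PATH inside `adhoc-1` (guarded so it only prints a note when it is not):

```shell
# Query HiveServer2 to confirm the table and its partitions were registered.
if command -v beeline >/dev/null 2>&1; then
  beeline -u jdbc:hive2://hiveserver:10000 -n hive -p hive \
    -e "SHOW TABLES IN default; SHOW PARTITIONS default.trips_table;" \
    || echo "query failed (is hiveserver up?)"
  checked=yes
else
  echo "beeline not available in this shell"
  checked=no
fi
```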
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]