rangareddy commented on issue #12099: URL: https://github.com/apache/hudi/issues/12099#issuecomment-2416407129
Hi @melin, I have re-tested with Spark 3.5.3 and Hudi 0.15.0, and I did not encounter any issues. I am sharing a sample `pom.xml` file and Scala code for your reference. Please test and let me know the results.

`pom.xml`

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://maven.apache.org/POM/4.0.0"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ranga</groupId>
    <artifactId>Spark_Hudi_100</artifactId>
    <version>1.0.0</version>

    <properties>
        <java.version>8</java.version>
        <maven.compiler.source>${java.version}</maven.compiler.source>
        <maven.compiler.target>${java.version}</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <spark.version>3.5.3</spark.version>
        <spark.major.version>3.5</spark.major.version>
        <scala.version>2.12.18</scala.version>
        <scala.binary.version>2.12</scala.binary.version>
        <hudi.version>0.15.0</hudi.version>
        <maven-shade-plugin.version>3.5.0</maven-shade-plugin.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hudi</groupId>
            <artifactId>hudi-spark${spark.major.version}-bundle_${scala.binary.version}</artifactId>
            <version>${hudi.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hudi</groupId>
            <artifactId>hudi-hadoop-mr-bundle</artifactId>
            <version>${hudi.version}</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>${maven-shade-plugin.version}</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <minimizeJar>false</minimizeJar>
                            <createDependencyReducedPom>false</createDependencyReducedPom>
                            <artifactSet>
                                <includes>
                                    <!-- Include here the dependencies you want to be packed in your fat jar -->
                                    <include>*:*</include>
                                </includes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>module-info.class</exclude>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                        <exclude>META-INF/*.MF</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
```

`Test12099.scala`

```scala
package com.ranga

import org.apache.spark.SparkConf
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Row, SparkSession}

object Test12099 extends App {

  val name = this.getClass.getSimpleName.replace("$", "")
  val sparkConf = new SparkConf().setAppName(name).setIfMissing("spark.master", "local[2]")
  val spark = SparkSession.builder.appName(name).config(sparkConf)
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .config("spark.sql.hive.convertMetastoreParquet", "false")
    .getOrCreate()

  val tableName = name
  spark.sql(
    f"""
       |CREATE TABLE IF NOT EXISTS ${tableName} (
       |  `id`   VARCHAR(20),
       |  `name` VARCHAR(10),
       |  `age`  INT,
       |  `ts`   LONG
       |) USING HUDI TBLPROPERTIES (primaryKey = 'id,name', preCombineField = 'ts')
       |  LOCATION '/tmp/warehouse/t_test'
    """.stripMargin)

  val input_schema = StructType(Seq(
    StructField("id", LongType),
    StructField("name", StringType),
    StructField("age", IntegerType),
    StructField("ts", LongType)
  ))

  val input_data = Seq(
    Row(1L, "hello", 42, 1695159649087L),
    Row(2L, "world", 13, 1695091554788L),
    Row(3L, "spark", 7, 1695115999911L),
    Row(1L, "hello", 43, 1695159649087L)
  )

  val basePath = f"file:///tmp/$tableName"
  val hoodieConf = scala.collection.mutable.Map[String, String]()
  hoodieConf.put("hoodie.datasource.write.recordkey.field", "id,age")
  hoodieConf.put("hoodie.table.precombine.field", "ts")
  hoodieConf.put("hoodie.table.name", tableName)

  val input_df = spark.createDataFrame(spark.sparkContext.parallelize(input_data), input_schema)
  input_df.write.format("hudi").
    options(hoodieConf).
    mode("append").
    save(basePath)

  spark.read.format("hudi").load(basePath).show(false)

  println("Displaying the tables")
  spark.sql("SHOW tables").show(truncate = false)

  println("Drop the table")
  spark.sql(f"DROP TABLE ${tableName}")

  println("Displaying the tables")
  spark.sql("SHOW tables").show(truncate = false)

  spark.stop()
}
```

**Output:**

```sh
Displaying the tables
24/10/16 15:52:28 INFO CodeGenerator: Code generated in 60.211 ms
24/10/16 15:52:28 INFO CodeGenerator: Code generated in 31.001125 ms
24/10/16 15:52:28 INFO CodeGenerator: Code generated in 18.569542 ms
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
|default  |test12099|false      |
+---------+---------+-----------+

Drop the table
Displaying the tables
+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
+---------+---------+-----------+
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
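In case it helps, this is roughly how I build and run it locally (a sketch, assuming Maven and Spark 3.5.3 are on the PATH; the jar name follows the `artifactId`/`version` in the `pom.xml` above):

```shell
# Build the shaded (fat) jar produced by the maven-shade-plugin configuration above
mvn clean package

# Submit the example locally; class and jar names follow the pom.xml above
spark-submit \
  --master "local[2]" \
  --class com.ranga.Test12099 \
  target/Spark_Hudi_100-1.0.0.jar
```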
